![Introduction to Data Science with Python](https://www.hqledutech.com/wp-content/uploads/2025/02/python-2-1024x576.png)
Data Science is one of the most exciting and in-demand fields today. Python, with its powerful libraries, is a go-to language for data analysis, visualization, and machine learning. In this lesson, we will explore the fundamentals of Data Science using Pandas, NumPy, and Matplotlib.
1️⃣ Understanding Data Science
What is Data Science?
Data Science is a multidisciplinary field that extracts insights from structured and unstructured data using statistics, machine learning, and data visualization.
Why is Data Science Important?
✅ Helps in decision-making by analyzing trends and patterns.
✅ Used across industries such as healthcare, finance, e-commerce, and social media.
✅ Powers AI-based applications, recommendation systems, fraud detection, and automation.
2️⃣ Working with Pandas
Pandas is a Python library for data manipulation and analysis. Its core data structure is the DataFrame, a two-dimensional table with labeled rows and columns (similar to an Excel spreadsheet).
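To make the idea of a DataFrame concrete, here is a minimal sketch that builds one from a Python dictionary. It assumes Pandas is already installed, and the column names and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; each dictionary key becomes a column.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe"],
    "age": [28, 35, 42],
    "city": ["Lisbon", "Oslo", "Paris"],
})

print(df)           # the whole table
print(df.shape)     # (number of rows, number of columns)
print(df.dtypes)    # data type of each column
print(df["age"])    # a single column is returned as a Series
```

Each column holds one data type, which is what lets Pandas compute column-wise statistics efficiently.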
Installing Pandas
To install Pandas, run:
pip install pandas
Loading a Dataset
import pandas as pd
# Load a CSV file
df = pd.read_csv("data.csv")
# Display the first 5 rows
print(df.head())
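If you do not have a data.csv file at hand, the same loading step can be tried on CSV text held in memory. This is only a sketch: the CSV content below is invented, and `io.StringIO` simply makes the string behave like a file so that `pd.read_csv` can read it.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real data.csv file.
csv_text = """product,price,quantity
apple,0.50,10
banana,0.25,
cherry,3.00,5
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows (here, all three)
print(df.shape)    # (rows, columns)
df.info()          # column types and non-null counts
```

The empty quantity for banana is read as a missing value (NaN), which is why the missing-value checks in the next step matter.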
Basic Data Manipulation
# Check for missing values
print(df.isnull().sum())
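# Fill missing values in numeric columns with each column's mean
# (numeric_only=True skips text columns, which have no mean)
df.fillna(df.mean(numeric_only=True), inplace=True)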
# Select specific columns
print(df[['column1', 'column2']])
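The steps above can be tried end to end on a small table. The sketch below uses invented data with one numeric and one text column, and it assumes that a placeholder string is an acceptable fill for missing text, which may not suit every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "city": ["Oslo", "Paris", None, "Lisbon"],
    "temperature": [4.0, np.nan, 12.0, 20.0],
})

# Count missing values per column.
print(df.isnull().sum())

# Numeric columns: fill gaps with the column mean.
means = df.mean(numeric_only=True)
df = df.fillna(means)

# Text column: a mean makes no sense, so use a placeholder instead.
df["city"] = df["city"].fillna("unknown")

print(df)
```

Filling with the mean keeps the column average unchanged but shrinks its spread, so it is a convenience rather than a neutral choice.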
Analyzing Data
# Get summary statistics
print(df.describe())
# Count unique values in a column
print(df['column_name'].value_counts())
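As a small worked example of these two calls, the sketch below runs them on made-up data. The column names `category` and `score` are placeholders, not part of any real dataset.

```python
import pandas as pd

# Hypothetical data: a label column and a numeric column.
df = pd.DataFrame({
    "category": ["A", "B", "A", "C", "A", "B"],
    "score": [10, 20, 30, 40, 50, 60],
})

# describe() returns count, mean, std, min, quartiles, and max.
stats = df["score"].describe()
print(stats["mean"], stats["min"], stats["max"])

# value_counts() sorts by frequency, most common first.
counts = df["category"].value_counts()
print(counts)

# normalize=True returns proportions instead of raw counts.
shares = df["category"].value_counts(normalize=True)
print(shares)
```

By default `describe()` on a whole DataFrame summarizes only the numeric columns, so text columns such as `category` are best inspected with `value_counts()`.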
3️⃣ Introduction to NumPy
NumPy (Numerical Python) is a fundamental library for numerical computations, especially for handling multi-dimensional arrays.
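One reason NumPy suits numerical work is that arithmetic applies to whole arrays at once, with no explicit loop. A minimal sketch, assuming NumPy is installed and using invented prices and quantities:

```python
import numpy as np

# Hypothetical values for illustration.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Element-wise multiplication: each price times its quantity.
totals = prices * quantities

# A single number is applied to every element.
discounted = prices * 0.9

print(totals)
print(discounted)
print(prices.ndim, prices.dtype)   # number of dimensions and element type
```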
Installing NumPy
pip install numpy
Creating Arrays in NumPy
import numpy as np
# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])
# Create a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
print(matrix)
Basic NumPy Operations
# Get shape of the array
print(arr.shape)
# Perform mathematical operations
print(np.mean(arr)) # Mean
print(np.sum(arr)) # Sum
print(np.max(arr)) # Max value
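The same operations also work on 2D arrays, where the `axis` argument chooses whether to aggregate down the columns or across the rows. The sketch below reuses the 2x3 matrix from the earlier example.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.shape)          # (2, 3): 2 rows, 3 columns

# axis=0 collapses the rows, giving one result per column.
col_sums = matrix.sum(axis=0)    # 5, 7, 9

# axis=1 collapses the columns, giving one result per row.
row_sums = matrix.sum(axis=1)    # 6, 15
row_means = matrix.mean(axis=1)  # 2.0, 5.0

print(col_sums)
print(row_sums)
print(row_means)
```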
4️⃣ Simple Data Analysis Project
We will use Pandas and Matplotlib to analyze a dataset and visualize the results.
Installing Matplotlib
pip install matplotlib
Loading and Visualizing Data
import pandas as pd
import matplotlib.pyplot as plt
# Load dataset
df = pd.read_csv("data.csv")
# Plot data distribution
plt.hist(df['column_name'], bins=10, color='blue', edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Data Distribution')
plt.show()
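To run the whole project without a data.csv file, the sketch below generates synthetic data instead. Everything about the data is an assumption made for illustration: the column name `value`, the normal distribution, and the output filename `distribution.png`. It also selects the non-interactive `Agg` backend and saves the chart to a file, so it works on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, not a window

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data standing in for data.csv (seeded so runs are repeatable).
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"value": rng.normal(loc=50, scale=10, size=200)})

# Simulate two missing entries, then drop them before analysis.
df.loc[[3, 17], "value"] = np.nan
clean = df["value"].dropna()

# Summary statistics for the cleaned column.
print(clean.describe())

# Histogram of the distribution, using the same styling as above.
fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(clean, bins=10, color="blue", edgecolor="black")
ax.set_xlabel("Values")
ax.set_ylabel("Frequency")
ax.set_title("Data Distribution")

# Hypothetical output filename.
fig.savefig("distribution.png")
```

In an interactive session, remove the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.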