Lesson 12: Introduction to Data Science with Python

Data Science is one of the most exciting and in-demand fields today. Python, with its powerful libraries, is a go-to language for data analysis, visualization, and machine learning. In this lesson, we will explore the fundamentals of Data Science using Pandas, NumPy, and Matplotlib.


1️⃣ Understanding Data Science

What is Data Science?

Data Science is a multidisciplinary field that extracts insights from structured and unstructured data using statistics, machine learning, and data visualization.

Why is Data Science Important?

✅ Helps in decision-making by analyzing trends and patterns.
✅ Used in multiple industries like healthcare, finance, e-commerce, and social media.
✅ Powers AI-based applications, recommendation systems, fraud detection, and automation.


2️⃣ Working with Pandas

Pandas is a Python library for data manipulation and analysis. It provides DataFrames, a powerful data structure for handling tabular data (similar to an Excel spreadsheet).

Installing Pandas

To install Pandas, run:

Bash
pip install pandas

Loading a Dataset

Python
import pandas as pd  

# Load a CSV file
df = pd.read_csv("data.csv")

# Display the first 5 rows
print(df.head())
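If you don't have a data.csv file yet, this minimal sketch lets you try pd.read_csv on a small in-memory CSV instead. The column names (name, age, city) and the values are made-up placeholders. io.StringIO wraps a string so that read_csv can read it like a file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv (Bob's age is missing)
csv_text = """name,age,city
Alice,30,Paris
Bob,,London
Carol,25,Paris
Dan,35,Berlin
"""

# read_csv accepts any file-like object, so StringIO works in place of a file path
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first 5 rows (here, all 4)
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type pandas inferred for each column
```

Note that age is read as float64 rather than as an integer type. The empty field becomes NaN, which a default integer column cannot hold.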

Basic Data Manipulation

Python
# Check for missing values
print(df.isnull().sum())

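# Fill missing values in numeric columns with each column's mean
# (numeric_only=True skips text columns; without it, pandas 2.x raises a TypeError)
df = df.fillna(df.mean(numeric_only=True))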

# Select specific columns (replace column1 and column2 with column names from your dataset)
print(df[['column1', 'column2']])
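To see these steps on data you can check by hand, here is a small sketch using a hypothetical DataFrame built from a Python dictionary. The column names (product, price, quantity) are illustrative assumptions, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; np.nan marks a missing value
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [10.0, np.nan, 30.0, 20.0],
    "quantity": [1, 2, np.nan, 3],
})

# Count missing values per column
print(df.isnull().sum())

# Fill numeric gaps with each column's mean; the text column "product" is skipped
df = df.fillna(df.mean(numeric_only=True))

# Select a subset of columns (double brackets return a DataFrame)
print(df[["product", "price"]])
```

The mean of price is (10 + 30 + 20) / 3 = 20.0, so the missing price becomes 20.0, and quantity is filled with (1 + 2 + 3) / 3 = 2.0. Assigning the result back to df makes the change explicit.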

Analyzing Data

Python
# Get summary statistics
print(df.describe())

# Count how many times each unique value appears in a column
print(df['column_name'].value_counts())
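A quick sketch with hypothetical city and amount columns shows what these two calls return. The data is invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample: each customer's city and order amount
df = pd.DataFrame({
    "city": ["Paris", "London", "Paris", "Berlin", "Paris"],
    "amount": [120, 80, 95, 60, 150],
})

# describe() summarizes numeric columns: count, mean, std, min, quartiles, max
stats = df.describe()
print(stats)

# value_counts() tallies each distinct value, most frequent first
counts = df["city"].value_counts()
print(counts)
```

Here describe() reports a mean amount of 101.0 (505 / 5), and value_counts() lists Paris first with a count of 3.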

3️⃣ Introduction to NumPy

NumPy (Numerical Python) is a fundamental library for numerical computations, especially for handling multi-dimensional arrays.

Installing NumPy

Bash
pip install numpy

Creating Arrays in NumPy

Python
import numpy as np  

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])

# Create a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)
print(matrix)
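Beyond creating arrays, a key reason to use NumPy is that arithmetic applies to a whole array at once. This short sketch recreates arr and matrix from above and shows element-wise operations plus the attributes every array carries.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic is applied element-wise, with no explicit Python loop
doubled = arr * 2    # 2, 4, 6, 8, 10
squared = arr ** 2   # 1, 4, 9, 16, 25

# Every array knows its number of dimensions, its shape, and its element type
print(matrix.ndim)   # 2
print(matrix.shape)  # (2, 3): 2 rows, 3 columns
print(matrix.dtype)  # an integer type such as int64 (depends on the platform)
print(doubled, squared)
```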

Basic NumPy Operations

Python
# Get shape of the array
print(arr.shape)

# Perform mathematical operations
print(np.mean(arr))  # Mean
print(np.sum(arr))   # Sum
print(np.max(arr))   # Max value
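The same functions also work on the 2D matrix. Passing axis=0 aggregates down each column, axis=1 aggregates across each row, and omitting axis aggregates over every element. A minimal sketch:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(matrix)               # all elements: 21
col_sums = np.sum(matrix, axis=0)    # down each column: 5, 7, 9
row_means = np.mean(matrix, axis=1)  # across each row: 2.0, 5.0
max_per_col = matrix.max(axis=0)     # largest value in each column: 4, 5, 6

print(total, col_sums, row_means, max_per_col)
```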

4️⃣ Simple Data Analysis Project

We will use Pandas and Matplotlib to analyze a dataset and visualize the results.

Installing Matplotlib

Bash
pip install matplotlib

Loading and Visualizing Data

Python
import matplotlib.pyplot as plt
import pandas as pd

# Load dataset
df = pd.read_csv("data.csv")

# Plot data distribution
plt.hist(df['column_name'], bins=10, color='blue', edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Data Distribution')
plt.show()
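The example above needs a real data.csv with a numeric column_name. As a self-contained alternative, this sketch generates hypothetical exam scores with NumPy, plots their histogram, and saves it to score_histogram.png. The column name score and the file name are assumptions for illustration. The Agg backend lets the script run even on a machine without a display, such as a server.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 200 scores from a normal distribution (seeded so runs repeat)
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"score": rng.normal(loc=70, scale=10, size=200)})

# Drop any missing values so only real numbers are binned
values = df["score"].dropna()

plt.figure()
# hist() returns the count per bin, the bin edges, and the drawn bar objects
counts, bin_edges, _ = plt.hist(values, bins=10, color="blue", edgecolor="black")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.title("Score Distribution")

# Save to a file instead of calling plt.show(), then free the figure's memory
plt.savefig("score_histogram.png")
plt.close()
```

With bins=10 there are 10 counts and 11 bin edges, and the counts add up to the 200 generated values.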
