pandas is a widely used Python library for manipulating and analyzing tabular data.
Creating DataFrames
import pandas as pd
# From dictionary
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35],
'city': ['NYC', 'LA', 'Chicago']
})
# From CSV (this reassigns df; assumes a file named data.csv exists in the working directory)
df = pd.read_csv('data.csv')
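The CSV example above needs a data.csv file on disk. As a minimal sketch that runs without one, the snippet below feeds read_csv an in-memory string instead. The sample text is a hypothetical stand-in that mirrors the dictionary example, not the contents of any real file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of data.csv
csv_text = """name,age,city
Alice,25,NYC
Bob,30,LA
Charlie,35,Chicago
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types inferred from the text
```

The same call works unchanged once the StringIO object is swapped for a real path.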
Basic Operations
# View data
print(df.head())
df.info()  # prints its summary directly and returns None, so no print() is needed
print(df.describe())
# Select columns
ages = df['age']
subset = df[['name', 'age']]
# Filter rows
adults = df[df['age'] >= 18]
filtered = df[(df['age'] > 25) & (df['city'] == 'NYC')]
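The filters above are built from boolean masks. The sketch below shows a few common variations on the same idea, using the sample DataFrame from the first section. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago'],
})

# Each condition needs its own parentheses because & binds
# more tightly than comparison operators such as > and ==
older_in_la = df[(df['age'] > 25) & (df['city'] == 'LA')]

# .loc filters rows and picks a column in a single step
names_over_25 = df.loc[df['age'] > 25, 'name']

# isin() tests membership in a list of values
coastal = df[df['city'].isin(['NYC', 'LA'])]

# ~ negates a boolean mask
not_nyc = df[~(df['city'] == 'NYC')]

print(older_in_la)
print(names_over_25)
```

Use & and | rather than Python's and / or when combining masks, since the keywords do not operate element by element on a Series.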
Data Aggregation
# Group by
grouped = df.groupby('city')['age'].mean()
# Multiple aggregations
agg_df = df.groupby('city').agg({
'age': ['mean', 'min', 'max'],
'name': 'count'
})
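Passing a dictionary of lists to agg produces two-level column labels such as ('age', 'mean'). One alternative is named aggregation, sketched below, which gives flat column names chosen by the caller. The sample rows (including the extra 'Dana' row) and the output column names are made up for illustration.

```python
import pandas as pd

# Illustrative data with two people per city so the groups are non-trivial
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'age': [25, 30, 35, 45],
    'city': ['NYC', 'LA', 'NYC', 'LA'],
})

# Named aggregation: output_column=(input_column, function)
summary = df.groupby('city').agg(
    mean_age=('age', 'mean'),
    min_age=('age', 'min'),
    max_age=('age', 'max'),
    people=('name', 'count'),
)

print(summary)
```

The result is indexed by city, so a single value can be read with summary.loc['NYC', 'mean_age'].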
Data Cleaning
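# Handle missing values (these methods return a new DataFrame; df itself is unchanged)
dropped = df.dropna() # Remove rows that contain any NA
filled = df.fillna(0) # Replace NA with 0
# Remove duplicates
deduped = df.drop_duplicates()
# Rename columns
renamed = df.rename(columns={'old_name': 'new_name'})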
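Because each cleaning method returns a new DataFrame, the steps can be chained into one expression. The sketch below assumes a small made-up table with a duplicate row and some missing values. The column names old_name and score are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data: one duplicate row, one missing score, one missing label
raw = pd.DataFrame({
    'old_name': ['a', 'a', 'b', None],
    'score': [1.0, 1.0, np.nan, 4.0],
})

cleaned = (
    raw
    .drop_duplicates()                         # drop the repeated ('a', 1.0) row
    .dropna(subset=['old_name'])               # drop rows missing a label
    .fillna({'score': 0})                      # fill missing scores with 0
    .rename(columns={'old_name': 'new_name'})  # rename the label column
)

print(cleaned)
print(len(raw))  # raw is untouched by the chain
```

Restricting dropna to a subset of columns and passing a dictionary to fillna keeps each step targeted, so a missing score does not cause the whole row to be discarded.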
Creating, selecting, filtering, aggregating, and cleaning data are the core steps of a typical pandas workflow.