Pandas Library
pandas is the most widely used Python library for data analysis. It provides two core structures for working with structured data: the DataFrame (a 2D labeled table) and the Series (a 1D labeled array).
💻 DataFrames & Series
# pip install pandas
import pandas as pd
# Create DataFrame from dictionary
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
# Read from CSV
df = pd.read_csv('data.csv')
# Read from Excel (.xlsx files need the openpyxl package installed)
df = pd.read_excel('data.xlsx')
# Series (single column)
ages = pd.Series([25, 30, 35], name='age')
print(ages)
🔧 Data Operations
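The operations in this section assume that df has name, age, and city columns; once df has been reloaded from data.csv or data.xlsx, that depends on the file contents. As a minimal self-contained sketch, the example below rebuilds the sample DataFrame from the first section and runs the filtering and grouping steps on it. The variable names over_25, nyc_over_25, and mean_age_by_city are illustrative, not part of pandas.

```python
import pandas as pd

# Sample data copied from the first section, so no CSV file is needed.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago'],
})

# Boolean indexing: the comparison yields a True/False Series used as a row mask.
over_25 = df[df['age'] > 25]

# Combine conditions with & (and) or | (or). Each condition needs its own
# parentheses because & binds more tightly than the comparison operators.
nyc_over_25 = df[(df['age'] > 25) & (df['city'] == 'NYC')]

# groupby + mean returns a Series indexed by the group key.
mean_age_by_city = df.groupby('city')['age'].mean()

print(over_25['name'].tolist())   # ['Bob', 'Charlie']
print(len(nyc_over_25))           # 0 (Alice is in NYC but is exactly 25)
print(mean_age_by_city['LA'])     # 30.0
```

Note that the combined filter is empty for this sample data, because the only NYC row has an age of exactly 25 and the condition is a strict greater-than.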
# View data
print(df.head()) # First 5 rows
print(df.tail()) # Last 5 rows
df.info() # Data types and non-null counts (prints directly and returns None)
print(df.describe()) # Statistical summary
# Select columns
ages = df['age']
subset = df[['name', 'age']]
# Filter rows
adults = df[df['age'] > 25]
ny_adults = df[(df['age'] > 25) & (df['city'] == 'NYC')]
# Add column
df['senior'] = df['age'] > 60
# Sort
sorted_df = df.sort_values('age', ascending=False)
# Group and aggregate
grouped = df.groupby('city')['age'].mean()
print(grouped)
📊 Data Cleaning
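Most cleaning methods in this section (dropna, fillna, drop_duplicates, rename) return a new object and leave the original untouched unless the result is assigned. The sketch below illustrates that behavior on a small hypothetical table; the names raw, deduped, and cleaned and the data values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: one duplicated row and one missing age.
raw = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Bob', 'Charlie'],
    'age': [25.0, 30.0, 30.0, float('nan')],
})

# drop_duplicates() returns a new DataFrame; raw itself is unchanged.
deduped = raw.drop_duplicates()

# Fill the missing age with the column mean (mean() skips NaN by default),
# and keep the result by assigning it to a new DataFrame.
cleaned = deduped.assign(age=deduped['age'].fillna(deduped['age'].mean()))

print(len(raw), len(deduped))          # 4 3
print(cleaned['age'].tolist())         # [25.0, 30.0, 27.5]
print(int(raw['age'].isnull().sum()))  # 1 (raw still has its missing value)
```

Assigning the result to a new name keeps the uncleaned data available for comparison, which can help when checking how many rows a cleaning step removed.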
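# Handle missing values (these methods return new objects, so assign the result)
print(df.isnull().sum()) # Count nulls per column
df_no_nulls = df.dropna() # Copy without rows that contain nulls
df_filled = df.fillna(0) # Copy with nulls replaced by 0
df['age'] = df['age'].fillna(df['age'].mean()) # Fill missing ages with the mean
# Remove duplicates
df = df.drop_duplicates()
# Rename columns (returns a new DataFrame; df keeps its 'age' column)
renamed = df.rename(columns={'age': 'years'})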
# Apply functions
df['age'] = df['age'].apply(lambda x: x + 1)
# Merge DataFrames
df1 = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'id': [1, 2], 'age': [25, 30]})
merged = pd.merge(df1, df2, on='id')
# Save to CSV
df.to_csv('output.csv', index=False)
🎯 Key Takeaways
- DataFrame: 2D labeled data structure
- Series: 1D labeled array
- read_csv(): Load CSV files
- head()/tail(): View data preview
- Filter: Boolean indexing with []
- groupby(): Aggregate by groups
- merge(): Join DataFrames
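The merge example in this tutorial uses tables whose id values all match, so the join type makes no difference there. As a hedged sketch of what changes when keys do not line up, the example below compares the default inner join with how='left'. The tables names and ages are hypothetical data invented for the illustration.

```python
import pandas as pd

# Hypothetical tables: id 3 has a name but no recorded age.
names = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
ages = pd.DataFrame({'id': [1, 2], 'age': [25, 30]})

# Default how='inner': only ids present in both tables survive.
inner = pd.merge(names, ages, on='id')

# how='left': every row of names is kept; unmatched ages become NaN.
left = pd.merge(names, ages, on='id', how='left')

print(inner['id'].tolist())             # [1, 2]
print(left['id'].tolist())              # [1, 2, 3]
print(int(left['age'].isnull().sum()))  # 1
```

A left join is often the safer choice when the left table is the primary record set, since an inner join silently drops rows that have no match.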