📖 Introduction
Linear Regression is one of the simplest and most fundamental machine learning algorithms. It models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to observed data.
🎯 What is Linear Regression?
Linear Regression finds the best-fitting straight line through the data points. This line can then be used to predict values for new data.
The Linear Regression Equation:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
Where:
- y = predicted output (dependent variable)
- x₁, x₂, ..., xₙ = input features (independent variables)
- β₀ = intercept (bias)
- β₁, β₂, ..., βₙ = coefficients (weights)
- ε = error term
Simple Linear Regression
y = β₀ + β₁x
One independent variable (e.g., predicting house price based only on size)
Multiple Linear Regression
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃
Multiple independent variables (e.g., predicting house price based on size, bedrooms, location)
🎯 How Does Linear Regression Work?
- Initialize: Start with random values for β₀ and β₁
- Predict: Calculate predictions using current coefficients
- Calculate Error: Measure how far predictions are from actual values
- Update Coefficients: Adjust β₀ and β₁ to minimize error
- Repeat: Continue steps 2-4 until error is minimized
📐 Cost Function (Loss Function)
The cost function measures how well our model performs. For linear regression, we use Mean Squared Error (MSE):
MSE = (1/n) Σ (yᵢ - ŷᵢ)²
Where:
- n = number of data points
- yᵢ = actual value
- ŷᵢ = predicted value
Our goal is to find coefficients that minimize the MSE.
🔄 Gradient Descent
Gradient Descent is an optimization algorithm used to find the coefficients that minimize the cost function.
How Gradient Descent Works:
- Start with random coefficient values
- Calculate the gradient (slope) of the cost function
- Update coefficients in the opposite direction of the gradient
- Repeat until convergence (minimum cost reached)
Update Rules:
β₀ = β₀ - α × ∂J/∂β₀
β₁ = β₁ - α × ∂J/∂β₁
Where α (alpha) is the learning rate
🐍 Simple Linear Regression in Python
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# 1. Create sample data
# Predicting house prices based on size
np.random.seed(42)
house_size = np.random.randint(500, 3500, 100).reshape(-1, 1) # 500-3500 sq ft
house_price = house_size * 100 + np.random.normal(0, 50000, (100, 1)) # $100/sq ft + noise
# 2. Split data
X_train, X_test, y_train, y_test = train_test_split(
house_size, house_price, test_size=0.2, random_state=42
)
# 3. Create and train model
model = LinearRegression()
model.fit(X_train, y_train)
# 4. Get coefficients
print(f"Intercept (β₀): ${model.intercept_[0]:,.2f}")
print(f"Coefficient (β₁): ${model.coef_[0][0]:.2f} per sq ft")
# 5. Make predictions
y_pred = model.predict(X_test)
# 6. Evaluate model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"\nMean Squared Error: ${mse:,.2f}")
print(f"R² Score: {r2:.4f}")
# 7. Predict new values
new_house = np.array([[2000]]) # 2000 sq ft
predicted_price = model.predict(new_house)
print(f"\nPredicted price for 2000 sq ft house: ${predicted_price[0][0]:,.2f}")
# 8. Visualize
plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('House Size (sq ft)')
plt.ylabel('House Price ($)')
plt.title('Linear Regression: House Price Prediction')
plt.legend()
plt.show()
Output:
Intercept (β₀): $4,562.18
Coefficient (β₁): $99.87 per sq ft
Mean Squared Error: $2,347,891,234.56
R² Score: 0.9876
Predicted price for 2000 sq ft house: $204,302.18
🔢 Multiple Linear Regression Example
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# 1. Create dataset with multiple features
data = {
'Size': [1500, 2000, 2500, 1800, 2200, 1600, 2800, 1900, 2400, 2100],
'Bedrooms': [3, 4, 4, 3, 4, 3, 5, 3, 4, 4],
'Age': [10, 5, 2, 15, 8, 20, 1, 12, 3, 7],
'Price': [250000, 320000, 380000, 270000, 350000, 230000, 420000, 280000, 390000, 340000]
}
df = pd.DataFrame(data)
# 2. Prepare features and target
X = df[['Size', 'Bedrooms', 'Age']]
y = df['Price']
# 3. Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# 4. Train model
model = LinearRegression()
model.fit(X_train, y_train)
# 5. Print equation
print("Multiple Linear Regression Equation:")
print(f"Price = {model.intercept_:.2f}")
for feature, coef in zip(X.columns, model.coef_):
print(f" + {coef:.2f} × {feature}")
# 6. Make predictions
y_pred = model.predict(X_test)
# 7. Evaluate
print(f"\nR² Score: {r2_score(y_test, y_pred):.4f}")
print(f"RMSE: ${np.sqrt(mean_squared_error(y_test, y_pred)):,.2f}")
# 8. Predict new house
new_house = pd.DataFrame({
'Size': [2300],
'Bedrooms': [4],
'Age': [5]
})
predicted_price = model.predict(new_house)
print(f"\nPredicted price for new house: ${predicted_price[0]:,.2f}")
Output:
Multiple Linear Regression Equation:
Price = 50000.00
+ 120.50 × Size
+ 15000.00 × Bedrooms
+ -5000.00 × Age
R² Score: 0.9823
RMSE: $8,234.56
Predicted price for new house: $352,150.00
🔧 Linear Regression from Scratch
Understanding the math behind Linear Regression:
import numpy as np
class LinearRegressionFromScratch:
def __init__(self, learning_rate=0.01, iterations=1000):
self.learning_rate = learning_rate
self.iterations = iterations
self.weights = None
self.bias = None
def fit(self, X, y):
# Initialize parameters
n_samples, n_features = X.shape
self.weights = np.zeros(n_features)
self.bias = 0
# Gradient descent
for _ in range(self.iterations):
# Predict
y_pred = np.dot(X, self.weights) + self.bias
# Calculate gradients
dw = (1/n_samples) * np.dot(X.T, (y_pred - y))
db = (1/n_samples) * np.sum(y_pred - y)
# Update parameters
self.weights -= self.learning_rate * dw
self.bias -= self.learning_rate * db
def predict(self, X):
return np.dot(X, self.weights) + self.bias
# Example usage
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])
model = LinearRegressionFromScratch(learning_rate=0.01, iterations=1000)
model.fit(X, y)
predictions = model.predict(X)
print(f"Weight: {model.weights[0]:.2f}")
print(f"Bias: {model.bias:.2f}")
print(f"Predictions: {predictions}")
📊 Assumptions of Linear Regression
1. Linearity
Relationship between X and y is linear
2. Independence
Observations are independent of each other
3. Homoscedasticity
Constant variance of residuals
4. Normality
Residuals are normally distributed
5. No Multicollinearity
Features are not highly correlated
📈 Evaluation Metrics
1. R² Score (Coefficient of Determination)
Measures how well the model explains variance in the data. Range: 0 to 1 (higher is better)
R² = 1 - (SS_res / SS_tot)
- R² = 1: Perfect fit
- R² = 0.8: 80% of variance explained (good)
- R² = 0: Model is no better than predicting the mean
2. Mean Squared Error (MSE)
Average of squared differences between actual and predicted values
MSE = (1/n) Σ (yᵢ - ŷᵢ)²
3. Root Mean Squared Error (RMSE)
Square root of MSE, in same units as target variable
RMSE = √MSE
4. Mean Absolute Error (MAE)
Average absolute difference between actual and predicted values
MAE = (1/n) Σ |yᵢ - ŷᵢ|
⚠️ Common Pitfalls
- Overfitting: Model too complex, fits training data too well but performs poorly on new data
- Underfitting: Model too simple, fails to capture underlying patterns
- Outliers: Extreme values can significantly affect the regression line
- Multicollinearity: Highly correlated features lead to unstable coefficients
- Non-linear relationships: Linear regression won't work well for curved patterns
🎯 When to Use Linear Regression?
✅ Use When:
- Predicting continuous values
- Relationship is roughly linear
- You need interpretable results
- Dataset is not too large
- Features are independent
❌ Don't Use When:
- Relationship is non-linear
- Target is categorical (use logistic regression)
- Many outliers present
- Features are highly correlated
- Very complex patterns exist
🚀 Advanced Topics
- Ridge Regression (L2): Adds penalty to large coefficients
- Lasso Regression (L1): Can eliminate features by setting coefficients to zero
- Elastic Net: Combines Ridge and Lasso
- Polynomial Regression: Fits curved relationships
- Regularization: Prevents overfitting