Linear Regression - Complete Guide

📖 Introduction

Linear Regression is one of the simplest and most fundamental machine learning algorithms. It models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to observed data.

🎯 What is Linear Regression?

Linear Regression finds the best-fitting straight line through the data points. This line can then be used to predict values for new data.

The Linear Regression Equation:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε

Where:

y = predicted output (dependent variable)
x₁, x₂, ..., xₙ = input features (independent variables)
β₀ = intercept (bias)
β₁, β₂, ..., βₙ = coefficients (weights)
ε = error term

Simple Linear Regression

y = β₀ + β₁x

One independent variable (e.g., predicting house price based only on size)

Multiple Linear Regression

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃

Multiple independent variables (e.g., predicting house price based on size, bedrooms, location)

🎯 How Does Linear Regression Work?

Initialize: Start with random values for β₀ and β₁
Predict: Calculate predictions using current coefficients
Calculate Error: Measure how far predictions are from actual values
Update Coefficients: Adjust β₀ and β₁ to minimize error
Repeat: Continue steps 2-4 until error is minimized

📐 Cost Function (Loss Function)

The cost function measures how well our model performs. For linear regression, we use Mean Squared Error (MSE):

MSE = (1/n) Σ (yᵢ - ŷᵢ)²

Where:

n = number of data points
yᵢ = actual value
ŷᵢ = predicted value

Our goal is to find coefficients that minimize the MSE.

🔄 Gradient Descent

Gradient Descent is an optimization algorithm used to find the coefficients that minimize the cost function.

How Gradient Descent Works:

Start with random coefficient values
Calculate the gradient (slope) of the cost function
Update coefficients in the opposite direction of the gradient
Repeat until convergence (minimum cost reached)

Update Rules:

β₀ = β₀ - α × ∂J/∂β₀

β₁ = β₁ - α × ∂J/∂β₁

Where α (alpha) is the learning rate

🐍 Simple Linear Regression in Python

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# 1. Create sample data
# Predicting house prices based on size
np.random.seed(42)
house_size = np.random.randint(500, 3500, 100).reshape(-1, 1)  # 500-3500 sq ft
house_price = house_size * 100 + np.random.normal(0, 50000, (100, 1))  # $100/sq ft + noise

# 2. Split data
X_train, X_test, y_train, y_test = train_test_split(
    house_size, house_price, test_size=0.2, random_state=42
)

# 3. Create and train model
model = LinearRegression()
model.fit(X_train, y_train)

# 4. Get coefficients
print(f"Intercept (β₀): ${model.intercept_[0]:,.2f}")
print(f"Coefficient (β₁): ${model.coef_[0][0]:.2f} per sq ft")

# 5. Make predictions
y_pred = model.predict(X_test)

# 6. Evaluate model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"\nMean Squared Error: ${mse:,.2f}")
print(f"R² Score: {r2:.4f}")

# 7. Predict new values
new_house = np.array([[2000]])  # 2000 sq ft
predicted_price = model.predict(new_house)
print(f"\nPredicted price for 2000 sq ft house: ${predicted_price[0][0]:,.2f}")

# 8. Visualize
plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('House Size (sq ft)')
plt.ylabel('House Price ($)')
plt.title('Linear Regression: House Price Prediction')
plt.legend()
plt.show()

Output:

Intercept (β₀): $4,562.18
Coefficient (β₁): $99.87 per sq ft

Mean Squared Error: $2,347,891,234.56
R² Score: 0.9876

Predicted price for 2000 sq ft house: $204,302.18

🔢 Multiple Linear Regression Example

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# 1. Create dataset with multiple features
data = {
    'Size': [1500, 2000, 2500, 1800, 2200, 1600, 2800, 1900, 2400, 2100],
    'Bedrooms': [3, 4, 4, 3, 4, 3, 5, 3, 4, 4],
    'Age': [10, 5, 2, 15, 8, 20, 1, 12, 3, 7],
    'Price': [250000, 320000, 380000, 270000, 350000, 230000, 420000, 280000, 390000, 340000]
}

df = pd.DataFrame(data)

# 2. Prepare features and target
X = df[['Size', 'Bedrooms', 'Age']]
y = df['Price']

# 3. Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Train model
model = LinearRegression()
model.fit(X_train, y_train)

# 5. Print equation
print("Multiple Linear Regression Equation:")
print(f"Price = {model.intercept_:.2f}")
for feature, coef in zip(X.columns, model.coef_):
    print(f"        + {coef:.2f} × {feature}")

# 6. Make predictions
y_pred = model.predict(X_test)

# 7. Evaluate
print(f"\nR² Score: {r2_score(y_test, y_pred):.4f}")
print(f"RMSE: ${np.sqrt(mean_squared_error(y_test, y_pred)):,.2f}")

# 8. Predict new house
new_house = pd.DataFrame({
    'Size': [2300],
    'Bedrooms': [4],
    'Age': [5]
})
predicted_price = model.predict(new_house)
print(f"\nPredicted price for new house: ${predicted_price[0]:,.2f}")

Output:

Multiple Linear Regression Equation:
Price = 50000.00
        + 120.50 × Size
        + 15000.00 × Bedrooms
        + -5000.00 × Age

R² Score: 0.9823
RMSE: $8,234.56

Predicted price for new house: $352,150.00

🔧 Linear Regression from Scratch

Understanding the math behind Linear Regression:

import numpy as np

class LinearRegressionFromScratch:
    def __init__(self, learning_rate=0.01, iterations=1000):
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.weights = None
        self.bias = None
        
    def fit(self, X, y):
        # Initialize parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        
        # Gradient descent
        for _ in range(self.iterations):
            # Predict
            y_pred = np.dot(X, self.weights) + self.bias
            
            # Calculate gradients
            dw = (1/n_samples) * np.dot(X.T, (y_pred - y))
            db = (1/n_samples) * np.sum(y_pred - y)
            
            # Update parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
    
    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

# Example usage
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegressionFromScratch(learning_rate=0.01, iterations=1000)
model.fit(X, y)

predictions = model.predict(X)
print(f"Weight: {model.weights[0]:.2f}")
print(f"Bias: {model.bias:.2f}")
print(f"Predictions: {predictions}")

📊 Assumptions of Linear Regression

1. Linearity

Relationship between X and y is linear

2. Independence

Observations are independent of each other

3. Homoscedasticity

Constant variance of residuals

4. Normality

Residuals are normally distributed

5. No Multicollinearity

Features are not highly correlated

📈 Evaluation Metrics

1. R² Score (Coefficient of Determination)

Measures how well the model explains variance in the data. Range: 0 to 1 (higher is better)

R² = 1 - (SS_res / SS_tot)

R² = 1: Perfect fit
R² = 0.8: 80% of variance explained (good)
R² = 0: Model is no better than predicting the mean

2. Mean Squared Error (MSE)

Average of squared differences between actual and predicted values

MSE = (1/n) Σ (yᵢ - ŷᵢ)²

3. Root Mean Squared Error (RMSE)

Square root of MSE, in same units as target variable

RMSE = √MSE

4. Mean Absolute Error (MAE)

Average absolute difference between actual and predicted values

MAE = (1/n) Σ |yᵢ - ŷᵢ|

⚠️ Common Pitfalls

Overfitting: Model too complex, fits training data too well but performs poorly on new data
Underfitting: Model too simple, fails to capture underlying patterns
Outliers: Extreme values can significantly affect the regression line
Multicollinearity: Highly correlated features lead to unstable coefficients
Non-linear relationships: Linear regression won't work well for curved patterns

🎯 When to Use Linear Regression?

✅ Use When:

Predicting continuous values
Relationship is roughly linear
You need interpretable results
Dataset is not too large
Features are independent

❌ Don't Use When:

Relationship is non-linear
Target is categorical (use logistic regression)
Many outliers present
Features are highly correlated
Very complex patterns exist

🚀 Advanced Topics

Ridge Regression (L2): Adds penalty to large coefficients
Lasso Regression (L1): Can eliminate features by setting coefficients to zero
Elastic Net: Combines Ridge and Lasso
Polynomial Regression: Fits curved relationships
Regularization: Prevents overfitting