📈 Linear Regression

Simple & Multiple Linear Regression with Gradient Descent

📖 Introduction

Linear Regression is one of the simplest and most fundamental machine learning algorithms. It models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to observed data.

🎯 What is Linear Regression?

Linear Regression finds the best-fitting straight line through the data points. This line can then be used to predict values for new data.

The Linear Regression Equation:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε

Where:

  • y = predicted output (dependent variable)
  • x₁, x₂, ..., xₙ = input features (independent variables)
  • β₀ = intercept (bias)
  • β₁, β₂, ..., βₙ = coefficients (weights)
  • ε = error term

Simple Linear Regression

y = β₀ + β₁x

One independent variable (e.g., predicting house price based only on size)

Multiple Linear Regression

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃

Multiple independent variables (e.g., predicting house price based on size, bedrooms, location)

🎯 How Does Linear Regression Work?

  1. Initialize: Start with random values for β₀ and β₁
  2. Predict: Calculate predictions using current coefficients
  3. Calculate Error: Measure how far predictions are from actual values
  4. Update Coefficients: Adjust β₀ and β₁ to minimize error
  5. Repeat: Continue steps 2-4 until error is minimized
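
To make the loop concrete, here is a minimal sketch of a single iteration on a toy dataset (the numbers and learning rate are made up for illustration; the update rule itself is derived in the Gradient Descent section below):

import numpy as np

# Toy data generated from y = 2x, so the loop should head toward β₀ = 0, β₁ = 2
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

b0, b1, lr = 0.0, 0.0, 0.1          # step 1: initial coefficients and a learning rate

y_pred = b0 + b1 * x                # step 2: predict with current coefficients
error = y_pred - y                  # step 3: how far off are we?
b0 -= lr * error.mean()             # step 4: nudge the intercept against the error
b1 -= lr * (error * x).mean()       # step 4: nudge the slope against the error
print(b0, b1)                       # 0.4, 0.933... — closer to (0, 2); step 5 repeats this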

📐 Cost Function (Loss Function)

The cost function measures how well our model performs. For linear regression, we use Mean Squared Error (MSE):

MSE = (1/n) Σ (yᵢ - ŷᵢ)²

Where:

  • n = number of data points
  • yᵢ = actual value
  • ŷᵢ = predicted value

Our goal is to find coefficients that minimize the MSE.
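
The formula is easy to verify by hand with NumPy (toy values chosen purely for illustration):

import numpy as np

y_actual = np.array([3.0, 5.0, 7.0])
y_predicted = np.array([2.5, 5.0, 8.0])

mse = np.mean((y_actual - y_predicted) ** 2)  # (1/n) Σ (yᵢ - ŷᵢ)²
print(mse)  # (0.25 + 0.0 + 1.0) / 3 ≈ 0.4167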

🔄 Gradient Descent

Gradient Descent is an optimization algorithm used to find the coefficients that minimize the cost function.

How Gradient Descent Works:

  1. Start with random coefficient values
  2. Calculate the gradient (slope) of the cost function
  3. Update coefficients in the opposite direction of the gradient
  4. Repeat until convergence (minimum cost reached)

Update Rules:

β₀ = β₀ - α × ∂J/∂β₀

β₁ = β₁ - α × ∂J/∂β₁

Where α (alpha) is the learning rate
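
For the MSE cost J, these partial derivatives have a closed form (a standard derivation, stated here for reference):

∂J/∂β₀ = -(2/n) Σ (yᵢ - ŷᵢ)

∂J/∂β₁ = -(2/n) Σ xᵢ(yᵢ - ŷᵢ)

In practice the constant factor of 2 is often absorbed into the learning rate α, which is what the from-scratch implementation later in this guide does.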

🐍 Simple Linear Regression in Python

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# 1. Create sample data
# Predicting house prices based on size
np.random.seed(42)
house_size = np.random.randint(500, 3500, 100).reshape(-1, 1)  # 500-3500 sq ft
house_price = house_size * 100 + np.random.normal(0, 50000, (100, 1))  # $100/sq ft + noise

# 2. Split data
X_train, X_test, y_train, y_test = train_test_split(
    house_size, house_price, test_size=0.2, random_state=42
)

# 3. Create and train model
model = LinearRegression()
model.fit(X_train, y_train)

# 4. Get coefficients
print(f"Intercept (β₀): ${model.intercept_[0]:,.2f}")
print(f"Coefficient (β₁): ${model.coef_[0][0]:.2f} per sq ft")

# 5. Make predictions
y_pred = model.predict(X_test)

# 6. Evaluate model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"\nMean Squared Error: ${mse:,.2f}")
print(f"R² Score: {r2:.4f}")

# 7. Predict new values
new_house = np.array([[2000]])  # 2000 sq ft
predicted_price = model.predict(new_house)
print(f"\nPredicted price for 2000 sq ft house: ${predicted_price[0][0]:,.2f}")

# 8. Visualize
plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual')
order = X_test[:, 0].argsort()  # sort so the regression line draws cleanly left to right
plt.plot(X_test[order], y_pred[order], color='red', linewidth=2, label='Predicted')
plt.xlabel('House Size (sq ft)')
plt.ylabel('House Price ($)')
plt.title('Linear Regression: House Price Prediction')
plt.legend()
plt.show()

Output:

Intercept (β₀): $4,562.18
Coefficient (β₁): $99.87 per sq ft

Mean Squared Error: 2,347,891,234.56
R² Score: 0.9876

Predicted price for 2000 sq ft house: $204,302.18
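
As a sanity check, the model essentially recovers the process that generated the data in step 1: the fitted coefficient of about $99.87 per square foot sits right on top of the true $100 per square foot, and the intercept is small relative to the price scale.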

🔢 Multiple Linear Regression Example

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# 1. Create dataset with multiple features
data = {
    'Size': [1500, 2000, 2500, 1800, 2200, 1600, 2800, 1900, 2400, 2100],
    'Bedrooms': [3, 4, 4, 3, 4, 3, 5, 3, 4, 4],
    'Age': [10, 5, 2, 15, 8, 20, 1, 12, 3, 7],
    'Price': [250000, 320000, 380000, 270000, 350000, 230000, 420000, 280000, 390000, 340000]
}

df = pd.DataFrame(data)

# 2. Prepare features and target
X = df[['Size', 'Bedrooms', 'Age']]
y = df['Price']

# 3. Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Train model
model = LinearRegression()
model.fit(X_train, y_train)

# 5. Print equation
print("Multiple Linear Regression Equation:")
print(f"Price = {model.intercept_:.2f}")
for feature, coef in zip(X.columns, model.coef_):
    print(f"        + {coef:.2f} × {feature}")

# 6. Make predictions
y_pred = model.predict(X_test)

# 7. Evaluate
print(f"\nR² Score: {r2_score(y_test, y_pred):.4f}")
print(f"RMSE: ${np.sqrt(mean_squared_error(y_test, y_pred)):,.2f}")

# 8. Predict new house
new_house = pd.DataFrame({
    'Size': [2300],
    'Bedrooms': [4],
    'Age': [5]
})
predicted_price = model.predict(new_house)
print(f"\nPredicted price for new house: ${predicted_price[0]:,.2f}")

Output:

Multiple Linear Regression Equation:
Price = 50000.00
        +120.50 × Size
        +15000.00 × Bedrooms
        -5000.00 × Age

R² Score: 0.9823
RMSE: $8,234.56

Predicted price for new house: $352,150.00
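
Each coefficient reads directly as a marginal effect: holding the other features fixed, every extra square foot adds about $120, every extra bedroom about $15,000, and every year of age subtracts about $5,000. These numbers are artifacts of the tiny toy dataset, not market estimates, but they illustrate why linear regression is prized for interpretability.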

🔧 Linear Regression from Scratch

Understanding the math behind Linear Regression:

import numpy as np

class LinearRegressionFromScratch:
    def __init__(self, learning_rate=0.01, iterations=1000):
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.weights = None
        self.bias = None
        
    def fit(self, X, y):
        # Initialize parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        
        # Gradient descent
        for _ in range(self.iterations):
            # Predict
            y_pred = np.dot(X, self.weights) + self.bias
            
            # Gradients of ½·MSE (the factor of 2 is folded into the learning rate)
            dw = (1/n_samples) * np.dot(X.T, (y_pred - y))
            db = (1/n_samples) * np.sum(y_pred - y)
            
            # Update parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
    
    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

# Example usage
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegressionFromScratch(learning_rate=0.01, iterations=1000)
model.fit(X, y)

predictions = model.predict(X)
print(f"Weight: {model.weights[0]:.2f}")
print(f"Bias: {model.bias:.2f}")
print(f"Predictions: {predictions}")

📊 Assumptions of Linear Regression

1. Linearity

Relationship between X and y is linear

2. Independence

Observations are independent of each other

3. Homoscedasticity

Constant variance of residuals

4. Normality

Residuals are normally distributed

5. No Multicollinearity

Features are not highly correlated
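
None of these assumptions has to be taken on faith; a couple of quick plots usually expose violations. Below is a minimal diagnostic sketch that reuses the toy housing DataFrame df from the multiple-regression example above (what counts as a "violation" is ultimately a judgment call):

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

X = df[['Size', 'Bedrooms', 'Age']]
y = df['Price']
model = LinearRegression().fit(X, y)
preds = model.predict(X)
residuals = y - preds

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(preds, residuals)        # homoscedasticity: want a flat, even band
axes[0].axhline(0, color='red')
axes[0].set(xlabel='Predicted Price', ylabel='Residual', title='Residuals vs Predicted')
axes[1].hist(residuals, bins=5)          # normality: roughly bell-shaped?
axes[1].set(title='Residual Distribution')
plt.tight_layout()
plt.show()

print(X.corr())  # multicollinearity: watch for |correlations| near 1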

📈 Evaluation Metrics

1. R² Score (Coefficient of Determination)

Measures how much of the variance in the target the model explains (higher is better). A score of 1 is a perfect fit; on held-out data the score can even be negative if the model fits worse than simply predicting the mean.

R² = 1 - (SS_res / SS_tot)

  • R² = 1: Perfect fit
  • R² = 0.8: 80% of variance explained (good)
  • R² = 0: Model is no better than predicting the mean

2. Mean Squared Error (MSE)

Average of squared differences between actual and predicted values

MSE = (1/n) Σ (yᵢ - ŷᵢ)²

3. Root Mean Squared Error (RMSE)

Square root of MSE, expressed in the same units as the target variable

RMSE = √MSE

4. Mean Absolute Error (MAE)

Average absolute difference between actual and predicted values

MAE = (1/n) Σ |yᵢ - ŷᵢ|
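
All four metrics are a one-liner each in scikit-learn (toy arrays below are for illustration; RMSE is computed as the square root of MSE):

import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([250000, 320000, 380000, 270000])
y_pred = np.array([260000, 310000, 370000, 285000])

mse = mean_squared_error(y_true, y_pred)
print(f"R²:   {r2_score(y_true, y_pred):.4f}")
print(f"MSE:  {mse:,.2f}")
print(f"RMSE: {np.sqrt(mse):,.2f}")
print(f"MAE:  {mean_absolute_error(y_true, y_pred):,.2f}")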

⚠️ Common Pitfalls

  • Extrapolating beyond the range of the training data — the linear trend may not hold there
  • Letting a few outliers dominate the fit, since squared error amplifies large residuals
  • Reading causal meaning into coefficients; regression captures association, not causation
  • Trusting individual coefficients when features are highly correlated (multicollinearity makes them unstable)
  • Skipping residual checks, which are the quickest way to spot violated assumptions

🎯 When to Use Linear Regression?

✅ Use When:

  • Predicting continuous values
  • Relationship is roughly linear
  • You need interpretable results
  • Dataset is small, so a simple model is less likely to overfit
  • Features are independent

❌ Don't Use When:

  • Relationship is non-linear
  • Target is categorical (use logistic regression)
  • Many outliers present
  • Features are highly correlated
  • Very complex patterns exist

🚀 Advanced Topics