📖 Introduction
Mathematics is the foundation of machine learning. Understanding key mathematical concepts will help you grasp ML algorithms deeply, debug models effectively, and develop new techniques. This guide covers the essential math you need.
🎯 Why Math Matters in ML
- Linear Algebra: Understanding data representation, transformations, and neural networks
- Calculus: Optimization, gradient descent, backpropagation
- Probability: Uncertainty quantification, Bayesian methods
- Statistics: Hypothesis testing, confidence intervals, model evaluation
🔢 Linear Algebra
Scalars, Vectors, and Matrices
- Scalar: A single number (e.g., 5, 3.14)
- Vector: A 1D array of numbers, e.g., [1, 2, 3]
- Matrix: A 2D array of numbers
- Tensor: An array with three or more dimensions
import numpy as np
# Scalar
scalar = 5
# Vector (1D array)
vector = np.array([1, 2, 3, 4])
print("Vector:", vector)
print("Shape:", vector.shape) # (4,)
# Matrix (2D array)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
print("\nMatrix:\n", matrix)
print("Shape:", matrix.shape) # (3, 3)
# Tensor (3D+ array)
tensor = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])
print("\nTensor shape:", tensor.shape) # (2, 2, 2)
Vector Operations
# Vector addition
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print("Addition:", v1 + v2) # [5, 7, 9]
# Scalar multiplication
print("Scalar multiply:", 3 * v1) # [3, 6, 9]
# Dot product (inner product)
dot_product = np.dot(v1, v2)
print("Dot product:", dot_product) # 1*4 + 2*5 + 3*6 = 32
# Vector magnitude (norm)
magnitude = np.linalg.norm(v1)
print("Magnitude:", magnitude) # sqrt(1² + 2² + 3²) = 3.74
Matrix Operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix addition
print("A + B:\n", A + B)
# Matrix multiplication
print("\nA @ B:\n", A @ B)
# [[1*5+2*7, 1*6+2*8],
# [3*5+4*7, 3*6+4*8]]
# Element-wise multiplication
print("\nA * B (element-wise):\n", A * B)
# Transpose
print("\nA transpose:\n", A.T)
# Inverse
A_inv = np.linalg.inv(A)
print("\nA inverse:\n", A_inv)
print("A @ A_inv (should be identity):\n", A @ A_inv)
Eigenvalues and Eigenvectors
Important for PCA, dimensionality reduction, and understanding neural network dynamics.
A = np.array([[4, 2], [1, 3]])
# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
# Verify: A @ v = λ * v
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lambda_i = eigenvalues[i]
    print(f"\nA @ v{i} =", A @ v)
    print(f"λ{i} * v{i} =", lambda_i * v)
📊 Calculus
Derivatives
A derivative measures how a function's output changes as its input changes. Critical for optimization!
Derivative Definition:
f'(x) = lim(h→0) [f(x+h) - f(x)] / h
Common Derivatives:
- f(x) = x² → f'(x) = 2x
- f(x) = x³ → f'(x) = 3x²
- f(x) = eˣ → f'(x) = eˣ
- f(x) = ln(x) → f'(x) = 1/x
- f(x) = sin(x) → f'(x) = cos(x)
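These rules can be sanity-checked numerically with a central-difference approximation. A minimal sketch (the test point x0 = 2.0 and the step size h are arbitrary choices):

import numpy as np

# Central difference: f'(x) ≈ [f(x+h) - f(x-h)] / (2h)
def numerical_derivative(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 2.0
print(numerical_derivative(lambda x: x**2, x0))  # ≈ 2*x0 = 4.0
print(numerical_derivative(np.exp, x0))          # ≈ e² ≈ 7.389
print(numerical_derivative(np.log, x0))          # ≈ 1/x0 = 0.5
print(numerical_derivative(np.sin, x0))          # ≈ cos(2) ≈ -0.416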
Chain Rule
Essential for backpropagation in neural networks!
If h(x) = f(g(x)), then h'(x) = f'(g(x)) × g'(x)
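A minimal sketch of the chain rule in action, using the made-up composition h(x) = (3x + 1)² with f(u) = u² and g(x) = 3x + 1, checked against a finite difference:

# h(x) = f(g(x)) with f(u) = u², g(x) = 3x + 1
def h(x):
    return (3 * x + 1) ** 2

def h_prime(x):
    g = 3 * x + 1          # g(x)
    return (2 * g) * 3     # f'(g(x)) × g'(x)

x0 = 1.5
analytic = h_prime(x0)
numeric = (h(x0 + 1e-5) - h(x0 - 1e-5)) / (2 * 1e-5)
print(analytic, numeric)   # both ≈ 33.0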
import matplotlib.pyplot as plt
# Example: f(x) = x² and its derivative f'(x) = 2x
x = np.linspace(-3, 3, 100)
y = x**2
dy_dx = 2*x
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(x, y, label='f(x) = x²')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Function')
plt.legend()
plt.grid(True)
plt.subplot(1, 2, 2)
plt.plot(x, dy_dx, label="f'(x) = 2x", color='red')
plt.xlabel('x')
plt.ylabel("f'(x)")
plt.title('Derivative')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
Partial Derivatives
Partial derivatives are derivatives of a multivariable function taken with respect to one variable while the others are held fixed. Together they form the gradient used in gradient descent!
For f(x, y) = x² + 2xy + y²:
∂f/∂x = 2x + 2y
∂f/∂y = 2x + 2y
# Gradient = vector of partial derivatives
def f(x, y):
    return x**2 + 2*x*y + y**2

def gradient(x, y):
    df_dx = 2*x + 2*y
    df_dy = 2*x + 2*y
    return np.array([df_dx, df_dy])
# Example
x, y = 1.0, 2.0
print(f"f({x}, {y}) = {f(x, y)}")
print(f"Gradient at ({x}, {y}): {gradient(x, y)}")
Gradient Descent
The optimization algorithm that powers machine learning: repeatedly step in the direction opposite the gradient to reduce the loss!
def gradient_descent_example():
    # Minimize f(x) = x²
    x = 10.0  # Starting point
    learning_rate = 0.1
    iterations = 20
    history = [x]
    for i in range(iterations):
        # Calculate gradient (derivative)
        gradient = 2 * x
        # Update x
        x = x - learning_rate * gradient
        history.append(x)
        print(f"Iteration {i+1}: x = {x:.4f}, f(x) = {x**2:.4f}")
    # Plot convergence
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    plt.plot(history, marker='o')
    plt.xlabel('Iteration')
    plt.ylabel('x value')
    plt.title('Parameter Convergence')
    plt.grid(True)
    plt.subplot(1, 2, 2)
    plt.plot([h**2 for h in history], marker='o', color='red')
    plt.xlabel('Iteration')
    plt.ylabel('f(x) = x²')
    plt.title('Loss Convergence')
    plt.grid(True)
    plt.tight_layout()
    plt.show()

gradient_descent_example()
🎲 Probability & Statistics
Probability Basics
- Probability: P(A) = number of favorable outcomes / total outcomes (for equally likely outcomes)
- Range: 0 ≤ P(A) ≤ 1
- Sum Rule: P(A or B) = P(A) + P(B) - P(A and B)
- Product Rule: P(A and B) = P(A) × P(B|A)
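A minimal sketch of these rules with a fair six-sided die (the events A = "roll is even" and B = "roll is at least 4" are chosen just for illustration):

outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is at least 4

def P(event):
    return len(event) / len(outcomes)

print("P(A) =", P(A))                       # 0.5
print("P(B) =", P(B))                       # 0.5
# Sum rule: P(A or B) = P(A) + P(B) - P(A and B)
print("P(A or B) =", P(A | B))              # ≈ 0.667
print("Check:", P(A) + P(B) - P(A & B))     # ≈ 0.667
# Product rule: P(A and B) = P(A) × P(B|A)
p_B_given_A = len(A & B) / len(A)
print("P(A and B) =", P(A & B))             # ≈ 0.333
print("Check:", P(A) * p_B_given_A)         # ≈ 0.333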
Probability Distributions
import scipy.stats as stats
# Normal (Gaussian) Distribution
mean, std = 0, 1
x = np.linspace(-4, 4, 100)
pdf = stats.norm.pdf(x, mean, std)
plt.figure(figsize=(12, 4))
plt.subplot(1, 3, 1)
plt.plot(x, pdf)
plt.title('Normal Distribution\nN(0, 1)')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.grid(True)
# Binomial Distribution
n, p = 10, 0.5
x_binom = np.arange(0, n+1)
pmf = stats.binom.pmf(x_binom, n, p)
plt.subplot(1, 3, 2)
plt.bar(x_binom, pmf)
plt.title('Binomial Distribution\nn=10, p=0.5')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.grid(True)
# Poisson Distribution
lambda_val = 3
x_poisson = np.arange(0, 15)
pmf_poisson = stats.poisson.pmf(x_poisson, lambda_val)
plt.subplot(1, 3, 3)
plt.bar(x_poisson, pmf_poisson)
plt.title('Poisson Distribution\nλ=3')
plt.xlabel('Number of Events')
plt.ylabel('Probability')
plt.grid(True)
plt.tight_layout()
plt.show()
Bayes' Theorem
The foundation of Bayesian ML and the Naive Bayes classifier!
P(A|B) = P(B|A) × P(A) / P(B)
- P(A|B) = Posterior probability
- P(B|A) = Likelihood
- P(A) = Prior probability
- P(B) = Evidence
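A small worked example with made-up numbers: suppose a disease affects 1% of a population, a test detects it 95% of the time, and it falsely flags 5% of healthy people. Bayes' theorem gives the probability of disease given a positive test:

p_disease = 0.01              # prior P(A)
p_pos_given_disease = 0.95    # likelihood P(B|A)
p_pos_given_healthy = 0.05    # false-positive rate

# Evidence P(B) from the law of total probability
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

posterior = p_pos_given_disease * p_disease / p_positive
print(f"P(disease | positive) = {posterior:.3f}")   # ≈ 0.161

Even with a fairly accurate test, the low prior keeps the posterior surprisingly small.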
Key Statistical Measures
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
# Central Tendency
mean = np.mean(data)
median = np.median(data)
mode = stats.mode(data).mode
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
# Spread
variance = np.var(data)   # population variance (ddof=0); pass ddof=1 for sample variance
std_dev = np.std(data)    # population standard deviation
range_val = np.max(data) - np.min(data)
print(f"\nVariance: {variance}")
print(f"Standard Deviation: {std_dev}")
print(f"Range: {range_val}")
# Quartiles
q1 = np.percentile(data, 25)
q2 = np.percentile(data, 50) # median
q3 = np.percentile(data, 75)
print(f"\nQ1 (25th percentile): {q1}")
print(f"Q2 (50th percentile): {q2}")
print(f"Q3 (75th percentile): {q3}")
print(f"IQR (Interquartile Range): {q3 - q1}")
Correlation
# Generate correlated data
np.random.seed(42)
x = np.random.randn(100)
y_positive = x + np.random.randn(100) * 0.5 # Positive correlation
y_negative = -x + np.random.randn(100) * 0.5 # Negative correlation
y_none = np.random.randn(100) # No correlation
# Calculate correlation coefficients
corr_pos = np.corrcoef(x, y_positive)[0, 1]
corr_neg = np.corrcoef(x, y_negative)[0, 1]
corr_none = np.corrcoef(x, y_none)[0, 1]
# Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].scatter(x, y_positive)
axes[0].set_title(f'Positive Correlation\nr = {corr_pos:.2f}')
axes[0].set_xlabel('x')
axes[0].set_ylabel('y')
axes[1].scatter(x, y_negative)
axes[1].set_title(f'Negative Correlation\nr = {corr_neg:.2f}')
axes[1].set_xlabel('x')
axes[1].set_ylabel('y')
axes[2].scatter(x, y_none)
axes[2].set_title(f'No Correlation\nr = {corr_none:.2f}')
axes[2].set_xlabel('x')
axes[2].set_ylabel('y')
plt.tight_layout()
plt.show()
🎯 Applied Math in ML Algorithms
Linear Regression
Math: Matrix operations, least squares
Formula: β = (XᵀX)⁻¹Xᵀy
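A minimal sketch of the normal equation on made-up data (in practice np.linalg.lstsq or a library solver is preferred for numerical stability):

import numpy as np

np.random.seed(0)
X = np.column_stack([np.ones(50), np.random.rand(50)])  # bias column + one feature
true_beta = np.array([2.0, 3.0])
y = X @ true_beta + np.random.randn(50) * 0.1

beta = np.linalg.inv(X.T @ X) @ X.T @ y
print("Estimated β:", beta)   # close to [2.0, 3.0]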
Gradient Descent
Math: Calculus (derivatives, chain rule)
Formula: θ = θ - α∇J(θ)
Logistic Regression
Math: Probability, sigmoid function
Formula: σ(z) = 1 / (1 + e⁻ᶻ)
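A minimal sketch of the sigmoid, which squashes any real number into (0, 1) so the output can be read as a probability:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))   # ≈ [0.007, 0.269, 0.5, 0.731, 0.993]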
PCA
Math: Eigenvalues, eigenvectors
Use: Dimensionality reduction
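A minimal PCA sketch using the eigendecomposition of the covariance matrix, on made-up correlated 2-D data reduced to one dimension (library implementations such as scikit-learn's PCA typically use the SVD instead):

import numpy as np

np.random.seed(0)
X = np.random.randn(200, 2) @ np.array([[2.0, 0.0], [1.0, 0.5]])  # correlated data
X_centered = X - X.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eig(cov)

# Keep the component with the largest eigenvalue (most variance)
order = np.argsort(eigenvalues)[::-1]
top_component = eigenvectors[:, order[0]]

X_reduced = X_centered @ top_component   # shape (200,)
print("Explained variance ratio:", eigenvalues[order[0]] / eigenvalues.sum())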
Neural Networks
Math: Matrix multiplication, calculus
Use: Backpropagation, weight updates
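A minimal sketch of a single-hidden-layer forward pass built from matrix multiplications; the layer sizes and random weights are arbitrary, and backpropagation would apply the chain rule to these same operations:

import numpy as np

np.random.seed(0)
x = np.random.randn(4)             # 4 input features
W1 = np.random.randn(8, 4) * 0.1   # hidden-layer weights
b1 = np.zeros(8)
W2 = np.random.randn(1, 8) * 0.1   # output-layer weights
b2 = np.zeros(1)

h = np.maximum(0, W1 @ x + b1)     # hidden activations (ReLU)
y_hat = W2 @ h + b2                # network output
print("Output:", y_hat)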
Naive Bayes
Math: Bayes' theorem, probability
Use: Classification with probabilities
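A minimal sketch of the Naive Bayes idea with made-up spam-filter numbers: multiply per-word likelihoods (the "naive" independence assumption) by the class prior, then normalize:

p_spam = 0.3                        # prior P(spam)
p_words_given_spam = 0.8 * 0.6      # P(word1|spam) × P(word2|spam)
p_words_given_ham = 0.1 * 0.2       # P(word1|ham)  × P(word2|ham)

unnorm_spam = p_words_given_spam * p_spam
unnorm_ham = p_words_given_ham * (1 - p_spam)
p_spam_given_words = unnorm_spam / (unnorm_spam + unnorm_ham)
print(f"P(spam | words) = {p_spam_given_words:.3f}")   # ≈ 0.911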
💡 Learning Tips
- Start with basics: Don't try to learn everything at once
- Practice with code: Implement mathematical concepts in Python
- Visualize: Plot functions, gradients, distributions
- Connect to ML: See how math applies to real algorithms
- Use resources: Khan Academy, 3Blue1Brown, MIT OpenCourseWare
📚 Recommended Resources
- Books: "Mathematics for Machine Learning" by Deisenroth, Faisal, Ong
- Videos: 3Blue1Brown's Essence of Linear Algebra & Calculus series
- Courses: Khan Academy (Linear Algebra, Calculus, Statistics)
- Interactive: Seeing Theory (visualizing probability and statistics)