🎲 Support Vector Machines

Find the optimal decision boundary

What are Support Vector Machines?

SVMs find the optimal hyperplane that separates different classes with the maximum margin. They're powerful for both linear and non-linear classification problems.

Key Concepts:

  • Hyperplane: The decision boundary that separates the classes
  • Support Vectors: The training points closest to the hyperplane; they alone determine where it lies
  • Margin: The distance between the hyperplane and the nearest points of each class
  • Kernel Trick: Implicitly map the data into a higher-dimensional space so non-linear problems become separable
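
For reference, a sketch of the standard soft-margin objective a linear SVM solves (the slack variables ξ_i measure margin violations, and C is the penalty discussed below):

\min_{w,\,b,\,\xi} \; \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0

The margin width is 2/‖w‖, so minimizing ‖w‖ maximizes the margin, while C trades margin width against points that are misclassified or fall inside the margin.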

📊 Linear SVM

How It Works

SVM finds the hyperplane that maximizes the margin between classes:

from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt

# Generate linearly separable data
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, 
                           n_informative=2, n_clusters_per_class=1, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features (important for SVM!)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Linear SVM
svm = SVC(kernel='linear', C=1.0, random_state=42)
svm.fit(X_train_scaled, y_train)

# Evaluate
accuracy = svm.score(X_test_scaled, y_test)
print(f"Accuracy: {accuracy:.3f}")

# Support vectors
print(f"Number of support vectors: {len(svm.support_vectors_)}")

Visualizing Decision Boundary

def plot_decision_boundary(model, X, y):
    # Create mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))
    
    # Predict
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot
    plt.contourf(xx, yy, Z, alpha=0.4, cmap='RdYlBu')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='RdYlBu', edgecolors='black')
    
    # Highlight support vectors
    plt.scatter(model.support_vectors_[:, 0], 
                model.support_vectors_[:, 1],
                s=200, linewidth=1, facecolors='none', edgecolors='k')
    plt.title('SVM Decision Boundary')
    plt.show()

plot_decision_boundary(svm, X_train_scaled, y_train)

⚙️ The C Parameter

C controls the trade-off between a wide, smooth margin and classifying every training point correctly: it is the penalty applied to margin violations.

# Small C: Larger margin, more misclassifications (risk of underfitting)
svm_soft = SVC(kernel='linear', C=0.01)
svm_soft.fit(X_train_scaled, y_train)
print(f"C=0.01 Accuracy: {svm_soft.score(X_test_scaled, y_test):.3f}")

# Large C: Smaller margin, fewer misclassifications (risk of overfitting)
svm_hard = SVC(kernel='linear', C=100)
svm_hard.fit(X_train_scaled, y_train)
print(f"C=100 Accuracy: {svm_hard.score(X_test_scaled, y_test):.3f}")

# C interpretation:
# - Small C (0.01-0.1): More regularization, simpler model
# - Medium C (1): Balanced (default)
# - Large C (10-100): Less regularization, complex model

🌀 Non-Linear SVM with Kernels

The kernel trick implicitly maps the data into a higher-dimensional space where it becomes linearly separable, without ever computing the transformed features.
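
To see the trick itself, here is a minimal numpy sketch showing that the degree-2 polynomial kernel (x·z)² equals a plain dot product after the explicit feature map φ(x) = (x₁², √2·x₁·x₂, x₂²), which the SVM never has to compute. Note that sklearn's 'poly' kernel additionally includes gamma and coef0 terms; this uses the simplest form:

def phi(v):
    # Explicit degree-2 feature map for a 2D point
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

kernel_value = (x @ z) ** 2          # kernel computed in the original 2D space
explicit_value = phi(x) @ phi(z)     # same value via the 3D feature map
print(kernel_value, explicit_value)  # both print 16.0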

Polynomial Kernel

# Generate non-linear data
from sklearn.datasets import make_circles
X, y = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Polynomial kernel
svm_poly = SVC(kernel='poly', degree=3, C=1.0)
svm_poly.fit(X_train, y_train)
print(f"Polynomial Kernel Accuracy: {svm_poly.score(X_test, y_test):.3f}")

RBF (Radial Basis Function) Kernel

# RBF kernel (most popular for non-linear problems)
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_rbf.fit(X_train, y_train)
print(f"RBF Kernel Accuracy: {svm_rbf.score(X_test, y_test):.3f}")

# Gamma parameter:
# - Small gamma: Far-reaching influence (smoother boundary)
# - Large gamma: Limited influence (complex boundary, risk overfitting)

# Try different gamma values
for gamma in [0.001, 0.01, 0.1, 1, 10]:
    svm = SVC(kernel='rbf', C=1.0, gamma=gamma)
    svm.fit(X_train, y_train)
    print(f"Gamma={gamma}: {svm.score(X_test, y_test):.3f}")

Sigmoid Kernel

# Sigmoid kernel (tanh-based, loosely related to a neural network activation)
svm_sigmoid = SVC(kernel='sigmoid', C=1.0)
svm_sigmoid.fit(X_train, y_train)
print(f"Sigmoid Kernel Accuracy: {svm_sigmoid.score(X_test, y_test):.3f}")

📊 Kernel Comparison

  • Linear — Use case: linearly separable data; Parameters: C; Pros: fast, interpretable; Cons: limited to linear problems
  • Polynomial — Use case: polynomial relationships; Parameters: C, degree; Pros: flexible; Cons: many parameters, slow
  • RBF — Use case: most non-linear problems; Parameters: C, gamma; Pros: very flexible, powerful; Cons: risk of overfitting, slower
  • Sigmoid — Use case: neural-network-like boundaries; Parameters: C, gamma; Pros: good for some problems; Cons: less popular, can be unstable

🎯 Multiclass Classification

from sklearn.datasets import load_iris

# Load multiclass dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# SVM handles multiclass automatically (one-vs-one strategy)
svm = SVC(kernel='rbf', C=1.0, gamma='scale')
svm.fit(X_train_scaled, y_train)

# Evaluate
accuracy = svm.score(X_test_scaled, y_test)
print(f"Multiclass Accuracy: {accuracy:.3f}")

# Predict
y_pred = svm.predict(X_test_scaled)

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
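
Under the hood SVC always trains one classifier per pair of classes; the decision_function_shape parameter only changes how the scores are reported. A small sketch on a hypothetical 4-class dataset (the sizes are arbitrary), where the two shapes differ:

# 4 classes: one-vs-one yields 4*3/2 = 6 pairwise scores, one-vs-rest yields 4
X4, y4 = make_classification(n_samples=200, n_features=6, n_informative=4,
                             n_classes=4, random_state=42)
svm_ovo = SVC(kernel='rbf', decision_function_shape='ovo').fit(X4, y4)
svm_ovr = SVC(kernel='rbf', decision_function_shape='ovr').fit(X4, y4)
print(svm_ovo.decision_function(X4[:5]).shape)  # (5, 6)
print(svm_ovr.decision_function(X4[:5]).shape)  # (5, 4)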

⚡ SVM Regression (SVR)

SVM can also be used for regression. Support Vector Regression (SVR) fits a function while ignoring errors smaller than epsilon, the width of an "insensitive tube" around the prediction.

from sklearn.svm import SVR
from sklearn.datasets import make_regression

# Generate regression data
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVR
svr = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr.fit(X_train, y_train)

# Predict
y_pred = svr.predict(X_test)

# Evaluate
from sklearn.metrics import mean_squared_error, r2_score
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.3f}")
print(f"R² Score: {r2:.3f}")

# Plot
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.scatter(X_test, y_pred, color='red', label='Predicted')
plt.legend()
plt.title('Support Vector Regression')
plt.show()
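
The epsilon parameter sets the width of the tube inside which errors are ignored; widening it generally leaves fewer points outside the tube and therefore fewer support vectors. A quick sketch with illustrative values:

# Wider epsilon tube -> more residuals ignored -> typically fewer support vectors
for eps in [0.1, 1, 10, 50]:
    model = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=eps).fit(X_train, y_train)
    print(f"epsilon={eps}: {len(model.support_vectors_)} support vectors")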

🔧 Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV

# Re-create the scaled iris split (X_train and y_train were overwritten in the SVR example)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1, 'scale', 'auto'],
    'kernel': ['rbf', 'poly', 'sigmoid']
}

# Grid search
svm = SVC()
grid_search = GridSearchCV(svm, param_grid, cv=5, scoring='accuracy', n_jobs=-1, verbose=1)
grid_search.fit(X_train_scaled, y_train)

# Best parameters
print("Best parameters:", grid_search.best_params_)
print(f"Best CV score: {grid_search.best_score_:.3f}")

# Use best model
best_svm = grid_search.best_estimator_
test_score = best_svm.score(X_test_scaled, y_test)
print(f"Test accuracy: {test_score:.3f}")

⚠️ Important Considerations

1. Feature Scaling is Critical

# SVM is sensitive to feature scales
# Always scale your features!

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # Use same scaler!

# Without scaling: often noticeably worse
svm_no_scale = SVC().fit(X_train, y_train)
print(f"No scaling: {svm_no_scale.score(X_test, y_test):.3f}")

# With scaling: usually better (how much depends on the dataset)
svm_scaled = SVC().fit(X_train_scaled, y_train)
print(f"With scaling: {svm_scaled.score(X_test_scaled, y_test):.3f}")

2. Computational Complexity

  • Training a kernel SVC scales roughly between O(n²) and O(n³) in the number of samples, so it becomes slow beyond a few tens of thousands of rows.
  • Prediction cost grows with the number of support vectors, which itself grows with the training set size.
  • For large datasets, LinearSVC or SGDClassifier are usually better choices, as in the sketch below.
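
For illustration, a sketch of LinearSVC on a larger synthetic dataset (the sizes here are arbitrary); it trains a linear SVM with a solver that scales far better in the number of samples:

from sklearn.svm import LinearSVC

# A problem size where kernel SVC would already be noticeably slow
X_big, y_big = make_classification(n_samples=50000, n_features=20, random_state=42)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_big, y_big, test_size=0.2, random_state=42
)

big_scaler = StandardScaler()
Xb_train_scaled = big_scaler.fit_transform(Xb_train)
Xb_test_scaled = big_scaler.transform(Xb_test)

linear_svm = LinearSVC(C=1.0, max_iter=5000)
linear_svm.fit(Xb_train_scaled, yb_train)
print(f"LinearSVC accuracy: {linear_svm.score(Xb_test_scaled, yb_test):.3f}")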

3. When to Use SVM

  • Good fit: small-to-medium datasets, high-dimensional features (for example text data), and problems with a reasonably clear margin between classes.
  • Less suitable: very large datasets (training time), heavily overlapping or very noisy classes, and cases that need well-calibrated probabilities (probability=True is slow and only approximate).

💡 Practical Tips

  • Always scale features, ideally by wrapping the scaler and the SVM in a Pipeline.
  • Start with the RBF kernel and tune C and gamma together, for example with GridSearchCV.
  • Use kernel='linear' or LinearSVC when the data is roughly linearly separable or very high-dimensional.
  • If almost every training point ends up as a support vector, suspect overfitting or very noisy data.

🎯 Key Takeaways

  • SVMs find the hyperplane that separates the classes with the maximum margin, and only the support vectors determine that boundary.
  • The kernel trick (polynomial, RBF, sigmoid) handles non-linear problems without explicitly transforming the data.
  • C and gamma control the bias-variance trade-off and should be tuned together.
  • Feature scaling is essential, and the same ideas extend to regression through SVR.