What is Hyperparameter Tuning?
Hyperparameter tuning is the process of finding the best configuration of hyperparameters for your model. Unlike model parameters, which are learned during training, hyperparameters are set before training begins (see the sketch after the examples below).
Examples of Hyperparameters:
- Random Forest: n_estimators, max_depth, min_samples_split
- Neural Networks: learning rate, batch size, number of layers
- SVM: C, gamma, kernel type
- KNN: K (number of neighbors), distance metric
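A minimal sketch of the distinction, using a small synthetic dataset (make_classification and the specific values here are just for illustration):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# Toy data just for the demonstration
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
# Hyperparameters: chosen before training, passed to the constructor
model = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)
# Parameters: learned from the data during fit() and stored on the fitted estimator
model.fit(X, y)
print(model.get_params()['max_depth'])    # hyperparameter, still 10
print(model.feature_importances_[:5])     # learned from the data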
🔍 Grid Search
Try all combinations of hyperparameters in a predefined grid.
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
# Define parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
# Total combinations: 3 × 3 × 3 × 3 = 81
# Create grid search
grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,                    # 5-fold cross-validation
    scoring='accuracy',      # Metric to optimize
    n_jobs=-1,               # Use all CPU cores
    verbose=2,               # Print progress
    return_train_score=True
)
# Fit (this trains 81 × 5 = 405 models!)
grid_search.fit(X_train, y_train)
# Best parameters
print("Best parameters:", grid_search.best_params_)
print(f"Best CV score: {grid_search.best_score_:.3f}")
# Test best model
best_model = grid_search.best_estimator_
test_score = best_model.score(X_test, y_test)
print(f"Test score: {test_score:.3f}")
Inspect Results
import pandas as pd
# Convert results to DataFrame
results = pd.DataFrame(grid_search.cv_results_)
# View top 5 configurations
print(results[['params', 'mean_test_score', 'rank_test_score']].head())
# Sort by score
results_sorted = results.sort_values('mean_test_score', ascending=False)
print("\nTop 3 configurations:")
for idx, row in results_sorted.head(3).iterrows():
    print(f"{row['params']}: {row['mean_test_score']:.3f}")
🎲 Random Search
Sample random combinations from parameter distributions. Often finds good solutions faster than Grid Search!
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform
# Define parameter distributions
param_distributions = {
    'n_estimators': randint(50, 500),       # Random integers in [50, 500)
    'max_depth': [5, 10, 15, 20, None],     # Discrete choices
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'max_features': uniform(0.1, 0.9),      # Continuous uniform on [0.1, 1.0]
}
# Random search
random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=100,              # Number of random combinations to try
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2,
    random_state=42,
    return_train_score=True
)
random_search.fit(X_train, y_train)
print("Best parameters:", random_search.best_params_)
print(f"Best CV score: {random_search.best_score_:.3f}")
# Random Search is better when:
# - Large search space (many hyperparameters)
# - Limited computational budget
# - Some hyperparameters more important than others
🎯 Grid Search vs Random Search
| Aspect | Grid Search | Random Search |
|---|---|---|
| Search Strategy | Exhaustive (all combinations) | Random sampling |
| Computational Cost | High (grows exponentially) | Controllable (set n_iter) |
| Best For | Small search space, few params | Large search space, many params |
| Guarantee | Finds the best combination in the grid | May miss the best combination |
| Efficiency | Can waste time on bad regions | Better exploration of space |
| Continuous Params | Must discretize | Can sample continuously |
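A quick way to see the cost difference in code, reusing the param_grid and the n_iter=100 setting from above (the 5 here is the number of CV folds):
import math
# Grid search must fit every combination in every fold
grid_fits = math.prod(len(values) for values in param_grid.values()) * 5   # 81 × 5 = 405
# Random search cost is fixed by n_iter, no matter how large the space is
random_fits = 100 * 5
print(f"Grid search fits: {grid_fits}, random search fits: {random_fits}")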
🚀 Bayesian Optimization
Bayesian optimization builds a probabilistic model and intelligently selects which hyperparameters to try next. More efficient than random search!
# Install: pip install scikit-optimize
from skopt import BayesSearchCV
from skopt.space import Real, Integer, Categorical
# Define search space
search_space = {
    'n_estimators': Integer(50, 500),
    'max_depth': Integer(5, 50),
    'min_samples_split': Integer(2, 20),
    'min_samples_leaf': Integer(1, 10),
    'max_features': Real(0.1, 1.0)
}
# Bayesian optimization
bayes_search = BayesSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    search_spaces=search_space,
    n_iter=50,               # Number of parameter settings sampled
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2,
    random_state=42
)
bayes_search.fit(X_train, y_train)
print("Best parameters:", bayes_search.best_params_)
print(f"Best CV score: {bayes_search.best_score_:.3f}")
# Advantages:
# - Learns from previous evaluations
# - Focuses on promising regions
# - More efficient than random search
# - Good for expensive-to-evaluate models
⚡ Halving Grid/Random Search
Start many configurations on a small amount of data, then successively eliminate the worst performers and give the survivors more data.
from sklearn.experimental import enable_halving_search_cv
from sklearn.model_selection import HalvingGridSearchCV, HalvingRandomSearchCV
# Halving Grid Search
halving_grid = HalvingGridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    factor=3,                # Reduce candidates by this factor each iteration
    resource='n_samples',    # Resource to increase (could also be 'n_estimators')
    max_resources='auto',    # Maximum amount of resource to use
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2,
    random_state=42
)
halving_grid.fit(X_train, y_train)
print("Best parameters:", halving_grid.best_params_)
print(f"Best score: {halving_grid.best_score_:.3f}")
# How it works (illustrative; the exact schedule depends on factor and min_resources):
# Iteration 1: evaluate all candidates on a small fraction of the training data
# Iteration 2: keep the best 1/3 of candidates, triple the number of samples
# Iteration 3: keep the best 1/3 again, up to the full training set
# Much faster than regular GridSearchCV!
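The schedule that was actually used can be read off the fitted search object (the attributes below come from scikit-learn's successive-halving estimators):
# Inspect the schedule the halving search actually used
print("Iterations run:", halving_grid.n_iterations_)
print("Candidates per iteration:", halving_grid.n_candidates_)
print("Resources (samples) per iteration:", halving_grid.n_resources_)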
📊 Multiple Metrics
# Optimize for one metric, but track others
scoring = {
    'accuracy': 'accuracy',
    'precision': 'precision_macro',
    'recall': 'recall_macro',
    'f1': 'f1_macro'
}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring=scoring,
    refit='f1',              # Optimize for F1, but track all metrics
    n_jobs=-1,
    return_train_score=True
)
grid_search.fit(X_train, y_train)
# Best model selected based on F1
print(f"Best F1: {grid_search.best_score_:.3f}")
# But can see other metrics too
results = pd.DataFrame(grid_search.cv_results_)
print("\nFor best params:")
best_idx = grid_search.best_index_
for metric in ['accuracy', 'precision', 'recall', 'f1']:
    score = results.loc[best_idx, f'mean_test_{metric}']
    print(f"{metric}: {score:.3f}")
🎛️ Tuning Different Models
Random Forest
param_grid_rf = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None],
    'bootstrap': [True, False]
}
Gradient Boosting (XGBoost)
param_grid_xgb = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 5, 7],
    'min_child_weight': [1, 3, 5],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0]
}
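These are XGBoost parameter names, so the grid plugs into the same search API through the xgboost package (assumed installed here); sampling from it with RandomizedSearchCV keeps the 3 × 3 × 3 × 3 × 3 × 3 = 729 combinations manageable:
from xgboost import XGBClassifier   # assumes xgboost is installed
from sklearn.model_selection import RandomizedSearchCV
xgb_search = RandomizedSearchCV(
    estimator=XGBClassifier(random_state=42),
    param_distributions=param_grid_xgb,   # plain lists are sampled as discrete choices
    n_iter=30,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)
xgb_search.fit(X_train, y_train)
print("Best parameters:", xgb_search.best_params_)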
SVM
param_grid_svm = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.001, 0.01, 0.1],
    'kernel': ['rbf', 'poly', 'sigmoid']
}
Neural Network
from sklearn.neural_network import MLPClassifier
param_grid_nn = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],
    'activation': ['relu', 'tanh'],
    'alpha': [0.0001, 0.001, 0.01],
    'learning_rate_init': [0.001, 0.01, 0.1],
    'batch_size': [32, 64, 128]
}
💡 Best Practices
- Start broad, then refine: Wide ranges first, then zoom in
- Use a log scale: e.g. learning rate over [0.001, 0.01, 0.1, 1.0] rather than evenly spaced values
- Domain knowledge: Use reasonable ranges, not arbitrary ones
- Stratified CV: For classification, especially imbalanced
- Set n_jobs=-1: Use all CPU cores
- Random search first: Quick exploration, then grid search
- Track time: Some configs may be too slow for production (see the timing sketch after this list)
- Save results: Save the fitted grid_search object for later analysis
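A small sketch of what tracking time can look like: cv_results_ records a mean_fit_time column for every configuration (the score column name depends on the scoring argument, e.g. mean_test_score for a single metric):
import pandas as pd
results = pd.DataFrame(grid_search.cv_results_)
# Slowest configurations first, so expensive settings are easy to spot
slowest = results[['params', 'mean_fit_time']].sort_values('mean_fit_time', ascending=False)
print(slowest.head())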
⏱️ Efficient Tuning Strategy
- Start with defaults: Baseline performance
- Random search (wide): 50-100 iterations, broad ranges
- Analyze results: Which parameters matter most?
- Grid search (narrow): Fine-tune around best values
- Validate on test set: Final performance check
# Example workflow
# Step 1: Baseline
rf_default = RandomForestClassifier()
rf_default.fit(X_train, y_train)
baseline = rf_default.score(X_test, y_test)
print(f"Baseline: {baseline:.3f}")
# Step 2: Random search (broad)
random_search = RandomizedSearchCV(...)
random_search.fit(X_train, y_train)
# Step 3: Analyze and refine
best_params = random_search.best_params_
print(f"Random search best: {random_search.best_score_:.3f}")
# Step 4: Grid search (narrow)
param_grid_refined = {
    'n_estimators': [best_params['n_estimators'] - 50,
                     best_params['n_estimators'],
                     best_params['n_estimators'] + 50],
    # ... refine other parameters
}
grid_search = GridSearchCV(...)
grid_search.fit(X_train, y_train)
# Step 5: Final test
final_score = grid_search.best_estimator_.score(X_test, y_test)
print(f"Final test: {final_score:.3f}")
⚠️ Common Pitfalls
- Overfitting to CV: Use nested CV or a separate validation set
- Data leakage: Use a Pipeline so preprocessing is fit inside each fold (see the sketch after this list)
- Too many params: Combinations grow exponentially; prefer random or Bayesian search
- Ignoring time: Some configs may be too slow for production
- Not saving results: Save the best model and the full search results
- Single test set: Use cross-validation, not a single split
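A minimal sketch of both fixes, reusing the iris data loaded earlier; the SVC pipeline and grid values are illustrative, not a recommendation:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score
# Preprocessing lives inside the pipeline, so it is re-fit within each CV fold (no leakage)
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])
# Parameters of pipeline steps are addressed as <step name>__<parameter>
pipe_grid = {
    'svc__C': [0.1, 1, 10],
    'svc__gamma': ['scale', 0.01, 0.1]
}
inner_search = GridSearchCV(pipe, pipe_grid, cv=5, scoring='accuracy', n_jobs=-1)
# Nested CV: the outer loop scores the whole tuning procedure, guarding against overfitting to CV
nested_scores = cross_val_score(inner_search, iris.data, iris.target, cv=5)
print(f"Nested CV accuracy: {nested_scores.mean():.3f} ± {nested_scores.std():.3f}")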
💾 Saving and Loading
import joblib
# Save best model
joblib.dump(grid_search.best_estimator_, 'best_model.pkl')
# Save entire grid search object
joblib.dump(grid_search, 'grid_search_results.pkl')
# Load later
loaded_model = joblib.load('best_model.pkl')
loaded_grid = joblib.load('grid_search_results.pkl')
# Use loaded model
predictions = loaded_model.predict(X_new)
🎯 Key Takeaways
- Grid Search tries all combinations exhaustively
- Random Search samples randomly, often more efficient
- Bayesian Optimization learns from previous trials
- Halving Search eliminates bad configs early
- Start broad with random search, refine with grid
- Use cross-validation to avoid overfitting
- Pipeline prevents data leakage during tuning