What is Hyperparameter Tuning?
Hyperparameter tuning is the process of finding the best configuration of hyperparameters for your model. Unlike model parameters, which are learned during training, hyperparameters are set before training begins (see the sketch after the examples below).
Examples of Hyperparameters:
- Random Forest: n_estimators, max_depth, min_samples_split
- Neural Networks: learning rate, batch size, number of layers
- SVM: C, gamma, kernel type
- KNN: K (number of neighbors), distance metric
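A minimal sketch of the distinction, using a small synthetic dataset (make_classification and the specific values here are just for illustration):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# Toy data just for the demonstration
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
# Hyperparameters: chosen before training, passed to the constructor
model = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)
# Parameters: learned from the data during fit() and stored on the fitted estimator
model.fit(X, y)
print(model.get_params()['max_depth'])    # hyperparameter, still 10
print(model.feature_importances_[:5])     # learned from the data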
🔍 Grid Search
Try all combinations of hyperparameters in a predefined grid.
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
# Define parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
# Total combinations: 3 × 3 × 3 × 3 = 81
# Create grid search
grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,                    # 5-fold cross-validation
    scoring='accuracy',      # Metric to optimize
    n_jobs=-1,               # Use all CPU cores
    verbose=2,               # Print progress
    return_train_score=True
)
# Fit (this trains 81 × 5 = 405 models!)
grid_search.fit(X_train, y_train)
# Best parameters
print("Best parameters:", grid_search.best_params_)
print(f"Best CV score: {grid_search.best_score_:.3f}")
# Test best model
best_model = grid_search.best_estimator_
test_score = best_model.score(X_test, y_test)
print(f"Test score: {test_score:.3f}")
Inspect Results
import pandas as pd
# Convert results to DataFrame
results = pd.DataFrame(grid_search.cv_results_)
# View top 5 configurations
print(results[['params', 'mean_test_score', 'rank_test_score']].head())
# Sort by score
results_sorted = results.sort_values('mean_test_score', ascending=False)
print("\nTop 3 configurations:")
for idx, row in results_sorted.head(3).iterrows():
    print(f"{row['params']}: {row['mean_test_score']:.3f}")
🎲 Random Search
Sample random combinations from parameter distributions. Often finds good solutions faster than Grid Search!
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform
# Define parameter distributions
param_distributions = {
    'n_estimators': randint(50, 500),       # Random integers in [50, 500)
    'max_depth': [5, 10, 15, 20, None],     # Discrete choices
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'max_features': uniform(0.1, 0.9),      # Continuous uniform on [0.1, 1.0]
}
# Random search
random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=100,              # Number of random combinations to try
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2,
    random_state=42,
    return_train_score=True
)
random_search.fit(X_train, y_train)
print("Best parameters:", random_search.best_params_)
print(f"Best CV score: {random_search.best_score_:.3f}")
# Random Search is better when:
# - Large search space (many hyperparameters)
# - Limited computational budget
# - Some hyperparameters more important than others
🎯 Grid Search vs Random Search
| Aspect | Grid Search | Random Search |
|---|---|---|
| Search Strategy | Exhaustive (all combinations) | Random sampling |
| Computational Cost | High (grows exponentially) | Controllable (set n_iter) |
| Best For | Small search space, few params | Large search space, many params |
| Guarantee | Finds the best combination in the grid | May miss the best combination |
| Efficiency | Can waste time on bad regions | Better exploration of space |
| Continuous Params | Must discretize | Can sample continuously |
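A quick way to see the cost difference in code, reusing the param_grid and the n_iter=100 setting from above (the 5 here is the number of CV folds):
import math
# Grid search must fit every combination in every fold
grid_fits = math.prod(len(values) for values in param_grid.values()) * 5   # 81 × 5 = 405
# Random search cost is fixed by n_iter, no matter how large the space is
random_fits = 100 * 5
print(f"Grid search fits: {grid_fits}, random search fits: {random_fits}")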
🚀 Bayesian Optimization
Bayesian optimization builds a probabilistic model and intelligently selects which hyperparameters to try next. More efficient than random search!
# Install: pip install scikit-optimize
from skopt import BayesSearchCV
from skopt.space import Real, Integer, Categorical
# Define search space
search_space = {
    'n_estimators': Integer(50, 500),
    'max_depth': Integer(5, 50),
    'min_samples_split': Integer(2, 20),
    'min_samples_leaf': Integer(1, 10),
    'max_features': Real(0.1, 1.0)
}
# Bayesian optimization
bayes_search = BayesSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    search_spaces=search_space,
    n_iter=50,               # Number of parameter settings sampled
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2,
    random_state=42
)
bayes_search.fit(X_train, y_train)
print("Best parameters:", bayes_search.best_params_)
print(f"Best CV score: {bayes_search.best_score_:.3f}")
# Advantages:
# - Learns from previous evaluations
# - Focuses on promising regions
# - More efficient than random search
# - Good for expensive-to-evaluate models
⚡ Halving Grid/Random Search
Start many configurations on a small amount of data, then successively eliminate the worst performers and give the survivors more data.
from sklearn.experimental import enable_halving_search_cv
from sklearn.model_selection import HalvingGridSearchCV, HalvingRandomSearchCV
# Halving Grid Search
halving_grid = HalvingGridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    factor=3,                # Reduce candidates by this factor each iteration
    resource='n_samples',    # Resource to increase (could also be 'n_estimators')
    max_resources='auto',    # Maximum amount of resource to use
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2,
    random_state=42
)
halving_grid.fit(X_train, y_train)
print("Best parameters:", halving_grid.best_params_)
print(f"Best score: {halving_grid.best_score_:.3f}")
# How it works (illustrative; the exact schedule depends on factor and min_resources):
# Iteration 1: evaluate all candidates on a small fraction of the training data
# Iteration 2: keep the best 1/3 of candidates, triple the number of samples
# Iteration 3: keep the best 1/3 again, up to the full training set
# Much faster than regular GridSearchCV!
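The schedule that was actually used can be read off the fitted search object (the attributes below come from scikit-learn's successive-halving estimators):
# Inspect the schedule the halving search actually used
print("Iterations run:", halving_grid.n_iterations_)
print("Candidates per iteration:", halving_grid.n_candidates_)
print("Resources (samples) per iteration:", halving_grid.n_resources_)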
📊 Multiple Metrics
# Optimize for one metric, but track others
scoring = {
    'accuracy': 'accuracy',
    'precision': 'precision_macro',
    'recall': 'recall_macro',
    'f1': 'f1_macro'
}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring=scoring,
    refit='f1',              # Optimize for F1, but track all metrics
    n_jobs=-1,
    return_train_score=True
)
grid_search.fit(X_train, y_train)
# Best model selected based on F1
print(f"Best F1: {grid_search.best_score_:.3f}")
# But can see other metrics too
results = pd.DataFrame(grid_search.cv_results_)
print("\nFor best params:")
best_idx = grid_search.best_index_
for metric in ['accuracy', 'precision', 'recall', 'f1']:
    score = results.loc[best_idx, f'mean_test_{metric}']
    print(f"{metric}: {score:.3f}")
🎛️ Tuning Different Models
Random Forest
param_grid_rf = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None],
    'bootstrap': [True, False]
}
Gradient Boosting (XGBoost)
param_grid_xgb = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 5, 7],
    'min_child_weight': [1, 3, 5],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0]
}
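These are XGBoost parameter names, so the grid plugs into the same search API through the xgboost package (assumed installed here); sampling from it with RandomizedSearchCV keeps the 3 × 3 × 3 × 3 × 3 × 3 = 729 combinations manageable:
from xgboost import XGBClassifier   # assumes xgboost is installed
from sklearn.model_selection import RandomizedSearchCV
xgb_search = RandomizedSearchCV(
    estimator=XGBClassifier(random_state=42),
    param_distributions=param_grid_xgb,   # plain lists are sampled as discrete choices
    n_iter=30,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)
xgb_search.fit(X_train, y_train)
print("Best parameters:", xgb_search.best_params_)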
SVM
param_grid_svm = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.001, 0.01, 0.1],
    'kernel': ['rbf', 'poly', 'sigmoid']
}
Neural Network
from sklearn.neural_network import MLPClassifier
param_grid_nn = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],
    'activation': ['relu', 'tanh'],
    'alpha': [0.0001, 0.001, 0.01],
    'learning_rate_init': [0.001, 0.01, 0.1],
    'batch_size': [32, 64, 128]
}
💡 Best Practices
- Start broad, then refine: Wide ranges first, then zoom in
- Use a log scale: e.g. learning rate over [0.001, 0.01, 0.1, 1.0] rather than evenly spaced values
- Domain knowledge: Use reasonable ranges, not arbitrary ones
- Stratified CV: For classification, especially imbalanced
- Set n_jobs=-1: Use all CPU cores
- Random search first: Quick exploration, then grid search
- Track time: Some configs may be too slow for production (see the timing sketch after this list)
- Save results: Save the fitted grid_search object for later analysis
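A small sketch of what tracking time can look like: cv_results_ records a mean_fit_time column for every configuration (the score column name depends on the scoring argument, e.g. mean_test_score for a single metric):
import pandas as pd
results = pd.DataFrame(grid_search.cv_results_)
# Slowest configurations first, so expensive settings are easy to spot
slowest = results[['params', 'mean_fit_time']].sort_values('mean_fit_time', ascending=False)
print(slowest.head())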
⏱️ Efficient Tuning Strategy
- Start with defaults: Baseline performance
- Random search (wide): 50-100 iterations, broad ranges
- Analyze results: Which parameters matter most?
- Grid search (narrow): Fine-tune around best values
- Validate on test set: Final performance check
# Example workflow
# Step 1: Baseline
rf_default = RandomForestClassifier()
rf_default.fit(X_train, y_train)
baseline = rf_default.score(X_test, y_test)
print(f"Baseline: {baseline:.3f}")
# Step 2: Random search (broad)
random_search = RandomizedSearchCV(...)
random_search.fit(X_train, y_train)
# Step 3: Analyze and refine
best_params = random_search.best_params_
print(f"Random search best: {random_search.best_score_:.3f}")
# Step 4: Grid search (narrow)
param_grid_refined = {
    'n_estimators': [best_params['n_estimators'] - 50,
                     best_params['n_estimators'],
                     best_params['n_estimators'] + 50],
    # ... refine other parameters
}
grid_search = GridSearchCV(...)
grid_search.fit(X_train, y_train)
# Step 5: Final test
final_score = grid_search.best_estimator_.score(X_test, y_test)
print(f"Final test: {final_score:.3f}")
⚠️ Common Pitfalls
- Overfitting to CV: Use nested CV or a separate validation set
- Data leakage: Use a Pipeline so preprocessing is fit inside each fold (see the sketch after this list)
- Too many params: Combinations grow exponentially; prefer random or Bayesian search
- Ignoring time: Some configs may be too slow for production
- Not saving results: Save the best model and the full search results
- Single test set: Use cross-validation, not a single split
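A minimal sketch of both fixes, reusing the iris data loaded earlier; the SVC pipeline and grid values are illustrative, not a recommendation:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score
# Preprocessing lives inside the pipeline, so it is re-fit within each CV fold (no leakage)
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])
# Parameters of pipeline steps are addressed as <step name>__<parameter>
pipe_grid = {
    'svc__C': [0.1, 1, 10],
    'svc__gamma': ['scale', 0.01, 0.1]
}
inner_search = GridSearchCV(pipe, pipe_grid, cv=5, scoring='accuracy', n_jobs=-1)
# Nested CV: the outer loop scores the whole tuning procedure, guarding against overfitting to CV
nested_scores = cross_val_score(inner_search, iris.data, iris.target, cv=5)
print(f"Nested CV accuracy: {nested_scores.mean():.3f} ± {nested_scores.std():.3f}")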
💾 Saving and Loading
import joblib
# Save best model
joblib.dump(grid_search.best_estimator_, 'best_model.pkl')
# Save entire grid search object
joblib.dump(grid_search, 'grid_search_results.pkl')
# Load later
loaded_model = joblib.load('best_model.pkl')
loaded_grid = joblib.load('grid_search_results.pkl')
# Use loaded model
predictions = loaded_model.predict(X_new)
🎯 Key Takeaways
- Grid Search tries all combinations exhaustively
- Random Search samples randomly, often more efficient
- Bayesian Optimization learns from previous trials
- Halving Search eliminates bad configs early
- Start broad with random search, refine with grid
- Use cross-validation to avoid overfitting
- Pipeline prevents data leakage during tuning