TimeFlies includes comprehensive hyperparameter tuning capabilities with three optimization methods: grid search, random search, and Bayesian optimization using Optuna.
Edit your `configs/default.yaml` and set:

```yaml
hyperparameter_tuning:
  enabled: true                    # Enable hyperparameter optimization
  method: "bayesian"               # Options: "grid", "random", "bayesian"
  n_trials: 20
  optimization_metric: "f1_score"  # Options: "accuracy", "f1_score", "precision", "recall", "roc_auc"
```
```bash
# Use your existing default.yaml configuration
timeflies tune

# Or specify a custom config file
timeflies tune configs/my_custom_config.yaml
```
Results are saved in `outputs/hyperparameter_tuning/` (see the output structure below).

Hyperparameter tuning is configured directly in your `configs/default.yaml`. This eliminates duplication and uses your existing project settings as the base.
```yaml
hyperparameter_tuning:
  enabled: true
  method: "bayesian"               # "grid", "random", or "bayesian"
  n_trials: 20                     # For random/bayesian (ignored for grid)
  optimization_metric: "f1_score"  # "accuracy", "f1_score", "precision", "recall", "roc_auc"

# Speed optimizations for hyperparameter search
search_optimizations:
  data:
    sampling:
      samples: 1000                # Use subset for faster trials
      variables: 500               # Use top genes for speed
  with_eda: false                  # Skip EDA during search
  with_analysis: false             # Skip analysis during search
  interpret: false                 # Skip SHAP during search
  model:
    training:
      epochs: 50                   # Reduced epochs for search
      early_stopping_patience: 5

# Define hyperparameters to tune for each model type
model_hyperparams:
  CNN:
    learning_rate: [0.0001, 0.001, 0.01]
    batch_size: [16, 32, 64]
    epochs: [50, 75, 100]

    # CNN architecture variants
    cnn_variants:
      - name: "standard"
        filters: [32]
        kernel_sizes: [3]
        pool_sizes: [2]
      - name: "larger_filters"
        filters: [64]
        kernel_sizes: [3]
        pool_sizes: [2]

  xgboost:
    n_estimators: [100, 200, 300]
    max_depth: [6, 9, 12]
    learning_rate: [0.01, 0.1, 0.2]
```
```yaml
# Grid search
hyperparameter_tuning:
  method: "grid"
  # n_trials ignored - explores all combinations

# Random search
hyperparameter_tuning:
  method: "random"
  n_trials: 50          # Number of random samples

# Bayesian optimization
hyperparameter_tuning:
  method: "bayesian"
  n_trials: 30          # Usually needs fewer trials than random
```
Choose the best metric for your research goals:
```yaml
# For balanced aging datasets
hyperparameter_tuning:
  optimization_metric: "accuracy"

# For imbalanced age groups
hyperparameter_tuning:
  optimization_metric: "f1_score"

# For probabilistic age modeling
hyperparameter_tuning:
  optimization_metric: "roc_auc"
```
For CNN models, you can explore different architectures along with hyperparameters:
```yaml
model_hyperparams:
  CNN:
    learning_rate: [0.001, 0.01]
    batch_size: [16, 32, 64]

    # Architecture variants based on your existing CNN structure
    cnn_variants:
      - name: "lightweight"
        filters: [16]      # Smaller filters
        kernel_sizes: [3]
        pool_sizes: [2]
      - name: "standard"
        filters: [32]      # Your current default
        kernel_sizes: [3]
        pool_sizes: [2]
      - name: "larger_filters"
        filters: [64]      # More filters
        kernel_sizes: [3]
        pool_sizes: [2]
      - name: "larger_kernel"
        filters: [32]
        kernel_sizes: [5]  # Larger receptive field
        pool_sizes: [null] # No pooling
```
Each variant is combined with all hyperparameter combinations.
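As a rough illustration (not TimeFlies internals), the full grid is simply the Cartesian product of the architecture variants and the hyperparameter lists above:

```python
from itertools import product

# Hypothetical sketch of how the search space expands; TimeFlies builds
# this grid internally, so the names here are illustrative only.
learning_rates = [0.001, 0.01]
batch_sizes = [16, 32, 64]
variants = ["lightweight", "standard", "larger_filters", "larger_kernel"]

trials = [
    {"variant": v, "learning_rate": lr, "batch_size": bs}
    for v, lr, bs in product(variants, learning_rates, batch_sizes)
]
print(len(trials))  # 4 variants x 2 learning rates x 3 batch sizes = 24 trials
```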
After hyperparameter tuning, use the best configurations for production training:
```python
from common.core.model_queue import ModelQueueManager

# Create a model queue from hyperparameter results
manager = ModelQueueManager.from_hyperparameter_results(
    hyperparameter_results_dir="outputs/hyperparameter_tuning/search_2024-08-25_16-30-45",
    top_n=5,  # Use top 5 configurations
)

# Run production training with full analysis
manager.run_production_training(
    enable_full_analysis=True,
    enable_interpretation=True,
)
```
Hyperparameter searches automatically save checkpoints:
```bash
# Resumes from checkpoint if available
timeflies tune

# Force fresh start
timeflies tune --no-resume
```
You can define any hyperparameter that your model accepts:
```yaml
model_hyperparams:
  CNN:
    # Training hyperparameters
    learning_rate: [0.0001, 0.001, 0.01]
    batch_size: [16, 32, 64]
    epochs: [25, 50, 75, 100]

    # Early stopping
    early_stopping_patience: [5, 8, 10]

    # Optimizer settings
    optimizer: ["adam", "sgd", "rmsprop"]

    # Model architecture (for variants)
    dropout_rate: [0.2, 0.3, 0.5]
```
Configure hyperparameters for different model types:
```yaml
model_hyperparams:
  CNN:
    learning_rate: [0.001, 0.01]
    batch_size: [16, 32, 64]

  xgboost:
    n_estimators: [100, 200, 300]
    max_depth: [6, 9, 12]
    learning_rate: [0.01, 0.1, 0.2]

  random_forest:
    n_estimators: [100, 200]
    max_depth: [10, 20, null]
    min_samples_split: [2, 5]
```
Only the model type specified in `data.model` will be tuned.
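For example, with the setting below only the CNN grid above is explored (the exact placement of the `data.model` key follows your existing `default.yaml`):

```yaml
data:
  model: "CNN"   # Only the CNN hyperparameter grid will be searched
```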
Hyperparameter tuning results are organized in timestamped directories:
```text
outputs/[project]/hyperparameter_tuning/
└── timeflies_hyperparameter_search_2024-08-25_16-30-45/
    ├── hyperparameter_search_report.md     # Comprehensive results report
    ├── hyperparameter_search_metrics.csv   # Metrics for all trials
    ├── checkpoint.json                     # Resume checkpoint
    ├── search_config.yaml                  # Configuration backup
    └── optuna_study.db                     # Bayesian optimization database
```
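Because the Optuna study database is saved, you can also inspect a Bayesian search directly with Optuna. A minimal sketch, assuming the database contains a single study and substituting your own project name and run timestamp:

```python
import optuna

# Hypothetical example: point this at the optuna_study.db from your search run.
storage = (
    "sqlite:///outputs/my_project/hyperparameter_tuning/"
    "timeflies_hyperparameter_search_2024-08-25_16-30-45/optuna_study.db"
)

# study_name=None works when the storage holds exactly one study.
study = optuna.load_study(study_name=None, storage=storage)
print("Best value:", study.best_value)
print("Best params:", study.best_params)
```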
The markdown report (`hyperparameter_search_report.md`) gives a comprehensive summary of the search. The CSV export contains one row per trial with its metrics, the hyperparameter values (`param_*` columns), and the architecture variant details (`arch_*` columns), making it perfect for analysis in pandas, R, or Excel.
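As a sketch of downstream analysis (column names beyond the `param_*`/`arch_*` prefixes are assumptions based on the metrics shown in this guide; substitute your own project directory and timestamp):

```python
import pandas as pd

# Hypothetical example: load the per-trial metrics CSV from a search run.
df = pd.read_csv(
    "outputs/my_project/hyperparameter_tuning/"
    "timeflies_hyperparameter_search_2024-08-25_16-30-45/"
    "hyperparameter_search_metrics.csv"
)

# Rank trials by the optimization metric and show the winning settings.
param_cols = [c for c in df.columns if c.startswith(("param_", "arch_"))]
best = df.sort_values("f1_score", ascending=False).head(5)
print(best[["f1_score", *param_cols]])
```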
Begin with a small parameter space and short training times:
```yaml
search_optimizations:
  data:
    sampling:
      samples: 500     # Small subset first
      variables: 100
  model:
    training:
      epochs: 25       # Short training
      early_stopping_patience: 3
```
For most cases, Bayesian optimization is the most efficient:
```yaml
hyperparameter_tuning:
  method: "bayesian"
  n_trials: 20    # Often sufficient for good results
```
Hyperparameter search shows real-time progress:
```text
🔄 Running hyperparameter trial 5/20
   Variant: cnn_standard
   Parameters: {'learning_rate': 0.001, 'batch_size': 32}
✅ Trial 5 completed in 45.2s
   Metrics: {'accuracy': 0.847, 'f1_score': 0.834}
📊 Progress: 5/20 trials completed, 15 remaining
```
After finding optimal hyperparameters, run production training with full analysis (see the ModelQueueManager example above).

If tuning does not run, set `hyperparameter_tuning.enabled: true` in your config file and make sure hyperparameters for your model type are defined in the `model_hyperparams` section.
Reduce the dataset size during search:
```yaml
search_optimizations:
  data:
    sampling:
      samples: 200     # Very small for memory-constrained systems
      variables: 50
```
Install Optuna for Bayesian optimization:
```bash
pip install "optuna>=3.0.0"
```
For example, a quick Bayesian search:

```yaml
hyperparameter_tuning:
  enabled: true
  method: "bayesian"
  n_trials: 15

model_hyperparams:
  CNN:
    learning_rate: [0.001, 0.01]
    batch_size: [16, 32, 64]
    epochs: [50, 75]
```
Or an exhaustive grid search over CNN variants:

```yaml
hyperparameter_tuning:
  enabled: true
  method: "grid"

model_hyperparams:
  CNN:
    learning_rate: [0.001, 0.01]
    batch_size: [32, 64]
    cnn_variants:
      - name: "small"
        filters: [16]
        kernel_sizes: [3]
      - name: "medium"
        filters: [32]
        kernel_sizes: [3]
      - name: "large"
        filters: [64]
        kernel_sizes: [5]
```
This creates 2 × 2 × 3 = 12 total combinations to explore.
Hyperparameter tuning integrates seamlessly with your existing workflow: run the `timeflies setup` command as usual, then `timeflies tune` for hyperparameter optimization, and finally `timeflies train` for production training with the best configurations.
The hyperparameter search uses all your existing settings (project, data paths, preprocessing) but optimizes the model parameters for best performance.
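Putting it together, a typical session with the commands covered in this guide might look like:

```bash
timeflies setup    # project setup
timeflies tune     # hyperparameter optimization (resumes from a checkpoint if one exists)
timeflies train    # production training with the configuration you settle on
```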