The TimeFlies Model Queue system enables automated sequential training of multiple models with different configurations, hyperparameters, and preprocessing options. This is essential for comparative analysis and hyperparameter exploration in research.
timeflies queue configs/your_queue.yaml
Check outputs/model_queue_summaries/ for reports.
# Queue settings
queue_settings:
  name: "experiment_name" # Experiment name for reports
  sequential: true # Run models sequentially (true) or parallel (false)
  save_checkpoints: true # Enable checkpoint/resume functionality
  generate_summary: true # Generate comparison report at end
# Define models to train
model_queue:
  - name: "model_1" # Unique model name
    model_type: "CNN" # Model type: CNN, MLP, xgboost, random_forest, logistic
    description: "Description" # Optional description
    hyperparameters: # Model-specific hyperparameters
      epochs: 100
      batch_size: 32
    config_overrides: # Optional: override global settings
      data:
        batch_correction:
          enabled: true
# Global settings applied to all models (can be overridden)
global_settings:
  project: "fruitfly_aging" # Project name
  data: # Data configuration
    tissue: "head"
    target_variable: "age"
    # ... other data settings
  with_analysis: true # Run evaluation after training
  interpret: true # Generate SHAP analysis
  visualize: true # Create visualizations
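Before launching a long queue, it can help to sanity-check the configuration. The following is a minimal sketch, not part of TimeFlies: `validate_queue_config` is a hypothetical helper, and the rules it enforces (both `queue_settings` and `model_queue` present, unique names, a known `model_type`) are assumptions drawn from the comments in the configuration above. It works on an already-parsed dictionary; in practice the dictionary would come from a YAML loader.

```python
# Hypothetical pre-flight check for a parsed queue configuration (not a TimeFlies API).
KNOWN_MODEL_TYPES = {"CNN", "MLP", "xgboost", "random_forest", "logistic"}


def validate_queue_config(config):
    """Return a list of human-readable problems found in a queue config dict."""
    problems = []
    # Assumption: both sections are required for a usable queue.
    for section in ("queue_settings", "model_queue"):
        if section not in config:
            problems.append(f"missing top-level section: {section}")
    seen_names = set()
    for index, entry in enumerate(config.get("model_queue") or []):
        name = entry.get("name")
        if not name:
            problems.append(f"model_queue[{index}] has no name")
        elif name in seen_names:
            problems.append(f"duplicate model name: {name}")
        else:
            seen_names.add(name)
        model_type = entry.get("model_type")
        if model_type not in KNOWN_MODEL_TYPES:
            problems.append(
                f"model_queue[{index}] has unknown model_type: {model_type!r}"
            )
    return problems


example = {
    "queue_settings": {"name": "experiment_name", "sequential": True},
    "model_queue": [
        {"name": "model_1", "model_type": "CNN", "hyperparameters": {"epochs": 100}},
        {"name": "model_1", "model_type": "SVM"},  # deliberately invalid entry
    ],
    "global_settings": {"project": "fruitfly_aging"},
}
problems = validate_queue_config(example)
for problem in problems:
    print(problem)
```

The example entry is intentionally wrong twice (a repeated name and an unsupported model type) so that both checks are exercised.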
CNN: Convolutional Neural Network
MLP: Multi-Layer Perceptron
xgboost: XGBoost classifier
random_forest: Random Forest classifier
logistic: Logistic Regression
data:
  # Core data settings
  model: "CNN" # Default model type
  tissue: "head" # Tissue type: "head", "body", "all"
  species: "drosophila" # Species identifier
  cell_type: "all" # Cell type filter
  sex: "all" # Sex filter: "all", "male", "female"
  target_variable: "age" # Target variable for prediction
  # Batch correction
  batch_correction:
    enabled: false # Enable batch correction
  # Data filtering
  filtering:
    include_mixed_sex: false # Include mixed sex samples
  # Data sampling
  sampling:
    samples: null # Number of samples (null for all)
    variables: null # Number of genes (null for all)
  # Train/test splitting
  split:
    method: "column" # Split method: "column" or "random"
    column: "genotype" # Column to split on (if method="column")
    train: ["control"] # Values for training
    test: ["ab42", "htau"] # Values for testing
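To make the two split methods concrete, here is a small sketch of how a `split` block could be interpreted. It is an illustration only, not the TimeFlies implementation: `split_samples` is a hypothetical function, samples are plain dictionaries rather than a real expression matrix, and the `test_fraction` and `seed` parameters for the random split are assumptions, since the configuration above does not specify them.

```python
import random


def split_samples(samples, split_cfg, test_fraction=0.2, seed=0):
    """Split samples into (train, test) according to a split config dict."""
    method = split_cfg["method"]
    if method == "column":
        # Keep only samples whose column value is listed under train or test.
        column = split_cfg["column"]
        train = [s for s in samples if s[column] in split_cfg["train"]]
        test = [s for s in samples if s[column] in split_cfg["test"]]
        return train, test
    if method == "random":
        # Shuffle a copy with a fixed seed so the split is reproducible.
        shuffled = list(samples)
        random.Random(seed).shuffle(shuffled)
        n_test = int(round(len(shuffled) * test_fraction))
        return shuffled[n_test:], shuffled[:n_test]
    raise ValueError(f"unknown split method: {method!r}")


samples = [
    {"id": 1, "genotype": "control"},
    {"id": 2, "genotype": "ab42"},
    {"id": 3, "genotype": "control"},
    {"id": 4, "genotype": "htau"},
    {"id": 5, "genotype": "other"},
]
column_cfg = {
    "method": "column",
    "column": "genotype",
    "train": ["control"],
    "test": ["ab42", "htau"],
}
train, test = split_samples(samples, column_cfg)
print([s["id"] for s in train], [s["id"] for s in test])

rand_train, rand_test = split_samples(samples, {"method": "random"})
print(len(rand_train), len(rand_test))
```

Note that with the column method, samples whose value appears in neither list (the `"other"` genotype here) end up in neither set.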
# Control what stages run for each model
with_training: true # Train models (false = skip for pre-trained models)
with_evaluation: true # Evaluate models (false = train-only mode)
with_eda: false # Run EDA before training/evaluation
with_analysis: true # Run analysis scripts after training/evaluation
interpret: true # Enable SHAP interpretation
visualize: true # Enable visualizations
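The stage flags can be read as a simple plan of what runs for each model. The sketch below is hypothetical (the function name `planned_stages` and the stage labels are not from TimeFlies); it only encodes the ordering implied by the comments above, with EDA first and analysis last, and uses the defaults shown in this block.

```python
# Defaults copied from the flags listed above.
DEFAULT_FLAGS = {
    "with_training": True,
    "with_evaluation": True,
    "with_eda": False,
    "with_analysis": True,
}


def planned_stages(settings):
    """Return the ordered list of stages enabled by the given flag settings."""
    flags = {**DEFAULT_FLAGS, **settings}
    # Order assumption: EDA runs before training/evaluation, analysis after.
    ordered = [
        ("with_eda", "eda"),
        ("with_training", "training"),
        ("with_evaluation", "evaluation"),
        ("with_analysis", "analysis"),
    ]
    return [stage for flag, stage in ordered if flags[flag]]


print(planned_stages({}))
print(planned_stages({"with_training": False}))
print(planned_stages({"with_evaluation": False, "with_analysis": False}))
```

The three calls correspond to the full workflow, evaluating a pre-trained model, and train-only mode.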
model:
  training:
    epochs: 100 # Number of training epochs
    batch_size: 32 # Batch size
    validation_split: 0.2 # Validation split ratio
    early_stopping_patience: 8 # Early stopping patience
    learning_rate: 0.001 # Learning rate
CNN/MLP Models:
hyperparameters:
  epochs: 100
  batch_size: 32
  learning_rate: 0.001
  # CNN specific:
  filters: [32, 64] # Convolutional filters
  # MLP specific:
  hidden_units: [128, 64] # Hidden layer sizes
XGBoost Models:
hyperparameters:
  n_estimators: 100 # Number of trees
  max_depth: 6 # Maximum tree depth
  learning_rate: 0.3 # Learning rate
  subsample: 0.8 # Subsample ratio
  colsample_bytree: 0.8 # Feature subsample ratio
Random Forest Models:
hyperparameters:
  n_estimators: 100 # Number of trees
  max_depth: null # Maximum depth (null for unlimited)
  min_samples_split: 2 # Minimum samples to split
  min_samples_leaf: 1 # Minimum samples in leaf
Logistic Regression Models:
hyperparameters:
  max_iter: 1000 # Maximum iterations
  C: 1.0 # Regularization strength
  penalty: "l2" # Regularization type
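For hyperparameter exploration, writing one `model_queue` entry per combination by hand gets tedious. One possible approach, sketched here under assumptions, is to generate the entries from a grid and then dump them into the queue file. `expand_grid` and its naming scheme are hypothetical; only the entry keys (`name`, `model_type`, `hyperparameters`) come from the queue format described in this document.

```python
from itertools import product


def expand_grid(base_name, model_type, grid):
    """Build one model_queue entry per combination of hyperparameter values."""
    keys = sorted(grid)  # fixed key order keeps names deterministic
    entries = []
    for values in product(*(grid[key] for key in keys)):
        hyperparameters = dict(zip(keys, values))
        suffix = "_".join(f"{key}{value}" for key, value in hyperparameters.items())
        entries.append(
            {
                "name": f"{base_name}_{suffix}",  # names must be unique in a queue
                "model_type": model_type,
                "hyperparameters": hyperparameters,
            }
        )
    return entries


entries = expand_grid(
    "xgb", "xgboost", {"max_depth": [4, 6], "n_estimators": [100, 200]}
)
for entry in entries:
    print(entry["name"], entry["hyperparameters"])
```

Encoding the values in the name keeps each model's name unique and makes the summary report easier to read.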
You can override any global setting for specific models using config_overrides:
model_queue:
  - name: "specialized_model"
    model_type: "CNN"
    hyperparameters:
      epochs: 50
    config_overrides:
      # Override project for this model only
      project: "fruitfly_alzheimers"
      # Override data settings
      data:
        sex: "male" # Only male samples
        batch_correction:
          enabled: true # Enable batch correction
        split:
          method: "random" # Use random split
        sampling:
          samples: 5000 # Limit to 5000 samples
      # Override analysis settings
      interpret: false # Skip SHAP for this model
      with_analysis: false # Skip evaluation
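One natural way to read `config_overrides` is as a recursive merge onto `global_settings`: nested keys that are overridden replace the global value, and everything else is inherited. The sketch below illustrates that reading. It is an assumption about the semantics, not a description of the TimeFlies internals, and `deep_merge` is a hypothetical helper.

```python
import copy


def deep_merge(base, overrides):
    """Return a new dict where overrides are merged recursively onto base."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


global_settings = {
    "project": "fruitfly_aging",
    "data": {
        "tissue": "head",
        "sex": "all",
        "batch_correction": {"enabled": False},
    },
    "interpret": True,
}
config_overrides = {
    "project": "fruitfly_alzheimers",
    "data": {"sex": "male", "batch_correction": {"enabled": True}},
    "interpret": False,
}
effective = deep_merge(global_settings, config_overrides)
print(effective)
```

Under this reading, `data.tissue` stays `"head"` for the specialized model because the override never mentions it, and the global settings themselves are left untouched for the other models in the queue.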
queue_settings:
  name: "basic_model_comparison"
  sequential: true
  save_checkpoints: true
  generate_summary: true
model_queue:
  - name: "cnn_baseline"
    model_type: "CNN"
    description: "Baseline CNN model"
    hyperparameters:
      epochs: 50
      batch_size: 32
  - name: "xgboost_baseline"
    model_type: "xgboost"
    description: "Baseline XGBoost model"
    hyperparameters:
      n_estimators: 100
      max_depth: 6
  - name: "random_forest_baseline"
    model_type: "random_forest"
    description: "Baseline Random Forest model"
    hyperparameters:
      n_estimators: 100
global_settings:
  project: "fruitfly_aging"
  data:
    tissue: "head"
    target_variable: "age"
  with_analysis: true
  interpret: true
queue_settings:
  name: "execution_modes_demo"
  sequential: true
  save_checkpoints: true
  generate_summary: true
model_queue:
  # Full workflow: Train + Evaluate + Analysis
  - name: "full_workflow"
    model_type: "CNN"
    description: "Complete training and evaluation"
    hyperparameters:
      epochs: 50
    config_overrides:
      with_training: true # Train the model
      with_evaluation: true # Evaluate the model
      with_analysis: true # Run analysis scripts
  # Train only (no evaluation)
  - name: "train_only"
    model_type: "CNN"
    description: "Training only, skip evaluation"
    hyperparameters:
      epochs: 50
    config_overrides:
      with_training: true # Train the model
      with_evaluation: false # Skip evaluation
      with_analysis: false # Skip analysis
  # Evaluate pre-trained model only
  - name: "evaluate_pretrained"
    model_type: "CNN"
    description: "Evaluate existing trained model"
    hyperparameters:
      epochs: 50 # Ignored
    config_overrides:
      with_training: false # Skip training (load existing)
      with_evaluation: true # Evaluate the model
      with_analysis: true # Run analysis scripts
global_settings:
  project: "fruitfly_aging"
  data:
    tissue: "head"
    target_variable: "age"
  # Default execution mode
  with_training: true
  with_evaluation: true
  with_analysis: true
queue_settings:
  name: "preprocessing_comparison"
  sequential: true
  save_checkpoints: true
  generate_summary: true
model_queue:
  # Default preprocessing
  - name: "cnn_default"
    model_type: "CNN"
    description: "CNN with default preprocessing"
    hyperparameters:
      epochs: 50
  # Batch correction enabled
  - name: "cnn_batch_corrected"
    model_type: "CNN"
    description: "CNN with batch correction"
    hyperparameters:
      epochs: 50
    config_overrides:
      data:
        batch_correction:
          enabled: true
  # Male samples only
  - name: "cnn_male_only"
    model_type: "CNN"
    description: "CNN trained on male samples"
    hyperparameters:
      epochs: 50
    config_overrides:
      data:
        sex: "male"
  # Different splitting strategy
  - name: "cnn_random_split"
    model_type: "CNN"
    description: "CNN with random data splitting"
    hyperparameters:
      epochs: 50
    config_overrides:
      data:
        split:
          method: "random"
global_settings:
  project: "fruitfly_aging"
  data:
    tissue: "head"
    target_variable: "age"
    batch_correction:
      enabled: false
    sex: "all"
    split:
      method: "column"
      column: "genotype"
      train: ["control"]
      test: ["ab42", "htau"]
  with_analysis: true
# Run the default model queue (uses configs/model_queue.yaml)
timeflies queue
# Run a custom queue configuration
timeflies queue configs/my_custom_queue.yaml
# Start fresh (ignore any existing checkpoint)
timeflies queue --no-resume
During execution, you’ll see:
============================================================
STARTING MODEL QUEUE: 5 models to train
Queue name: preprocessing_comparison
============================================================
[1/5] Training: cnn_default
Model Type: CNN
Description: CNN with default preprocessing
============================================================
[INFO] Starting training for cnn_default...
[INFO] Starting evaluation for cnn_default...
[OK] Model cnn_default completed in 45.3s
[PROGRESS] 1 completed, 4 remaining
----------------------------------------
Progress: 1 completed, 0 failed
Best model so far: cnn_default (accuracy: 0.847)
----------------------------------------
If training is interrupted, you can resume:
# This will automatically resume from the last completed model
timeflies queue
# To start completely fresh
timeflies queue --no-resume
# With custom configuration
timeflies queue configs/my_queue.yaml
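The resume behaviour can be pictured as a small checkpoint file that records which models have finished, so a restarted queue skips them. The following sketch shows that idea only. The JSON layout, the file name, and the `run_queue` function are assumptions made for illustration; the actual checkpoint format used by TimeFlies is not described here.

```python
import json
import tempfile
from pathlib import Path


def load_completed(checkpoint_path):
    """Read the set of completed model names, or an empty set if no checkpoint."""
    if checkpoint_path.exists():
        return set(json.loads(checkpoint_path.read_text())["completed"])
    return set()


def run_queue(model_names, checkpoint_path, train_fn, resume=True):
    """Train models in order, skipping completed ones when resume is True."""
    completed = load_completed(checkpoint_path) if resume else set()
    ran = []
    for name in model_names:
        if name in completed:
            continue  # finished in an earlier run
        train_fn(name)
        completed.add(name)
        ran.append(name)
        # Persist progress after every model so an interruption loses little.
        checkpoint_path.write_text(json.dumps({"completed": sorted(completed)}))
    return ran


def flaky_train(name):
    # Simulate an interruption while training the second model.
    if name == "cnn_batch_corrected":
        raise RuntimeError("simulated interruption")


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "queue_checkpoint.json"  # hypothetical file name
    names = ["cnn_default", "cnn_batch_corrected", "cnn_male_only"]
    try:
        run_queue(names, checkpoint, flaky_train)
    except RuntimeError:
        pass
    resumed = run_queue(names, checkpoint, lambda name: None)
    fresh = run_queue(names, checkpoint, lambda name: None, resume=False)

print("resumed run trained:", resumed)
print("fresh run trained:", fresh)
```

The `resume=False` call mirrors the effect of `--no-resume`: the existing checkpoint is ignored and every model is trained again.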
Each model saves its results in the standard TimeFlies output structure:
outputs/
├── {project}/
│   ├── experiments/
│   │   ├── uncorrected/
│   │   │   ├── all_runs/
│   │   │   │   └── {tissue}_{model}_{target}/
│   │   │   │       └── {timestamp}/
│   │   │   │           ├── model.h5
│   │   │   │           ├── training/
│   │   │   │           └── evaluation/
│   │   │   └── latest -> all_runs/.../
│   │   └── batch_corrected/
Queue-specific summaries are saved separately:
outputs/
├── model_queue_summaries/
│   ├── {queue_name}_{timestamp}_summary.md
│   └── {queue_name}_{timestamp}_metrics.csv
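Because the metrics file is plain CSV, it can be post-processed with a few lines of code. The sketch below picks the best model and recovers its hyperparameters from columns prefixed with `hp_`. Apart from that prefix, the column names (`model_name`, `accuracy`) are assumptions, and the rows are invented placeholder values rather than real results; adjust both to match an actual `*_metrics.csv`.

```python
import csv
import io

# Placeholder data for illustration only; not real TimeFlies results.
SAMPLE_METRICS = """model_name,accuracy,hp_epochs,hp_n_estimators
model_a,0.80,50,
model_b,0.90,,100
model_c,0.70,,100
"""


def best_model(rows, metric="accuracy"):
    """Return the row with the highest value of the given metric column."""
    return max(rows, key=lambda row: float(row[metric]))


def hyperparameters(row):
    """Collect non-empty hp_* columns into a dict without the prefix."""
    return {
        key[len("hp_"):]: value
        for key, value in row.items()
        if key.startswith("hp_") and value != ""
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_METRICS)))
best = best_model(rows)
print(best["model_name"], hyperparameters(best))
```

To use it on a real report, replace the `io.StringIO` source with an opened file from `outputs/model_queue_summaries/`.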
Markdown Report (*_summary.md):
CSV Report (*_metrics.csv):
Hyperparameters (hp_* columns)
Configuration Errors:
# Verify configuration syntax
timeflies verify # Checks for queue configs in system check
Training Failures:
Memory Issues:
Disk Space:
timeflies queue configs/model_queue.yaml --verbose
You can integrate custom analysis scripts into the queue:
global_settings:
  with_analysis: true
  analysis_script: "templates/my_custom_analysis.py"
For large-scale computing, wrap the queue command:
#!/bin/bash
#SBATCH --time=24:00:00
#SBATCH --mem=32G
source .activate.sh
timeflies queue configs/large_scale_queue.yaml
queue_settings:
  sequential: false # Enable parallel execution
  max_parallel: 4 # Maximum parallel models
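The effect of `max_parallel` can be illustrated with a bounded worker pool: at most that many models run at once, and a failure in one model does not stop the others. This is a conceptual sketch, not the TimeFlies scheduler. `run_parallel` and `fake_train` are hypothetical, and threads are used only to keep the example small; real training jobs would more plausibly run in separate processes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_parallel(model_names, train_fn, max_parallel=4):
    """Run train_fn for each model with at most max_parallel running at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(train_fn, name): name for name in model_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = ("ok", future.result())
            except Exception as error:
                # Record the failure and keep collecting the other models.
                results[name] = ("failed", str(error))
    return results


def fake_train(name):
    # Stand-in for real training; one model fails on purpose.
    if name == "broken_model":
        raise RuntimeError("simulated training failure")
    return f"{name} trained"


results = run_parallel(
    ["cnn_default", "xgboost_baseline", "broken_model"], fake_train, max_parallel=2
)
for name in sorted(results):
    print(name, results[name])
```

Results are keyed by model name because completion order is not deterministic when models run concurrently.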
The model queue system provides a powerful framework for systematic model comparison and hyperparameter exploration, essential for rigorous machine learning research in computational biology.