Machine Learning for Aging Analysis in Drosophila Single-Cell Data
TimeFlies is a comprehensive machine learning framework for analyzing aging patterns in Drosophila single-cell RNA sequencing data. It provides deep learning models, model interpretability analysis, batch correction capabilities, and a complete research workflow.
# Download and run the installer
curl -O https://raw.githubusercontent.com/rsinghlab/TimeFlies/main/install_timeflies.sh
chmod +x install_timeflies.sh
./install_timeflies.sh
The installer automatically activates TimeFlies. For new terminal windows:
source .activate.sh
Command Line Interface:
# Complete setup workflow (customize configs/setup.yaml first)
timeflies setup [--batch-correct]
# Web-based GUI (recommended for beginners)
timeflies gui
# Batch correction (automatic environment switching)
timeflies batch-correct
# Train models with automatic evaluation
timeflies train [--with-eda --with-analysis --batch-corrected]
# Evaluate trained models
timeflies evaluate [--with-eda --with-analysis]
# Automated multi-model training queue
timeflies queue [configs/model_queue.yaml] [--no-resume]
# Automated hyperparameter tuning (customize configs/hyperparameter_tuning.yaml first)
timeflies tune [--no-resume]
# Run project-specific analysis
timeflies analyze
Web-Based GUI (Recommended):
For users who prefer a graphical interface, run timeflies gui to launch a modern web-based interface:
Usage:
timeflies gui # Launch on http://localhost:7860
timeflies gui --port 8080 # Use different port
timeflies gui --share # Create public URL (use with caution)
Both CLI and web GUI provide identical functionality - choose what works best for you.
TimeFlies uses modular configuration files in the configs/ directory.

Place *_original.h5ad files in data/[project]/[tissue]/
Edit configs/setup.yaml for data splitting (split_size, stratify_by, etc.)
Run timeflies setup to create train/eval splits and verify the system
Run timeflies train for model training with automatic evaluation
Run timeflies evaluate to assess model performance on test data
Check outputs/[project]/ for results with model interpretability

If you need to change splitting parameters (e.g., different stratification or split size):
# Edit configs/setup.yaml with new parameters
timeflies setup --force-split # Recreates splits, preserves batch-corrected files
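To make the two splitting parameters concrete, here is a minimal standard-library sketch of a stratified split. It is not TimeFlies' implementation: the function name stratified_split, the record layout, and the reading of split_size as the fraction of each group sent to the eval set are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, stratify_by, split_size=0.2, seed=42):
    """Split records into (train, eval) while keeping group proportions.

    Assumption: split_size is the fraction of each group sent to eval.
    """
    rng = random.Random(seed)

    # Group records by the value of the stratification key.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[stratify_by]].append(rec)

    train, eval_ = [], []
    for key in sorted(groups):
        members = groups[key][:]
        rng.shuffle(members)
        n_eval = round(len(members) * split_size)
        eval_.extend(members[:n_eval])
        train.extend(members[n_eval:])
    return train, eval_


# Toy data: 10 cells in each of four hypothetical age groups.
cells = [{"cell_id": i, "age": age} for i, age in enumerate([5, 30, 50, 70] * 10)]
train, eval_ = stratified_split(cells, stratify_by="age", split_size=0.2)
print(len(train), len(eval_))
```

Splitting within each group, rather than over the whole dataset, keeps every age represented in the eval set at the same rate as in the training set.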
Smart behavior:
--force-split removes existing train/eval splits and recreates them

TimeFlies generates comprehensive outputs organized by project and analysis type:
outputs/
├── fruitfly_aging/  # Project-specific results
│   ├── experiments/  # Model training results
│   │   ├── uncorrected/  # Non-batch-corrected results
│   │   │   └── all_runs/
│   │   │       └── head_cnn_age/  # Config-specific experiments (tissue_model_target)
│   │   │           ├── 2024-08-25_10-30-15/  # Individual experiment
│   │   │           │   ├── model.h5  # Trained TensorFlow model
│   │   │           │   ├── training/  # Training artifacts
│   │   │           │   │   ├── history.json  # Training metrics & loss curves
│   │   │           │   │   ├── logs/  # Training logs
│   │   │           │   │   └── plots/  # Training visualizations
│   │   │           │   ├── evaluation/  # Test results
│   │   │           │   │   ├── metrics.json  # Performance metrics (accuracy, F1, precision, recall, AUC, baselines)
│   │   │           │   │   ├── predictions.csv  # Model predictions
│   │   │           │   │   └── plots/  # Performance visualizations
│   │   │           │   │       ├── confusion_matrix.png
│   │   │           │   │       ├── roc_curve.png
│   │   │           │   │       └── classification_report.png
│   │   │           │   ├── shap_analysis/  # SHAP interpretability
│   │   │           │   │   ├── shap_values.csv
│   │   │           │   │   ├── shap_summary.png
│   │   │           │   │   └── feature_importance.png
│   │   │           │   └── metadata.json  # Experiment reproducibility info
│   │   │           ├── latest -> 2024-08-25_10-30-15/  # Symlink to most recent
│   │   │           └── best -> 2024-08-25_10-30-15/  # Symlink to best performance
│   │   ├── batch_corrected/  # Batch-corrected results (same structure)
│   │   └── queue_experiment_2024-08-25/  # Model queue results
│   │       ├── model_comparison_report.md  # Queue summary report
│   │       ├── model_metrics.csv  # All models comparison
│   │       └── individual_model_results/  # Links to experiment dirs
│   ├── hyperparameter_tuning/  # Hyperparameter optimization
│   │   └── hyperparameter_search_2024-08-25_16-30-45/
│   │       ├── hyperparameter_search_report.md  # Best trials & selection reasoning
│   │       ├── hyperparameter_search_metrics.csv  # All trials data for analysis
│   │       ├── checkpoint.json  # Resume capability for interrupted searches
│   │       ├── search_config.yaml  # Configuration backup for reproducibility
│   │       └── optuna_study.db  # Bayesian optimization database (if using Optuna)
│   └── eda/  # Exploratory data analysis
│       └── head/  # Tissue-specific analysis
│           ├── uncorrected/  # Raw data EDA
│           │   ├── eda_report.html  # Interactive analysis report
│           │   ├── plots/  # EDA visualizations
│           │   │   ├── age_distribution.png
│           │   │   ├── correlation_matrix.png
│           │   │   └── dimensionality_reduction.png
│           │   └── eda_summary.json  # Statistical summaries
│           └── batch_corrected/  # Batch-corrected EDA (same structure)
└── fruitfly_alzheimers/  # Separate project outputs
    └── [same structure as above]
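Because every experiment directory has the same layout and a latest symlink, results can be located programmatically. The sketch below reads evaluation/metrics.json from the run that latest points to. The helper name load_latest_metrics is hypothetical, and since the keys stored inside metrics.json are not documented here, the function simply returns whatever JSON it finds.

```python
import json
from pathlib import Path


def load_latest_metrics(outputs_dir, project, config_name, corrected=False):
    """Return metrics from the run the `latest` symlink points to, or None."""
    variant = "batch_corrected" if corrected else "uncorrected"
    run_dir = (
        Path(outputs_dir) / project / "experiments" / variant
        / "all_runs" / config_name / "latest"
    )
    metrics_file = run_dir / "evaluation" / "metrics.json"
    if not metrics_file.is_file():
        return None  # no run yet, or the run was never evaluated
    with metrics_file.open() as fh:
        return json.load(fh)


if __name__ == "__main__":
    metrics = load_latest_metrics("outputs", "fruitfly_aging", "head_cnn_age")
    print(metrics if metrics is not None else "No evaluated run found yet.")
```

Swapping "latest" for "best" in the path would read the best-performing run instead.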
templates/ directory
configs/ directory
tests/fixtures/
All 12 CLI commands with their full options:
timeflies setup [--batch-correct] [--force-split] [--dev] # Complete setup workflow
timeflies train [--with-eda] [--with-analysis] # Train models (includes automatic evaluation)
timeflies evaluate [--with-eda] [--with-analysis] [--interpret] [--visualize] # Evaluate models on test data
timeflies analyze [--predictions-path PATH] [--analysis-script PATH] [--with-eda] # Project-specific analysis scripts
timeflies queue [configs/model_queue.yaml] [--no-resume] # Automated multi-model training queue (see docs/model_queue_guide.md)
timeflies tune [--no-resume] # Hyperparameter optimization using configs/hyperparameter_tuning.yaml (see docs/hyperparameter_tuning_guide.md)
timeflies split [--force-split] # Create train/eval splits
timeflies eda [--save-report] # Exploratory data analysis
timeflies batch-correct # Create batch-corrected files (requires .venv_batch)
timeflies verify # System verification
timeflies test [unit|integration|functional|system|all] [--coverage] [--verbose] [--fast] [--debug] [--rerun]
timeflies create-test-data [--tier tiny|synthetic|real|all] [--cells N] [--genes N] [--batch-versions]
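The queue and tune commands resume interrupted work unless --no-resume is given. How TimeFlies implements this is not described here; the following sketch only illustrates the general checkpoint-and-skip pattern. The function run_queue, the checkpoint format, and the mapping of --no-resume to resume=False are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_queue(jobs, run_job, checkpoint_path, resume=True):
    """Run jobs in order, recording finished names so a rerun can skip them."""
    checkpoint = Path(checkpoint_path)
    done = []
    if resume and checkpoint.is_file():
        done = json.loads(checkpoint.read_text())["completed"]

    for name in jobs:
        if name in done:
            continue  # finished in an earlier, interrupted run
        run_job(name)
        done.append(name)
        # Persist after every job so an interruption loses at most one job.
        checkpoint.write_text(json.dumps({"completed": done}))
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "checkpoint.json"
        ran = []
        run_queue(["cnn", "mlp"], ran.append, ckpt)
        # The second call skips the two finished jobs and runs only the new one.
        run_queue(["cnn", "mlp", "xgboost"], ran.append, ckpt)
        print(ran)
```

Passing resume=False ignores any existing checkpoint, so every job runs again from the start.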
Keep TimeFlies Updated: Use timeflies update to get the latest features and bug fixes:
# Update to latest version from GitHub main branch
timeflies update
What happens during update:
Files that get UPDATED:
.timeflies_src/ - source code and templates (completely refreshed)
TimeFlies_Launcher.py - GUI launcher (only if content changed)
README.md, analysis examples (updated for new features)
setup.yaml, hyperparameter_tuning.yaml
Files that are PRESERVED (never touched):
data/ - your datasets and H5AD files
outputs/ - all experiments, analysis results, and trained models
configs/ - your customized configuration settings
Custom templates - any analysis scripts you created
GUI Users: Use the “Update TimeFlies” button in the Results tab for the same functionality.
--verbose # Detailed logging
--batch-corrected # Use existing batch-corrected data (any command)
--tissue head|body # Override tissue type
--model CNN|MLP|xgboost|random_forest|logistic # Override model type
--target age # Override target variable
--aging # Use fruitfly_aging project
--alzheimers # Use fruitfly_alzheimers project
TimeFlies uses YAML configuration files to control model training, evaluation, and analysis settings. The main configuration is in configs/default.yaml.
Control SHAP interpretation and visualizations:
# Feature importance analysis
interpretation:
  shap:
    enabled: false  # Enable/disable SHAP interpretation (includes visualizations)
    load_existing: false  # Load existing SHAP values instead of computing
    reference_size: 100  # Reference size for SHAP analysis
# Visualizations
visualizations:
  enabled: true  # Enable general visualizations (training plots, confusion matrix, ROC curves, etc.)
Configure project-specific analysis workflows:
analysis:
  # Exploratory data analysis
  eda:
    enabled: false
  # Run project-specific analysis scripts
  run_analysis_script:
    enabled: false  # Set to true to run project-specific analysis after training
Override configuration settings using command-line flags:
# Force SHAP interpretation (overrides config)
timeflies evaluate --interpret
# Force visualizations (overrides config)
timeflies evaluate --visualize
# Use custom analysis script
timeflies analyze --analysis-script templates/my_custom_analysis.py
# Combine flags
timeflies evaluate --interpret --visualize --with-analysis
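The precedence rule (a flag forces a feature on, otherwise the config value applies) can be sketched with argparse. This is an illustration, not TimeFlies' CLI code: resolve_settings is a hypothetical helper, and the dictionary stands in for a parsed configs/default.yaml whose exact nesting is assumed.

```python
import argparse

# Stand-in for the parsed YAML config; the nesting is an assumption.
DEFAULT_CONFIG = {
    "interpretation": {"shap": {"enabled": False}},
    "visualizations": {"enabled": True},
}


def resolve_settings(config, argv):
    """Return the effective SHAP/visualization switches after applying flags."""
    parser = argparse.ArgumentParser(prog="evaluate-sketch")
    parser.add_argument("--interpret", action="store_true")
    parser.add_argument("--visualize", action="store_true")
    args = parser.parse_args(argv)

    return {
        # A flag can only switch a feature on; its absence keeps the config value.
        "shap": args.interpret or config["interpretation"]["shap"]["enabled"],
        "visualizations": args.visualize or config["visualizations"]["enabled"],
    }


print(resolve_settings(DEFAULT_CONFIG, []))
print(resolve_settings(DEFAULT_CONFIG, ["--interpret"]))
```

With this design the YAML file holds the defaults for repeatable runs, while flags cover one-off runs without editing the file.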
Create custom analysis workflows using templates:
# Copy template and customize
cp templates/aging_analysis_template.py templates/my_analysis.py
# Run your custom analysis
timeflies analyze --analysis-script templates/my_analysis.py
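The interface TimeFlies expects from an analysis script is defined by the shipped templates and is not reproduced here. As a rough idea of the kind of logic a custom script might contain, the standalone sketch below computes per-label accuracy from a predictions CSV. The column names true_age and predicted_age and the function accuracy_by_group are hypothetical; check the header of your own predictions.csv before adapting it.

```python
import csv
import os
import tempfile
from collections import defaultdict


def accuracy_by_group(csv_path, true_col="true_age", pred_col="predicted_age"):
    """Return {true label: fraction of rows predicted correctly}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            label = row[true_col]
            totals[label] += 1
            hits[label] += int(row[pred_col] == label)
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Tiny made-up file, written only so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "predictions.csv")
        with open(path, "w", newline="") as fh:
            fh.write("true_age,predicted_age\n5,5\n5,30\n30,30\n")
        print(accuracy_by_group(path))
```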
Available templates:
templates/custom_analysis_example.py - Basic template with all features
templates/aging_analysis_template.py - Aging-specific analysis patterns

# Clone repository
git clone https://github.com/rsinghlab/TimeFlies.git
cd TimeFlies
# Setup development environments (creates .venv + .venv_batch with all dependencies)
python3 run_timeflies.py setup --dev
# Activate development environment
source .activate.sh
# Now you can use timeflies command directly
timeflies verify
timeflies test --coverage
timeflies create-test-data --tier tiny # (optional - already included)
# For batch correction development (specialized)
source .activate_batch.sh # PyTorch + scVI environment for testing batch correction code
timeflies create-test-data --tier tiny --batch-versions # (optional - already committed)
timeflies create-test-data --tier synthetic --batch-versions # Generate on-demand for testing
TimeFlies/
├── configs/  # YAML configuration files
├── src/  # Source code
│   └── common/  # Framework components
│       ├── analysis/  # EDA and visualization tools
│       ├── cli/  # Command-line interface
│       ├── core/  # Pipeline and configuration management
│       ├── data/  # Data loading and preprocessing
│       ├── evaluation/  # Model evaluation and metrics
│       ├── models/  # ML model implementations
│       └── utils/  # Utilities and helpers
├── tests/  # Test suite with 3-tier test data
│   ├── fixtures/  # Test data (tiny/synthetic/real)
│   └── outputs/  # Test outputs (temporary)
├── templates/  # Analysis script templates
├── docs/  # Documentation and notebooks
├── install_timeflies.sh  # One-click installer
├── run_timeflies.py  # Main CLI entry point
└── TimeFlies_Launcher.py  # GUI Launcher
After installation, users work with this structure in their project directory:
your_project/
├── configs/  # Configuration directory created by TimeFlies setup
│   ├── default.yaml  # Main configuration (customize your settings)
│   ├── setup.yaml  # Data splitting configuration
│   └── ...  # Other config files
├── templates/  # Analysis script templates (created by setup)
│   ├── aging_analysis_template.py
│   ├── custom_analysis_example.py
│   └── README.md
├── data/  # Your input datasets
│   ├── fruitfly_aging/
│   │   └── head/
│   │       ├── *_original.h5ad  # Your raw data files
│   │       ├── *_train.h5ad  # Generated by 'split' command
│   │       └── *_eval.h5ad  # Generated by 'split' command
│   └── fruitfly_alzheimers/
│       └── head/
│           └── *_original.h5ad  # Your raw data files
└── outputs/  # All results generated by TimeFlies
    └── [see Output Structure below]
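Given this layout, a short script can report which raw files already have train/eval splits. This is a sketch rather than part of TimeFlies: split_status is a hypothetical helper, and it assumes the generated split files share the filename prefix of their *_original.h5ad source.

```python
from pathlib import Path

SUFFIX = "_original.h5ad"


def split_status(data_dir):
    """Map each *_original.h5ad file to whether its train/eval splits exist."""
    data_dir = Path(data_dir)
    status = {}
    # Pattern follows data/[project]/[tissue]/*_original.h5ad
    for original in sorted(data_dir.glob(f"*/*/*{SUFFIX}")):
        prefix = original.name[: -len(SUFFIX)]
        status[original.relative_to(data_dir).as_posix()] = {
            "train": (original.parent / f"{prefix}_train.h5ad").is_file(),
            "eval": (original.parent / f"{prefix}_eval.h5ad").is_file(),
        }
    return status


if __name__ == "__main__":
    for path, found in split_status("data").items():
        print(path, found)
```

Any entry reporting False for train or eval indicates a dataset for which timeflies setup or timeflies split has not yet produced splits.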
curl -O https://raw.githubusercontent.com/.../install_timeflies.sh && chmod +x install_timeflies.sh && ./install_timeflies.sh
source .activate.sh (installs timeflies command to system)
Place *_original.h5ad files in data/[project]/[tissue]/
timeflies setup (creates configs/, templates/, splits data, verifies system)
python TimeFlies_Launcher.py
timeflies train && timeflies evaluate
TimeFlies is designed for researchers studying:
git checkout -b feature-name
timeflies test --coverage
This project is licensed under the TimeFlies Academic Research License with pre-publication restrictions - see the LICENSE file for details.
Pre-Publication Period: All rights reserved. Commercial use, redistribution, and derivative works require explicit written permission from the Singh Lab, Brown University.
Post-Publication: License will transition to a more permissive open-source license after publication of associated research.
Developed by the Singh Lab for advancing aging research through machine learning.
Contact: Singh Lab
Repository: TimeFlies
TimeFlies v1.0 - Advancing aging research through machine learning