TimeFlies

TimeFlies v1.0

Machine Learning for Aging Analysis in Drosophila Single-Cell Data

TimeFlies is a comprehensive machine learning framework for analyzing aging patterns in Drosophila single-cell RNA sequencing data. It provides deep learning models, model interpretability analysis, batch correction capabilities, and a complete research workflow.

Quick Start

Installation

# Download and run the installer
curl -O https://raw.githubusercontent.com/rsinghlab/TimeFlies/main/install_timeflies.sh
chmod +x install_timeflies.sh
./install_timeflies.sh

Automatic Environment Activation

The installer activates the TimeFlies environment automatically. In each new terminal window, activate it manually:

source .activate.sh

Basic Usage

Command Line Interface:

# Complete setup workflow (customize configs/setup.yaml first)
timeflies setup [--batch-correct]

# Web-based GUI (recommended for beginners)
timeflies gui

# Batch correction (automatic environment switching)
timeflies batch-correct

# Train models with automatic evaluation
timeflies train [--with-eda --with-analysis --batch-corrected]

# Evaluate trained models
timeflies evaluate [--with-eda --with-analysis]

# Automated multi-model training queue
timeflies queue [configs/model_queue.yaml] [--no-resume]

# Automated hyperparameter tuning (customize configs/hyperparameter_tuning.yaml first)
timeflies tune [--no-resume]

# Run project-specific analysis
timeflies analyze

Web-Based GUI (Recommended): For users who prefer a graphical interface, run timeflies gui to launch a web-based interface.

Usage:

timeflies gui                    # Launch on http://localhost:7860
timeflies gui --port 8080        # Use different port
timeflies gui --share            # Create public URL (use with caution)

The CLI and the web GUI provide identical functionality; choose whichever works best for you.

Configuration Files

TimeFlies uses modular configuration files in the configs/ directory.

Research Workflow

  1. Data Setup: Place your *_original.h5ad files in data/[project]/[tissue]/
  2. Configuration: Edit configs/setup.yaml for data splitting (split_size, stratify_by, etc.)
  3. Setup: Run timeflies setup to create train/eval splits and verify the system
  4. Training: Run timeflies train to train models with automatic evaluation
  5. Evaluation: Run timeflies evaluate to assess model performance on test data
  6. Analysis: Find results, including model interpretability outputs, in outputs/[project]/
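
Before running timeflies setup, it can help to confirm that the input files sit where step 1 expects them. The following is a minimal sketch, not part of TimeFlies: the function name find_original_files is hypothetical, and it assumes only the data/[project]/[tissue]/ layout and the *_original.h5ad naming described above.

```python
from pathlib import Path


def find_original_files(data_root, project, tissue):
    """Return sorted *_original.h5ad paths under data_root/project/tissue.

    Hypothetical helper: it only inspects the directory layout described
    in the workflow and does not open or validate the files.
    """
    tissue_dir = Path(data_root) / project / tissue
    if not tissue_dir.is_dir():
        return []
    return sorted(tissue_dir.glob("*_original.h5ad"))


if __name__ == "__main__":
    files = find_original_files("data", "fruitfly_aging", "head")
    if files:
        for path in files:
            print(f"found: {path}")
    else:
        print("no *_original.h5ad files found; check data/[project]/[tissue]/")
```

An empty result usually means the project or tissue directory name does not match the one selected on the command line.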

Re-splitting Data

If you need to change splitting parameters (e.g., different stratification or split size):

# Edit configs/setup.yaml with new parameters
timeflies setup --force-split    # Recreates splits, preserves batch-corrected files

Smart behavior:

Output Structure

TimeFlies generates comprehensive outputs organized by project and analysis type:

outputs/
├── fruitfly_aging/                          # Project-specific results
│   ├── experiments/                         # Model training results
│   │   ├── uncorrected/                     # Non-batch-corrected results
│   │   │   └── all_runs/
│   │   │       └── head_cnn_age/            # Config-specific experiments (tissue_model_target)
│   │   │           ├── 2024-08-25_10-30-15/ # Individual experiment
│   │   │           │   ├── model.h5         # Trained TensorFlow model
│   │   │           │   ├── training/        # Training artifacts
│   │   │           │   │   ├── history.json # Training metrics & loss curves
│   │   │           │   │   ├── logs/        # Training logs
│   │   │           │   │   └── plots/       # Training visualizations
│   │   │           │   ├── evaluation/      # Test results
│   │   │           │   │   ├── metrics.json # Performance metrics (accuracy, F1, precision, recall, AUC, baselines)
│   │   │           │   │   ├── predictions.csv # Model predictions
│   │   │           │   │   └── plots/       # Performance visualizations
│   │   │           │   │       ├── confusion_matrix.png
│   │   │           │   │       ├── roc_curve.png
│   │   │           │   │       └── classification_report.png
│   │   │           │   ├── shap_analysis/   # SHAP interpretability
│   │   │           │   │   ├── shap_values.csv
│   │   │           │   │   ├── shap_summary.png
│   │   │           │   │   └── feature_importance.png
│   │   │           │   └── metadata.json    # Experiment reproducibility info
│   │   │           ├── latest -> 2024-08-25_10-30-15/  # Symlink to most recent
│   │   │           └── best -> 2024-08-25_10-30-15/    # Symlink to best performance
│   │   ├── batch_corrected/                 # Batch-corrected results (same structure)
│   │   └── queue_experiment_2024-08-25/     # Model queue results
│   │       ├── model_comparison_report.md   # Queue summary report
│   │       ├── model_metrics.csv            # All models comparison
│   │       └── individual_model_results/    # Links to experiment dirs
│   ├── hyperparameter_tuning/               # Hyperparameter optimization
│   │   └── hyperparameter_search_2024-08-25_16-30-45/
│   │       ├── hyperparameter_search_report.md  # Best trials & selection reasoning
│   │       ├── hyperparameter_search_metrics.csv # All trials data for analysis
│   │       ├── checkpoint.json              # Resume capability for interrupted searches
│   │       ├── search_config.yaml           # Configuration backup for reproducibility
│   │       └── optuna_study.db              # Bayesian optimization database (if using Optuna)
│   └── eda/                                 # Exploratory data analysis
│       └── head/                           # Tissue-specific analysis
│           ├── uncorrected/                # Raw data EDA
│           │   ├── eda_report.html         # Interactive analysis report
│           │   ├── plots/                  # EDA visualizations
│           │   │   ├── age_distribution.png
│           │   │   ├── correlation_matrix.png
│           │   │   └── dimensionality_reduction.png
│           │   └── eda_summary.json        # Statistical summaries
│           └── batch_corrected/            # Batch-corrected EDA (same structure)
└── fruitfly_alzheimers/                     # Separate project outputs
    └── [same structure as above]
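
The layout above makes it possible to locate a run's metrics from a script. The sketch below is an illustration rather than a TimeFlies API: experiment_dir and load_metrics are hypothetical names, and it assumes the config directory is the lowercased tissue_model_target (as in head_cnn_age) and that latest resolves to a run directory containing evaluation/metrics.json.

```python
import json
from pathlib import Path


def experiment_dir(outputs_root, project, tissue, model, target,
                   batch_corrected=False, run="latest"):
    """Build the path to one run directory following the tree above.

    Assumption: the config directory name is "<tissue>_<model>_<target>"
    in lowercase, matching the head_cnn_age example.
    """
    correction = "batch_corrected" if batch_corrected else "uncorrected"
    config_name = f"{tissue}_{model}_{target}".lower()
    return (Path(outputs_root) / project / "experiments" / correction
            / "all_runs" / config_name / run)


def load_metrics(run_dir):
    """Load evaluation/metrics.json from a run directory as a dict."""
    metrics_file = Path(run_dir) / "evaluation" / "metrics.json"
    with open(metrics_file, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    run_dir = experiment_dir("outputs", "fruitfly_aging", "head", "CNN", "age")
    if run_dir.exists():
        print(load_metrics(run_dir))
    else:
        print(f"no run found at {run_dir}")
```

Passing run="best" would follow the best symlink instead, and a timestamp such as 2024-08-25_10-30-15 would select one specific run.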

Key Output Files

Supported Projects

Key Features

Machine Learning Pipeline

Data Processing

Research Tools

Automated Model Queue System

Documentation

Comprehensive Guides

Commands Reference

All 12 CLI commands with their full options:

Core Research Commands

timeflies setup [--batch-correct] [--force-split] [--dev]     # Complete setup workflow
timeflies train [--with-eda] [--with-analysis] # Train models (includes automatic evaluation)
timeflies evaluate [--with-eda] [--with-analysis] [--interpret] [--visualize] # Evaluate models on test data
timeflies analyze [--predictions-path PATH] [--analysis-script PATH] [--with-eda] # Project-specific analysis scripts
timeflies queue [configs/model_queue.yaml] [--no-resume] # Automated multi-model training queue (see docs/model_queue_guide.md)
timeflies tune [--no-resume] # Hyperparameter optimization using configs/hyperparameter_tuning.yaml (see docs/hyperparameter_tuning_guide.md)

Data & Analysis Commands

timeflies split [--force-split]              # Create train/eval splits
timeflies eda [--save-report]                 # Exploratory data analysis
timeflies batch-correct                       # Create batch-corrected files (requires .venv_batch)
timeflies verify                              # System verification

Development Commands

timeflies test [unit|integration|functional|system|all] [--coverage] [--verbose] [--fast] [--debug] [--rerun]
timeflies create-test-data [--tier tiny|synthetic|real|all] [--cells N] [--genes N] [--batch-versions]

System Updates

Keep TimeFlies Updated: Use timeflies update to get the latest features and bug fixes:

# Update to latest version from GitHub main branch
timeflies update

What happens during update:

Files that get UPDATED:

Files that are PRESERVED (never touched):

GUI Users: Use the “Update TimeFlies” button in the Results tab for the same functionality.

Global Options (work with any command)

--verbose                 # Detailed logging
--batch-corrected         # Use existing batch-corrected data (any command)
--tissue head|body        # Override tissue type
--model CNN|MLP|xgboost|random_forest|logistic   # Override model type
--target age              # Override target variable
--aging                   # Use fruitfly_aging project
--alzheimers              # Use fruitfly_alzheimers project

Configuration

TimeFlies uses YAML configuration files to control model training, evaluation, and analysis settings. The main configuration is in configs/default.yaml.

Key Configuration Sections

Model Interpretability (SHAP Analysis)

Control SHAP interpretation and visualizations:

# Feature importance analysis
interpretation:
  shap:
    enabled: false           # Enable/disable SHAP interpretation (includes visualizations)
    load_existing: false     # Load existing SHAP values instead of computing
    reference_size: 100      # Reference size for SHAP analysis

# Visualizations
visualizations:
  enabled: true             # Enable general visualizations (training plots, confusion matrix, ROC curves, etc.)

Analysis Scripts

Configure project-specific analysis workflows:

analysis:
  # Exploratory data analysis
  eda:
    enabled: false

  # Run project-specific analysis scripts
  run_analysis_script:
    enabled: false  # Set to true to run project-specific analysis after training

CLI Overrides

Override configuration settings using command-line flags:

# Force SHAP interpretation (overrides config)
timeflies evaluate --interpret

# Force visualizations (overrides config)
timeflies evaluate --visualize

# Use custom analysis script
timeflies analyze --analysis-script templates/my_custom_analysis.py

# Combine flags
timeflies evaluate --interpret --visualize --with-analysis
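
One way to picture the override behavior is as a merge of CLI flags over the loaded YAML settings. The sketch below is an assumption about the mechanism, not TimeFlies' actual code: apply_cli_overrides is a hypothetical function, and the mapping of --interpret to interpretation.shap.enabled and --visualize to visualizations.enabled is inferred from the configuration sections shown earlier.

```python
import copy


def apply_cli_overrides(config, interpret=False, visualize=False):
    """Return a copy of config with CLI flags forcing features on.

    Hypothetical illustration: a flag that is not passed leaves the
    configured value unchanged; a flag that is passed wins over the file.
    """
    merged = copy.deepcopy(config)
    if interpret:
        shap_cfg = merged.setdefault("interpretation", {}).setdefault("shap", {})
        shap_cfg["enabled"] = True
    if visualize:
        merged.setdefault("visualizations", {})["enabled"] = True
    return merged


# Values mirror the YAML defaults shown in the configuration sections.
config = {
    "interpretation": {
        "shap": {"enabled": False, "load_existing": False, "reference_size": 100}
    },
    "visualizations": {"enabled": True},
}

effective = apply_cli_overrides(config, interpret=True)
print(effective["interpretation"]["shap"]["enabled"])  # True
print(config["interpretation"]["shap"]["enabled"])     # False (file settings untouched)
```

Working on a deep copy keeps the settings loaded from disk intact, so a flag affects only the current invocation.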

Custom Analysis Scripts

Create custom analysis workflows using templates:

# Copy template and customize
cp templates/aging_analysis_template.py templates/my_analysis.py

# Run your custom analysis
timeflies analyze --analysis-script templates/my_analysis.py
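
A custom analysis will often start from a run's predictions.csv. The interface that TimeFlies templates must implement is not shown here, so the sketch below is a standalone helper rather than a drop-in template. The function name accuracy_from_predictions is hypothetical, and the column names are passed in as arguments because the actual predictions.csv schema is not documented in this README.

```python
import csv


def accuracy_from_predictions(csv_path, true_col, pred_col):
    """Compute the fraction of rows where the two named columns match.

    Values are compared as strings, exactly as read from the CSV.
    Column names are supplied by the caller; check the header of your
    own predictions.csv before choosing them.
    """
    total = 0
    correct = 0
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            total += 1
            if row[true_col] == row[pred_col]:
                correct += 1
    if total == 0:
        raise ValueError(f"{csv_path} contains no prediction rows")
    return correct / total
```

Logic like this could be adapted inside a copy of one of the templates once the template's expected entry point is known.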

Available templates:

Development

For Developers

# Clone repository
git clone https://github.com/rsinghlab/TimeFlies.git
cd TimeFlies

# Setup development environments (creates .venv + .venv_batch with all dependencies)
python3 run_timeflies.py setup --dev

# Activate development environment
source .activate.sh

# Now you can use timeflies command directly
timeflies verify
timeflies test --coverage
timeflies create-test-data --tier tiny  # (optional - already included)

# For batch correction development (specialized)
source .activate_batch.sh  # PyTorch + scVI environment for testing batch correction code

Test Data System

timeflies create-test-data --tier tiny --batch-versions     # (optional - already committed)
timeflies create-test-data --tier synthetic --batch-versions # Generate on-demand for testing

Repository Structure

TimeFlies/
├── configs/              # YAML configuration files
├── src/                  # Source code
│   └── common/          # Framework components
│       ├── analysis/    # EDA and visualization tools
│       ├── cli/         # Command-line interface
│       ├── core/        # Pipeline and configuration management
│       ├── data/        # Data loading and preprocessing
│       ├── evaluation/  # Model evaluation and metrics
│       ├── models/      # ML model implementations
│       └── utils/       # Utilities and helpers
├── tests/               # Test suite with 3-tier test data
│   ├── fixtures/        # Test data (tiny/synthetic/real)
│   └── outputs/         # Test outputs (temporary)
├── templates/           # Analysis script templates
├── docs/               # Documentation and notebooks
├── install_timeflies.sh # One-click installer
├── run_timeflies.py    # Main CLI entry point
└── TimeFlies_Launcher.py    # GUI Launcher

User Setup Guide

After installation, users work with this structure in their project directory:

your_project/
├── configs/             # Configuration directory created by TimeFlies setup
│   ├── default.yaml     # Main configuration (customize your settings)
│   ├── setup.yaml       # Data splitting configuration
│   └── ...              # Other config files
├── templates/           # Analysis script templates (created by setup)
│   ├── aging_analysis_template.py
│   ├── custom_analysis_example.py
│   └── README.md
├── data/                # Your input datasets
│   ├── fruitfly_aging/
│   │   └── head/
│   │       ├── *_original.h5ad     # Your raw data files
│   │       ├── *_train.h5ad        # Generated by 'split' command
│   │       └── *_eval.h5ad         # Generated by 'split' command
│   └── fruitfly_alzheimers/
│       └── head/
│           └── *_original.h5ad     # Your raw data files
└── outputs/             # All results generated by TimeFlies
    └── [see Output Structure above]
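
The file naming in the data/ tree follows a simple suffix convention. As a minimal sketch (split_paths is a hypothetical helper, and it assumes the split files are written next to the original with only the suffix changed, as the tree suggests), the expected split file names can be derived like this:

```python
from pathlib import Path

ORIGINAL_SUFFIX = "_original.h5ad"


def split_paths(original_path):
    """Return the (train, eval) paths expected next to an original file.

    Assumption: splits keep the same prefix and directory, replacing
    "_original.h5ad" with "_train.h5ad" and "_eval.h5ad".
    """
    original = Path(original_path)
    if not original.name.endswith(ORIGINAL_SUFFIX):
        raise ValueError(f"expected a *{ORIGINAL_SUFFIX} file, got {original.name}")
    prefix = original.name[: -len(ORIGINAL_SUFFIX)]
    train = original.with_name(f"{prefix}_train.h5ad")
    evaluation = original.with_name(f"{prefix}_eval.h5ad")
    return train, evaluation
```

Checking whether both returned paths exist is a quick way to tell if the split step has already been run for a dataset.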

Getting Started

  1. Install TimeFlies: curl -O https://raw.githubusercontent.com/.../install_timeflies.sh && chmod +x install_timeflies.sh && ./install_timeflies.sh
  2. Activate: source .activate.sh (installs timeflies command to system)
  3. Add your data: Place *_original.h5ad files in data/[project]/[tissue]/
  4. Setup: timeflies setup (creates configs/, templates/, splits data, verifies system)
  5. Configure: Edit configs for your project settings (or use GUI: python TimeFlies_Launcher.py)
  6. Run workflow: timeflies train && timeflies evaluate

System Requirements

Research Applications

TimeFlies is designed for researchers studying:

Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature-name
  3. Run tests: timeflies test --coverage
  4. Submit pull request

License

This project is licensed under the TimeFlies Academic Research License with pre-publication restrictions - see the LICENSE file for details.

Pre-Publication Period: All rights reserved. Commercial use, redistribution, and derivative works require explicit written permission from the Singh Lab, Brown University.

Post-Publication: The license will transition to a more permissive open-source license after the associated research is published.

Singh Lab

Developed by the Singh Lab for advancing aging research through machine learning.

Contact: Singh Lab

Repository: TimeFlies


TimeFlies v1.0 - Advancing aging research through machine learning