Karma-TS

This is the repository for the KarmaTS paper, published at ML4H 2025. The full text of the paper can be found here.

Overview

Graphical-TS is a Python package for simulating multivariate time series data with an underlying graphical causal structure. Developed by the SCAI Lab, it provides a flexible framework for generating synthetic time series data with full control over causal relationships, edge functions, and temporal dependencies.

Installation

pip install graphical-ts

Requirements:

  • Python >= 3.7 (the NumPy, SciPy, NetworkX, Matplotlib, and Pandas minimums below do not support Python 3.6)
  • NumPy >= 1.21
  • SciPy >= 1.7
  • NetworkX >= 2.6
  • Matplotlib >= 3.4
  • Pandas >= 1.3

Core Functionalities

1. Dynamic Graph Causal Models (DGCM)

  • Build causal graphs with time-lagged edges
  • Support for multiple edge types and lag structures
  • NetworkX integration for graph manipulation
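
As a rough illustration of what a causal graph with time-lagged edges holds, the sketch below flattens a nested parent -> lag -> children mapping (the layout used in the Quick Start of this README) into (parent, child, lag) triples. It is plain Python and does not call the package; `to_edge_list` and `max_lag` are hypothetical helper names introduced only for this example.

```python
def to_edge_list(out_graph):
    """Flatten {parent: {lag: [children]}} into sorted (parent, child, lag) triples."""
    edges = []
    for parent, lagged_children in out_graph.items():
        for lag, children in lagged_children.items():
            for child in children:
                edges.append((parent, child, lag))
    return sorted(edges)


def max_lag(out_graph):
    """Largest lag in the graph, i.e. how much history a simulator must keep."""
    return max((lag for _, _, lag in to_edge_list(out_graph)), default=0)


# Same structure as the Quick Start example (hypothetical helpers, not package API).
graph = {
    "A": {0: ["B", "C"]},
    "B": {1: ["C"], 6: ["B"]},
    "C": {2: ["A"]},
}
edges = to_edge_list(graph)
for parent, child, lag in edges:
    print(f"{parent} -> {child} (lag {lag})")
print("max lag:", max_lag(graph))
```

An explicit edge list like this is often easier to inspect or to hand to a graph library than the nested mapping.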

2. Customizable Edge Functions

  • Define how parent nodes affect child nodes
  • Pre-defined function gallery (linear, nonlinear, MLP-based)
  • Support for custom PyTorch neural networks
  • Multiple architectures: MLP, VAE, Transformer, Diffusion models
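
To make the idea of an edge function concrete, here is a minimal sketch that treats it as a callable mapping a window of a parent's recent values to a contribution to the child. This is an assumption about the general concept, not the package's `EdgeFunction` interface; `linear_edge` and `tanh_edge` are hypothetical names.

```python
import math


def linear_edge(weights):
    """Return an edge function computing a weighted sum over a parent window."""
    def fn(window):
        if len(window) != len(weights):
            raise ValueError("window length must match number of weights")
        return sum(w * x for w, x in zip(weights, window))
    return fn


def tanh_edge(scale):
    """Return a nonlinear edge function: the squashed mean of the parent window."""
    def fn(window):
        return math.tanh(scale * sum(window) / len(window))
    return fn


parent_history = [0.5, 1.0, -0.25]  # oldest value first, most recent last
lin = linear_edge([0.2, 0.3, 0.5])
sat = tanh_edge(2.0)
linear_effect = lin(parent_history)
nonlinear_effect = sat(parent_history)
print(linear_effect, nonlinear_effect)
```

Both functions share the same call signature, so a simulator could swap one for the other without changing the graph structure.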

3. Signal Functions

  • Independent signal components for each node
  • Pre-defined signals (Bessel processes, discrete signals, etc.)
  • Custom signal generation support
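
For intuition about one of the pre-defined signals, the sketch below simulates a Bessel process as the Euclidean norm of a multi-dimensional Brownian motion. It uses only the standard library and may differ from the package's own Bessel signal; the function name, the parameters, and the start at the origin are assumptions made for this example.

```python
import math
import random


def bessel_process(n_steps, dim=3, dt=0.01, seed=0):
    """Norm of a dim-dimensional Brownian motion started at the origin."""
    rng = random.Random(seed)
    position = [0.0] * dim
    path = [0.0]
    step_std = math.sqrt(dt)  # Brownian increments have variance dt
    for _ in range(n_steps):
        position = [x + rng.gauss(0.0, step_std) for x in position]
        path.append(math.sqrt(sum(x * x for x in position)))
    return path


signal = bessel_process(n_steps=500)
print(len(signal), min(signal), max(signal))
```

Because the signal depends only on its own random draws, it can serve as the independent component of a node before any parent effects are added.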

4. Expert Knowledge Integration

  • ExpertSim: Integrate domain knowledge into graph structure
  • NeuralExpertSim: Learn edge functions from real data using neural networks
  • Support for fMRI/neuroimaging data fusion

5. Missing Data Simulation

  • Missing Not At Random (MNAR) data generation
  • Missing data mechanisms with graphical structure
  • MDGCM class for missing data scenarios
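
A small sketch may help show what makes missingness "not at random": the probability that a value goes missing depends on the value itself. The code below is a generic illustration using a logistic missingness probability, not the package's MDGCM mechanism; `mnar_mask`, `threshold`, and `steepness` are hypothetical names.

```python
import math
import random


def mnar_mask(series, threshold, steepness=4.0, seed=0):
    """Drop each value with probability sigmoid(steepness * (value - threshold))."""
    rng = random.Random(seed)
    masked = []
    for value in series:
        p_missing = 1.0 / (1.0 + math.exp(-steepness * (value - threshold)))
        masked.append(None if rng.random() < p_missing else value)
    return masked


series = [math.sin(0.1 * t) for t in range(1000)]
observed = mnar_mask(series, threshold=0.5)
kept = [v for v in observed if v is not None]
true_mean = sum(series) / len(series)
observed_mean = sum(kept) / len(kept)
print(f"missing: {len(series) - len(kept)} of {len(series)}")
print(f"true mean {true_mean:.3f}, mean of observed values {observed_mean:.3f}")
```

Since large values are removed more often, the mean of the observed values is biased downward, which is the kind of distortion imputation methods are evaluated against.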

6. Data Fusion

  • Combine expert knowledge graphs with real-world time series
  • Learn functional relationships from observed data
  • Generate augmented synthetic data preserving statistical properties
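
Learning a functional relationship from observed data can be pictured with a much simpler stand-in than the package's neural networks: an ordinary least squares fit of a single lagged linear edge. The sketch below is that stand-in only, run on synthetic data generated inside the example; `fit_lagged_linear` is a hypothetical helper.

```python
import random


def fit_lagged_linear(parent, child, lag):
    """Ordinary least squares for child[t] ~ slope * parent[t - lag] + intercept."""
    xs = parent[: len(parent) - lag]  # parent values at time t - lag
    ys = child[lag:]                  # child values at time t
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Synthetic "observed" data with a known lag-2 relationship.
rng = random.Random(0)
parent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
child = [0.0, 0.0] + [
    0.8 * parent[t - 2] + 0.1 + rng.gauss(0.0, 0.05) for t in range(2, 2000)
]
slope, intercept = fit_lagged_linear(parent, child, lag=2)
print(f"slope {slope:.3f}, intercept {intercept:.3f}")
```

The fitted coefficients can then drive a simulator along the same edge, which is the basic idea behind generating augmented data from a known graph plus real observations.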

7. Visualization Tools

  • Graph visualization with temporal structure
  • Time series plotting utilities
  • Interactive graph editing interface

Example Roadmap

The examples/ directory contains comprehensive tutorials demonstrating different use cases:

Getting Started Examples

📓 example.ipynb - Basic Graph Construction

  • Purpose: Introduction to building dynamic graphs
  • Key Concepts:
    • Creating graph structures with lagged edges
    • Defining parent-child relationships
    • Basic simulation workflow
  • Use Case: Learn the fundamental graph structure format

📓 expert_edge_demo.ipynb - Expert Knowledge Integration

  • Purpose: Demonstrate expert-driven graph construction
  • Key Concepts:
    • Using ExpertSim for domain knowledge integration
    • Assigning signal functions to nodes
    • Defining edge relationships with expert knowledge
  • Use Case: Build graphs when you have domain expertise

Advanced Simulation Examples

📓 random_edge.ipynb - Random MLP Edge Functions

  • Purpose: Generate synthetic data with random neural network edge functions
  • Key Concepts:
    • Using RandomMLPExpertSim for automatic edge function generation
    • Creating complex nonlinear relationships
    • Large-scale graph simulation
  • Use Case: Generate diverse synthetic datasets for benchmarking

📓 WhereAreU.ipynb / WhereAreU_refactored.ipynb - Interactive Graph Generation

  • Purpose: Interactive workflow for creating and validating graph structures
  • Key Concepts:
    • Interactive graph editing and visualization
    • Real-time simulation and validation
    • Graph variation generation (tree, star, cycle structures)
    • Parameter exploration (lag sizes, edge density)
  • Use Case: Explore different graph topologies and validate simulation quality
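
The graph variations mentioned above (tree, star, cycle) can be sketched as small generators that emit the nested parent -> lag -> children mapping used in this README's Quick Start. These helpers are hypothetical and written for illustration; they are not the notebook's own code.

```python
def star_graph(n_nodes, lag=1):
    """Node X0 is the hub and drives every other node at the given lag."""
    names = [f"X{i}" for i in range(n_nodes)]
    return {names[0]: {lag: names[1:]}} if n_nodes > 1 else {}


def cycle_graph(n_nodes, lag=1):
    """Each node drives the next one and the last drives the first.

    Use lag >= 1 so the loop runs through time rather than within one step.
    """
    names = [f"X{i}" for i in range(n_nodes)]
    return {names[i]: {lag: [names[(i + 1) % n_nodes]]} for i in range(n_nodes)}


def binary_tree_graph(n_nodes, lag=1):
    """Node i drives nodes 2i+1 and 2i+2 when they exist (heap-style tree)."""
    names = [f"X{i}" for i in range(n_nodes)]
    graph = {}
    for i in range(n_nodes):
        children = [names[c] for c in (2 * i + 1, 2 * i + 2) if c < n_nodes]
        if children:
            graph[names[i]] = {lag: children}
    return graph


star = star_graph(4)
cycle = cycle_graph(3)
tree = binary_tree_graph(5)
print(star)
print(cycle)
print(tree)
```

Varying `n_nodes` and `lag` in such generators is one simple way to explore lag sizes and edge density across topologies.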

Real-World Data Integration

📓 experiments/MIRACLE.ipynb - Missing Data Simulation

  • Purpose: Generate Missing Not At Random (MNAR) data with graphical structure
  • Key Concepts:
    • Using MExpertSim for missing data scenarios
    • Defining missingness mechanisms
    • Continuous and discrete variable handling
  • Use Case: Simulate realistic missing data patterns for method evaluation

📓 experiments/synthetic_vs_real.ipynb - Real vs Synthetic Comparison

  • Purpose: Systematic comparison between real and synthetic time series
  • Key Concepts:
    • Evaluating synthetic data quality
    • Downstream task performance (classification)
    • Graph/connectivity similarity metrics
    • Statistical property matching
  • Use Case: Validate synthetic data quality for real-world applications

📓 fusion/fMRI.ipynb - fMRI Data Fusion

  • Purpose: Integrate fMRI data with expert knowledge graphs
  • Key Concepts:
    • Loading and preprocessing fMRI data (nilearn integration)
    • Learning edge functions from real fMRI time series
    • Generating synthetic fMRI data preserving connectivity
    • Network metrics comparison
  • Use Case: Generate synthetic neuroimaging data for research

Specialized Use Cases

📓 mnar.ipynb - Missing Not At Random Data

  • Purpose: Detailed tutorial on MNAR data generation
  • Key Concepts:
    • Missing data mechanisms in graphical models
    • Enabling missingness for specific nodes
    • Missing data pattern visualization
  • Use Case: Study missing data mechanisms and imputation methods

📓 playground.ipynb - Experimental Sandbox

  • Purpose: Free-form experimentation with the package
  • Use Case: Custom experiments and method development

Data Generation Scripts

📜 data_generation.py - Batch Dataset Generation

  • Purpose: Command-line tool for generating multiple datasets
  • Features:
    • Configurable graph parameters (nodes, edges, parents)
    • Quality checking and validation
    • Batch processing with parameter sweeps
  • Usage:
    python data_generation.py --N_GEN 10 --TRY_MAX 5 --PATH ./output --gen_params_path config.yaml

Quick Start

Basic Example

    from graph_ts import DGCM, DynGraph

    # Define a graph structure
    graph = {
        "A": {0: ["B", "C"]},       # A affects B and C at lag 0
        "B": {1: ["C"], 6: ["B"]},  # B affects C at lag 1, self-loop at lag 6
        "C": {2: ["A"]}             # C affects A at lag 2
    }

    # Create dynamic graph
    dyn_graph = DynGraph(out_graph=graph)

    # Simulate time series
    data = dyn_graph.simulate_process(T=1000)
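
To show what simulating such a graph involves, the following standalone sketch runs a linear process over the same structure without using the package. It assumes every edge has the same weight and every node receives Gaussian noise, which is a simplification chosen for this example; `simulate_linear` and its parameters are hypothetical. Lag-0 edges are handled by updating nodes in topological order within each time step (`graphlib` requires Python 3.9 or newer).

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def simulate_linear(out_graph, T, weight=0.4, noise_std=1.0, seed=0):
    """Simulate x_child[t] = sum(weight * x_parent[t - lag]) + Gaussian noise."""
    rng = random.Random(seed)
    # Invert parent -> lag -> children into child -> [(parent, lag)].
    node_set = set(out_graph)
    parents = {}
    for parent, lagged in out_graph.items():
        for lag, children in lagged.items():
            for child in children:
                node_set.add(child)
                parents.setdefault(child, []).append((parent, lag))
    nodes = sorted(node_set)  # fixed order keeps runs reproducible
    # Lag-0 parents must be computed before their children in the same step.
    instantaneous = {
        node: sorted(p for p, lag in parents.get(node, []) if lag == 0)
        for node in nodes
    }
    order = list(TopologicalSorter(instantaneous).static_order())
    series = {node: [] for node in nodes}
    for t in range(T):
        for node in order:
            value = rng.gauss(0.0, noise_std)
            for parent, lag in parents.get(node, []):
                if t - lag >= 0:  # ignore lags reaching before the start
                    value += weight * series[parent][t - lag]
            series[node].append(value)
    return series


graph = {
    "A": {0: ["B", "C"]},
    "B": {1: ["C"], 6: ["B"]},
    "C": {2: ["A"]},
}
data = simulate_linear(graph, T=200)
print({node: len(values) for node, values in data.items()})
```

A cycle among lag-0 edges would make `static_order` raise `graphlib.CycleError`, which matches the intuition that instantaneous effects need an acyclic ordering.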

Expert Knowledge Integration

    from graph_ts import ExpertSim, SignalFunction

    # Create expert simulation
    expert = ExpertSim()

    # Add nodes
    expert.add_node('A', type='continuous')
    expert.add_node('B', type='continuous')

    # Assign signal function (independent component)
    expert.assign_node_with_fn('A', SignalFunction.bessel_process_signal())

    # Add edge with custom function
    expert.add_edge('A', 'B', lag=1, input_len=10)

    # Simulate
    data = expert.simulate_process(T=1000)

Neural Network Edge Functions

    from graph_ts.simulation.expert_augmentation import NeuralExpertSim, TSFDataset
    from graph_ts.mapping.torch_nets import SimpleFunctionalNet

    # Load real data
    dataset = TSFDataset(real_data, graph_structure)

    # Create neural expert simulator
    neural_expert = NeuralExpertSim(
        graph=graph_structure,
        edge_function_template=SimpleFunctionalNet,
        in_dim=(10, 5),  # (sequence_length, num_parents)
        out_dim=1
    )

    # Train on real data
    neural_expert.fit(dataset)

    # Generate synthetic data
    synthetic_data = neural_expert.simulate(T=1000)

Architecture Overview

Core Classes

  • DynGraph: Base dynamic graph structure (NetworkX MultiDiGraph)
  • DGCM: Dynamic Graph Causal Model with simulation capabilities
  • ExpertSim: Expert knowledge integration framework
  • NeuralExpertSim: Neural network-based edge function learning
  • MDGCM: Missing data graphical causal model
  • EdgeFunction: Base class for edge functional mappings
  • SignalFunction: Base class for independent node signals

Neural Network Architectures

  • SimpleFunctionalNet: Standard MLP for edge functions
  • VAEFunctionalNet: Variational autoencoder for statistical matching
  • TransformerFunctionalNet: Transformer-based sequence modeling
  • DiffusionFunctionalNet: Diffusion model for edge functions

Key Features

| Feature                      | Status | Description                                |
|------------------------------|--------|--------------------------------------------|
| NetworkX Integration         | ✅     | Full compatibility with NetworkX graphs    |
| Custom Edge Functions        | ✅     | Define any functional relationship         |
| Custom Signal Functions      | ✅     | Independent time-dependent signals         |
| Pre-defined Function Gallery | ✅     | Library of common edge/signal functions    |
| Interactive Graph UI         | ✅     | Real-time graph editing and visualization  |
| Neural Network Learning      | ✅     | Learn edge functions from real data        |
| Missing Data Support         | ✅     | MNAR data generation                       |
| fMRI/Neuroimaging            | ✅     | Specialized tools for brain data           |
| Visualization Tools          | ✅     | Graph and time series plotting             |

Use Cases

  1. Synthetic Data Generation: Create realistic time series with known causal structure
  2. Method Evaluation: Benchmark causal discovery and time series analysis methods
  3. Data Augmentation: Generate additional training data preserving statistical properties
  4. Missing Data Research: Study missing data mechanisms and imputation methods
  5. Domain Knowledge Integration: Incorporate expert knowledge into data generation
  6. Neuroscience Research: Generate synthetic fMRI/neuroimaging data

Important Notes

⚠️ Warning: Edge functions and signal functions are stored using pickle. Unpickling can execute arbitrary code, so load only files from sources you trust, as cautioned in Python's pickle documentation.
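
One mitigation described in Python's pickle documentation is to restrict which globals an unpickler may resolve. The sketch below follows that documented pattern with a small allow-list of builtins. Whether it fits this package is an assumption: stored edge and signal functions would need their own modules added to the allow-list, and `restricted_loads` is not part of the package.

```python
import builtins
import io
import pickle

# Names that pickled data is allowed to reference (illustrative allow-list).
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only allow-listed builtins; reject every other global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


safe = restricted_loads(pickle.dumps([1, 2, range(15)]))
print(safe)

# A payload that would call os.system if loaded with plain pickle.loads.
malicious = b"cos\nsystem\n(S'echo hello world'\ntR."
try:
    restricted_loads(malicious)
except pickle.UnpicklingError as err:
    print("rejected:", err)
```

An allow-list narrows the attack surface but does not make untrusted pickles safe in general, so treating stored functions as trusted-source-only remains the simpler rule.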

Documentation

  • Main README: See README.md for detailed usage instructions
  • API Documentation: Check docstrings in source code
  • Examples: Explore notebooks in examples/ directory

License

This project is licensed under the MIT License.

Contact

For inquiries, please contact:

Future Improvements

  • Enable dVariable nodes for storing variable changes and evaluating instant effects
  • Enhanced real-world data adaptation with automatic graph learning
  • Additional neural network architectures
  • Extended visualization capabilities

Version: 0.4
Last Updated: 2024
