Gradient Cache - GPU Memory-Efficient Training

Gradient Cache is a production-ready PyTorch extension that reduces GPU memory usage by 90%+ during neural network training through intelligent gradient compression and CPU offloading.

🚀 Key Features

90%+ Memory Savings: Compress gradients by 100x with minimal accuracy impact
Larger Batch Sizes: Train with 2-3x larger batches on the same hardware
Simple Integration: Just 3 lines of code to add to any training loop
Universal Compatibility: Works with any PyTorch model and optimizer
Production Ready: Tested on A100 and T4 GPUs with real models

📊 Proven Results

Model	Parameters	Memory Saved	Compression
GPT-2 Small	124M	479 MB/step	100x
GPT-2 Medium	350M	~1.3 GB/step	100x
Custom NN	50M	144 MB/step	100x

🔧 Installation

pip install gradient-cache

Or install from source:

git clone https://github.com/JonSnow1807/gradient-cache cd gradient-cache pip install -e .

💡 Quick Start

Add gradient cache to any PyTorch training loop with just 3 lines:

import gradient_cache # Create your model model = create_your_model().cuda() # Add gradient cache (1 line) hook_manager = gradient_cache.create_gradient_cache(model, compression_ratio=100) # Normal training loop optimizer = torch.optim.Adam(model.parameters()) for batch in dataloader: loss = model(batch).mean() loss.backward() # Compress gradients (1 line) hook_manager.compress_and_free_gradients() # Restore gradients and update (1 line) hook_manager.apply_gradients() optimizer.step() optimizer.zero_grad()

🎯 Integration with Training Frameworks

Metaflow Integration

Use the decorator for automatic integration:

from metaflow import FlowSpec, step import gradient_cache class MyTrainingFlow(FlowSpec): @step @gradient_cache.optimize(compression_ratio=100) def train(self): # Your training code - no changes needed! model = create_model() optimizer = torch.optim.Adam(model.parameters()) # ... rest of training

PyTorch Lightning

import pytorch_lightning as pl import gradient_cache class MyModel(pl.LightningModule): def __init__(self): super().__init__() self.model = create_model() self.hook_manager = gradient_cache.create_gradient_cache(self.model) def training_step(self, batch, batch_idx): loss = self.model(batch).mean() return loss def on_after_backward(self): self.hook_manager.compress_and_free_gradients() def optimizer_step(self, *args, **kwargs): self.hook_manager.apply_gradients() super().optimizer_step(*args, **kwargs)

🛠️ Advanced Usage

Custom Compression Ratios

# Conservative - 10x compression (keep 10%) hook_manager = gradient_cache.create_gradient_cache(model, compression_ratio=10) # Aggressive - 1000x compression (keep 0.1%)  hook_manager = gradient_cache.create_gradient_cache(model, compression_ratio=1000)

Exclude Critical Layers

# Don't compress embeddings or output layers hook_manager = gradient_cache.GradientCacheHookManager( model, compression_ratio=100, exclude_layers=['embedding', 'lm_head'] )

Monitor Compression

# Enable verbose mode hook_manager = gradient_cache.create_gradient_cache(model, verbose=True) # Get compression statistics stats = hook_manager.get_compression_summary() print(f"Compression ratio: {stats['overall_compression_ratio']:.1f}x") print(f"Memory saved: {stats['memory_saved_mb']:.1f} MB")

📈 How It Works

Gradient Computation: Normal backward pass computes gradients
Compression: Keep only top 1% of gradient values by magnitude
CPU Offload: Move compressed gradients to system RAM
GPU Memory Release: Free GPU memory for next batch
Gradient Restoration: Restore gradients for optimizer step

🏆 Benefits

Cost Savings: Use smaller, cheaper GPU instances
Larger Models: Train models that don't fit in GPU memory
Faster Research: Iterate quickly with larger batch sizes
Easy Integration: No model architecture changes needed

🧪 Testing

Run the test suite:

python tests/test_gradient_cache.py

📝 Citation

If you use Gradient Cache in your research, please cite:

@software{gradient_cache, title = {Gradient Cache: GPU Memory-Efficient Training}, author = {Gradient Cache Contributors}, year = {2024}, url = {https://github.com/gradient-cache/gradient-cache} }

📄 License

Apache License 2.0 - see LICENSE for details.

🤝 Contributing

We welcome contributions! Please submit issues and pull requests on GitHub.

📧 Support

Issues: GitHub Issues
Discussions: GitHub Discussions

Built with ❤️ for the ML community

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
examples		examples
gradient_cache		gradient_cache
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
RELEASE_NOTES.md		RELEASE_NOTES.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Gradient Cache - GPU Memory-Efficient Training

🚀 Key Features

📊 Proven Results

🔧 Installation

💡 Quick Start

🎯 Integration with Training Frameworks

Metaflow Integration

PyTorch Lightning

🛠️ Advanced Usage

Custom Compression Ratios

Exclude Critical Layers

Monitor Compression

📈 How It Works

🏆 Benefits

🧪 Testing

📝 Citation

📄 License

🤝 Contributing

📧 Support

About

Uh oh!

Releases 1

Packages

Languages

Uh oh!

License

Uh oh!

JonSnow1807/gradient-cache

Folders and files

Latest commit

History

Repository files navigation

Gradient Cache - GPU Memory-Efficient Training

🚀 Key Features

📊 Proven Results

🔧 Installation

💡 Quick Start

🎯 Integration with Training Frameworks

Metaflow Integration

PyTorch Lightning

🛠️ Advanced Usage

Custom Compression Ratios

Exclude Critical Layers

Monitor Compression

📈 How It Works

🏆 Benefits

🧪 Testing

📝 Citation

📄 License

🤝 Contributing

📧 Support

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages