This repository is the official implementation of Generative Context Distillation.
Compress LLM prompts and save 80%+ on GPT-4 costs, in Python.
CUTIA: compress prompts while preserving quality.
A fast, Unix-style CLI tool for semantic prompt compression. Cuts LLM prompt tokens by 10-20x at >90% fidelity, reducing cost and latency.
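The tools above score and drop low-information tokens to shrink a prompt before it reaches the model. As a rough illustration of that idea only (not the CUTIA or Generative Context Distillation implementation), a naive extractive compressor might drop stopwords and enforce a token budget; the stopword list and `keep_ratio` parameter here are purely hypothetical:

```python
# Illustrative sketch of extractive prompt compression.
# Real tools use semantic importance scoring; this only shows the
# stopword-dropping + token-budget idea.

STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "in", "that"}

def compress_prompt(prompt: str, keep_ratio: float = 0.5) -> str:
    """Drop stopwords, then truncate to a fraction of the original token count."""
    tokens = prompt.split()
    budget = max(1, int(len(tokens) * keep_ratio))  # token budget after compression
    kept = [t for t in tokens if t.lower() not in STOPWORDS]
    return " ".join(kept[:budget])

if __name__ == "__main__":
    text = "Summarize the main findings of the report in a single sentence"
    print(compress_prompt(text))  # prints "Summarize main findings report single"
```

Production compressors replace the stopword heuristic with learned token-importance scores, which is what allows 10-20x reductions while keeping fidelity high.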