⚡ Build your chatbot within minutes on your favorite device, with SOTA compression techniques for LLMs and efficient LLM inference on Intel platforms ⚡
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
A scalable and robust tree-based speculative decoding algorithm
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
LLM Inference on consumer devices
[ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding.
A clean, simple-to-use implementation of EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a speculative decoding method 🦅
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
Official Implementation of LANTERN (ICLR'25) and LANTERN++(ICLRW-SCOPE'25)
Official Implementation of "GRIFFIN: Effective Token Alignment for Faster Speculative Decoding"[NeurIPS 2025]
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
Unofficial implementation of the Token Recycling self-speculative decoding method.
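For readers new to the topic, here is a minimal, self-contained sketch of the basic draft-then-verify loop from the Leviathan et al. (2023) entry above: a cheap draft model proposes a few tokens, the target model scores those positions, and each proposal is accepted with probability min(1, p/q), with a corrected sample drawn on the first rejection. The toy models, GAMMA, and speculative_step are illustrative names for this sketch only and are not taken from any of the listed repositories.

# Minimal sketch of draft-then-verify speculative decoding (Leviathan et al., 2023).
# The two "models" are hypothetical toy callables mapping a token sequence to a
# next-token distribution; in practice they would be a small draft LM and a large
# target LM sharing one vocabulary.
import torch

VOCAB_SIZE = 16
GAMMA = 4  # number of tokens the draft model proposes per verification round


def toy_model(seed: int):
    """Build a deterministic toy 'LM': token sequence -> next-token distribution."""
    gen = torch.Generator().manual_seed(seed)
    table = torch.rand((VOCAB_SIZE, VOCAB_SIZE), generator=gen).softmax(dim=-1)

    def next_token_dist(tokens):
        # Conditions only on the last token; a real LM would use the full context.
        return table[tokens[-1]]

    return next_token_dist


draft_model = toy_model(0)   # stands in for a small, cheap drafter
target_model = toy_model(1)  # stands in for the large target LM


def speculative_step(prefix):
    """One round: draft GAMMA tokens cheaply, then verify them against the target model."""
    # 1) Draft GAMMA tokens autoregressively with the cheap model.
    drafted, q_dists, ctx = [], [], list(prefix)
    for _ in range(GAMMA):
        q = draft_model(ctx)
        tok = int(torch.multinomial(q, 1))
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2) Score every drafted position with the target model.
    #    (A real implementation does this in a single batched forward pass,
    #    which is where the speed-up comes from.)
    p_dists = [target_model(prefix + drafted[:i]) for i in range(GAMMA + 1)]

    # 3) Accept draft token x with probability min(1, p(x)/q(x)); on the first
    #    rejection, resample from the residual max(0, p - q) and stop the round.
    accepted = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if torch.rand(1).item() < min(1.0, (p[tok] / q[tok]).item()):
            accepted.append(tok)
        else:
            residual = torch.clamp(p - q, min=0.0)
            accepted.append(int(torch.multinomial(residual / residual.sum(), 1)))
            return accepted

    # 4) All drafts accepted: take one extra "bonus" token from the target model.
    accepted.append(int(torch.multinomial(p_dists[GAMMA], 1)))
    return accepted


tokens = [0]
for _ in range(5):
    tokens += speculative_step(tokens)
print(tokens)

This accept/resample rule preserves the target model's output distribution. The tree-based entries above extend the same idea by verifying a tree of candidate continuations per round rather than a single drafted chain.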