This repository contains a full experimental evaluation of speculative decoding using a lightweight draft model (gpt2, 124M params) to accelerate generation from a larger target model (gpt2-xl, 1.5B params).
The goal of this project was to measure how performance (latency, tokens/sec, acceptance rate, and overall speedup) varies across two settings (the sweep is sketched below):
- Temperatures
- Gamma values (number of draft tokens proposed per step)
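For concreteness, the sweep has roughly this shape. The grids and the `run_benchmark` helper below are illustrative placeholders, not the repository's actual values or API:

```python
# Illustrative parameter sweep; the repo's actual grids may differ.
import itertools

def run_benchmark(temperature: float, gamma: int) -> None:
    """Hypothetical entry point: would time speculative vs. baseline decoding."""
    ...

temperatures = [0.0, 0.3, 0.7, 1.0, 1.3]   # sampling temperature for draft and target
gammas = [1, 2, 3, 4, 6, 8, 12, 16]        # draft tokens proposed per verification step

# 5 temperatures x 8 gammas = 40 (temperature, gamma) combinations
for temperature, gamma in itertools.product(temperatures, gammas):
    run_benchmark(temperature=temperature, gamma=gamma)
```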
For this experiment, I built a complete benchmarking setup, implemented a custom speculative decoding loop, instrumented detailed metrics, and generated a comparative visualization showing how gamma and temperature influence the final speedup.
- Implemented speculative decoding from scratch using PyTorch + HuggingFace Transformers (a minimal sketch of the loop follows this list)
- Benchmarked generation time, tokens/sec, acceptance rate, and overall speedup
- Explored 40+ combinations of (temperature, gamma)
- Identified the globally optimal speedup across all settings
- Generated a final visualization summarizing how gamma affects latency and speedup at each temperature
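To make the core idea concrete, here is a minimal sketch of one speculative decoding step. It uses the greedy accept-the-matching-prefix variant for clarity; the repository's loop additionally handles temperature sampling, where acceptance follows the standard probabilistic accept/reject rule. The model names match the README, but the function and variable names are illustrative, not the repo's API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # gpt2 and gpt2-xl share a tokenizer
draft = AutoModelForCausalLM.from_pretrained("gpt2")       # 124M draft model
target = AutoModelForCausalLM.from_pretrained("gpt2-xl")   # 1.5B target model

@torch.no_grad()
def speculative_step(input_ids, gamma=4):
    """One step: propose `gamma` draft tokens, verify all of them with a single target pass."""
    # 1) Draft model proposes gamma tokens autoregressively (greedy here for clarity).
    draft_ids = input_ids
    for _ in range(gamma):
        logits = draft(draft_ids).logits[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, next_id], dim=-1)

    # 2) One target forward pass scores the prompt plus all proposed tokens.
    target_logits = target(draft_ids).logits
    # Logits at position i predict token i+1, so this slice yields gamma+1 predictions:
    # one for each proposed token, plus one "bonus" token past the last proposal.
    target_preds = target_logits[:, input_ids.shape[1] - 1 :, :].argmax(dim=-1)
    proposed = draft_ids[:, input_ids.shape[1] :]

    # 3) Accept the longest prefix on which draft and target agree (assumes batch size 1).
    matches = (target_preds[:, :gamma] == proposed)[0]
    n_accepted = int(matches.long().cumprod(dim=0).sum())

    # 4) Keep the accepted tokens and always append one target-verified token,
    #    so every step makes progress even when nothing is accepted.
    accepted = proposed[:, :n_accepted]
    bonus = target_preds[:, n_accepted : n_accepted + 1]
    return torch.cat([input_ids, accepted, bonus], dim=-1), n_accepted
```

A single step can then be timed to recover per-step acceptance and throughput:

```python
import time

ids = tokenizer("Speculative decoding works because", return_tensors="pt").input_ids
start = time.perf_counter()
out, n_acc = speculative_step(ids, gamma=4)
elapsed = time.perf_counter() - start
print(f"accepted {n_acc}/4 draft tokens, {(out.shape[1] - ids.shape[1]) / elapsed:.1f} tok/s")
```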
- Key observations:
  - Moderate gamma values produce the best speedup (the simple cost model below shows why)
  - Very large gamma → low acceptance → net slowdown, since rejected draft work is wasted
  - Temperature directly influences acceptance rate and stability
  - The draft model can meaningfully accelerate decoding when tuned properly
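The gamma trade-off follows from the standard analytic model of speculative decoding, which assumes each draft token is accepted independently with probability `alpha`. The numbers below are illustrative, not measured results from this repo:

```python
# Back-of-the-envelope model for why moderate gamma wins.
def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Expected walltime speedup under i.i.d. acceptance (requires 0 <= alpha < 1).

    alpha: probability each draft token is accepted
    gamma: draft tokens proposed per verification step
    c:     draft-model cost relative to the target (e.g. ~0.1 for gpt2 vs gpt2-xl)
    """
    expected_tokens = (1 - alpha ** (gamma + 1)) / (1 - alpha)  # tokens emitted per step
    cost_per_step = gamma * c + 1                               # gamma draft passes + 1 target pass
    return expected_tokens / cost_per_step

# Speedup rises with gamma at first, then falls once the extra draft work
# outweighs the shrinking chance that a long proposal survives verification.
for gamma in (1, 2, 4, 8, 16):
    print(gamma, round(expected_speedup(alpha=0.8, gamma=gamma, c=0.1), 2))
```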
This repository provides a reproducible, visual picture of speculative decoding efficiency, useful for research, inference optimization, and system-level ML work.
