Introducing 2.5 Flash-Lite, a thinking model for those looking for low cost and low latency.
2.5 Flash-Lite excels at high-volume, latency-sensitive tasks like translation and classification.
- **Thinking, enabled:** Experience improved reasoning and output quality with thinking mode and thinking budgets (a minimal API sketch follows this list).
- **Superior latency:** Benefit from faster response times than 2.0 Flash-Lite.
- **Tool use:** Utilize key Gemini 2.5 features, including tools like Search and code execution.
- **Cost-efficient:** 2.5 Flash-Lite is our most cost-efficient 2.5 model yet.
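To make the thinking-budget control concrete, here is a minimal sketch using the `google-genai` Python SDK: a short classification call with an explicit budget. The model ID matches this release, but the prompt and budget value are illustrative assumptions, not recommendations.

```python
# Minimal sketch: a latency-sensitive classification call with an explicit
# thinking budget. Assumes the google-genai SDK is installed and a
# GEMINI_API_KEY environment variable is set; prompt and budget are examples.
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=(
        "Classify the sentiment of this review as positive, negative, or "
        "neutral: 'The battery lasts all day, but the screen is dim.'"
    ),
    config=types.GenerateContentConfig(
        # Cap how many tokens the model may spend thinking before it answers;
        # setting the budget to 0 disables thinking for the lowest latency.
        thinking_config=types.ThinkingConfig(thinking_budget=512),
    ),
)
print(response.text)
```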
Benchmarks
2.5 Flash-Lite delivers all-around, significantly higher performance than 2.0 Flash-Lite across coding, math, science, reasoning, and multimodal benchmarks.
| Benchmark | Gemini 2.0 Flash-Lite | Gemini 2.5 Flash-Lite (Non-thinking) | Gemini 2.5 Flash-Lite (Thinking) |
|---|---|---|---|
| Reasoning & knowledge: Humanity's Last Exam (no tools) | 5.1%* | 5.1% | 6.9% |
| Science: GPQA diamond | 65.2% | 64.6% | 66.7% |
| Mathematics: AIME 2025 | 29.7% | 49.8% | 63.1% |
| Code generation: LiveCodeBench (UI: 1/1/2025-5/1/2025) | 29.1% | 33.7% | 34.3% |
| Code editing: Aider Polyglot | 21.3% | 26.7% | 27.1% |
| Agentic coding: SWE-bench Verified (single attempt) | 21.4% | 31.6% | 27.6% |
| Agentic coding: SWE-bench Verified (multiple attempts) | 34.2% | 42.6% | 44.9% |
| Factuality: SimpleQA | 29.9% | 10.7% | 13.0% |
| Factuality: FACTS Grounding | 84.6% | 84.1% | 86.8% |
| Visual reasoning: MMMU | 69.3% | 72.9% | 72.9% |
| Image understanding: Vibe-Eval (Reka) | 55.4% | 51.3% | 57.5% |
| Long context: MRCR v2 (8-needle), 128k average | 19.0% | 16.6% | 30.6% |
| Long context: MRCR v2 (8-needle), 1M pointwise | 5.3% | 4.1% | 5.4% |
| Multilingual performance: Global MMLU (Lite) | 83.4% | 81.1% | 84.5% |
Model information
| | 2.5 Flash-Lite |
|---|---|
| Model deployment status | General availability |
| Supported input data types | Text, Image, Video, Audio, PDF |
| Supported output data types | Text |
| Input token limit | 1M |
| Output token limit | 64k |
| Knowledge cutoff | January 2025 |
| Tool use | Search as a tool, code execution |
| Best for | High-volume, low-cost, low-latency tasks |
| Availability | Google AI Studio, Gemini API, Vertex AI |
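For the tool-use row above, a sketch of Search as a tool with the same `google-genai` SDK might look like the following; the prompt is a made-up example, and code execution can be requested the same way by passing `types.Tool(code_execution=types.ToolCodeExecution())` instead.

```python
# Minimal sketch: Grounding with Google Search on 2.5 Flash-Lite.
# Assumes GEMINI_API_KEY is set in the environment; the prompt is illustrative.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="What changed in this week's top renewable-energy headlines?",
    config=types.GenerateContentConfig(
        # Allow the model to issue Google Search queries and ground its answer.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```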