Our powerful and most efficient workhorse model, designed for speed and low cost.
Speed and value at scale
Ideal for tasks like summarization, chat applications, data extraction, and captioning.
- **Thinking budget**: Control how much 2.5 Flash reasons to balance latency and cost.
- **Natively multimodal**: Understands input across text, audio, images, and video.
- **Long context**: Explore vast datasets with a 1-million-token context window.
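The thinking budget is exposed as a request parameter. Below is a minimal sketch of how such a request body might be assembled, assuming the field names of the public Gemini REST API (`generationConfig.thinkingConfig.thinkingBudget`); verify against the current API reference before relying on them.

```python
# Sketch: building a Gemini API request body with a thinking budget.
# Field names follow the public REST API (generationConfig.thinkingConfig);
# treat them as assumptions and check the current docs before use.

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Assemble a generateContent request body with a capped thinking budget.

    thinking_budget is the maximum number of tokens 2.5 Flash may spend
    reasoning before it answers; 0 disables thinking entirely.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

# Low-latency configuration: no thinking tokens at all.
fast = build_request("Summarize this article.", thinking_budget=0)
# Higher-quality configuration: allow up to 8k reasoning tokens.
careful = build_request("Prove the claim step by step.", thinking_budget=8192)
```

Setting the budget to 0 trades reasoning depth for latency and cost; a larger budget lets the model spend more tokens thinking before it answers.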
Native audio (Preview)
Converse in more expressive ways with native audio outputs that capture the subtle nuances of how we speak. Seamlessly switch between 24 languages, all with the same voice.
- **Natural conversation**: Remarkable quality with appropriate expressivity and prosody, delivered at low latency so you can converse fluidly.
- **Style control**: Use natural-language prompts to adapt delivery within the conversation, steering the model to adopt accents and produce a range of tones and expressions.
- **Tool integration**: Gemini 2.5 can use tools and function calling during dialog, allowing it to incorporate real-time information or use custom developer-built tools.
- **Conversation context awareness**: The model is trained to discern and disregard background speech, ambient conversations, and other irrelevant audio.
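As an illustration of the tool integration described above, here is a sketch of a function declaration in the JSON-schema style the Gemini API accepts for function calling; the `get_weather` tool and its parameters are hypothetical, invented for this example.

```python
# Sketch: a function declaration for Gemini function calling.
# The get_weather tool is hypothetical; only the declaration shape
# (name / description / parameters as JSON schema) matters here.
get_weather = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

# Declared tools ride along with the conversation turn; the model may
# respond with a functionCall part instead of plain text.
request = {
    "contents": [{"parts": [{"text": "What's the weather in Tokyo?"}]}],
    "tools": [{"functionDeclarations": [get_weather]}],
}
```

When the model emits a function call, the application executes the tool and returns the result in the next turn, letting the dialog incorporate real-time information.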
Benchmarks
| Benchmark | Metric | Gemini 2.5 Flash Thinking | Gemini 2.0 Flash | OpenAI o4-mini | Claude Sonnet 3.7 64k Extended thinking | Grok 3 Beta Extended thinking | DeepSeek R1 |
|---|---|---|---|---|---|---|---|
| Input price | $/1M tokens | $0.30 | $0.10 | $1.10 | $3.00 | $3.00 | $0.55 |
| Output price | $/1M tokens | $2.50 | $0.40 | $4.40 | $15.00 | $15.00 | $2.19 |
| Reasoning & knowledge: Humanity's Last Exam (no tools) | — | 11.0% | 5.1% | 14.3% | 8.9% | — | 8.6%* |
| Science: GPQA diamond | single attempt (pass@1) | 82.8% | 60.1% | 81.4% | 78.2% | 80.2% | 71.5% |
| | multiple attempts | — | — | — | 84.8% | 84.6% | — |
| Mathematics: AIME 2025 | single attempt (pass@1) | 72.0% | 27.5% | 92.7% | 49.5% | 77.3% | 70.0% |
| | multiple attempts | — | — | — | — | 93.3% | — |
| Code generation: LiveCodeBench v5 | single attempt (pass@1) | 63.9% | 34.5% | — | — | 70.6% | 64.3% |
| | multiple attempts | — | — | — | — | 79.4% | — |
| Code editing: Aider Polyglot | — | 61.9% / 56.7% (whole / diff-fenced) | 22.2% (whole) | 68.9% / 58.2% (whole / diff) | 64.9% (diff) | 53.3% (diff) | 56.9% (diff) |
| Agentic coding: SWE-bench Verified | — | 60.4% | — | 68.1% | 70.3% | — | 49.2% |
| Factuality: SimpleQA | — | 26.9% | 29.9% | — | — | 43.6% | 30.1% |
| Factuality: FACTS Grounding | — | 85.3% | 84.6% | 62.1% | 78.8% | 74.8% | 56.8% |
| Visual reasoning: MMMU | single attempt (pass@1) | 79.7% | 71.7% | 81.6% | 75.0% | 76.0% | no MM support |
| | multiple attempts | — | — | — | — | 78.0% | no MM support |
| Image understanding: Vibe-Eval (Reka) | — | 65.4% | 56.4% | — | — | — | no MM support |
| Long context: MRCR v2 | 128k (average) | 74.0% | 36.0% | 49.0% | — | 54.0% | 45.0% |
| | 1M (pointwise) | 32.0% | 6.0% | — | — | — | — |
| Multilingual performance: Global MMLU (Lite) | — | 88.4% | 83.4% | — | — | — | — |
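As a worked example of the pricing rows in the table, the cost of a single request can be computed from the per-million-token rates ($0.30 input, $2.50 output for 2.5 Flash):

```python
# Worked example using the prices from the table above:
# 2.5 Flash charges $0.30 per 1M input tokens and $2.50 per 1M output tokens.
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = 0.30,
                 output_price: float = 2.50) -> float:
    """Dollar cost of one request, given token counts and $/1M-token rates."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# A 10k-token prompt with a 2k-token answer:
# 10_000 * 0.30 + 2_000 * 2.50 = 3_000 + 5_000 = 8_000 micro-dollars
cost = request_cost(10_000, 2_000)  # → 0.008 dollars
```

The same function with `input_price`/`output_price` swapped for another model's rates makes the price columns directly comparable.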
Model information
| Property | 2.5 Flash |
|---|---|
| Model deployment status | General availability |
| Supported data types for input | Text, Image, Video, Audio, PDF |
| Supported data types for output | Text |
| Supported # tokens for input | 1M |
| Supported # tokens for output | 64k |
| Knowledge cutoff | January 2025 |
| Tool use | Function calling, Structured output, Search as a tool, Code execution |
| Best for | Cost-efficient thinking, Well-rounded capabilities |
| Availability | Gemini app, Google AI Studio, Gemini API, Live API, Vertex AI |