Gemini 3 Flash

Gemini 3 Flash combines Gemini 3 Pro's reasoning capabilities with the latency, efficiency, and cost profile of the Flash line. It brings improved reasoning to everyday tasks and is also designed to tackle the most complex agentic workflows.

Gemini 3 Flash uses several new features to improve performance, control, and multimodal fidelity:

- Thinking level: Use the thinking_level parameter to control how much internal reasoning the model performs (minimal, low, medium, or high), balancing response quality and reasoning complexity against latency and cost. The thinking_level parameter replaces thinking_budget for Gemini 3 models.

  Note: If you used a thinking budget of 0 with Gemini 2.5 Flash, set your thinking level to MINIMAL for similar latency and cost. You still need to handle thought signatures when using the minimal thinking level.

  For details on the different thinking levels, see Thinking.
- Thought signatures: Stricter validation of thought signatures improves reliability in multi-turn function calling.
- Media resolution: Use the media_resolution parameter (low, medium, high, or ultra high) to control vision processing for multimodal inputs, which affects token usage and latency. See Get started with Gemini 3 for default resolution settings.
  - The ultra high media resolution level is only available for the IMAGE modality.
  - PDF token counts are listed under the IMAGE modality instead of the DOCUMENT modality in usage_metadata.
- Multimodal function responses: Function responses can now include multimodal objects such as images and PDFs in addition to text.
- Streaming function calling: Stream partial function call arguments to improve the user experience during tool use.

For more information on using these features, see Get started with Gemini 3.
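As a rough illustration of the constraints described above, the following sketch validates a thinking level and a media resolution before a request is built. It is not the SDK's API: `build_settings`, `level_for_legacy_budget`, and the shape of the returned dict are hypothetical names chosen for this example, and only the rules stated in the feature list are encoded.

```python
from typing import Optional

# Allowed values as named in the feature list above.
THINKING_LEVELS = ("minimal", "low", "medium", "high")
MEDIA_RESOLUTIONS = ("low", "medium", "high", "ultra_high")


def build_settings(
    thinking_level: str,
    media_resolution: Optional[str] = None,
    modality: str = "IMAGE",
) -> dict:
    """Hypothetical helper: validate values and return a plain settings dict.

    The dict layout is an assumption for illustration, not the request
    schema of any real client library.
    """
    level = thinking_level.lower()
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {thinking_level!r}")
    settings = {"thinking_level": level}

    if media_resolution is not None:
        resolution = media_resolution.lower().replace(" ", "_")
        if resolution not in MEDIA_RESOLUTIONS:
            raise ValueError(f"unknown media resolution: {media_resolution!r}")
        # Ultra high is only available for the IMAGE modality.
        if resolution == "ultra_high" and modality.upper() != "IMAGE":
            raise ValueError("ultra high resolution requires the IMAGE modality")
        settings["media_resolution"] = resolution
    return settings


def level_for_legacy_budget(thinking_budget: int) -> Optional[str]:
    """Map a Gemini 2.5 thinking budget to a thinking level.

    Only the budget-0 case is documented above; other budgets return None
    so the caller has to choose a level explicitly.
    """
    return "minimal" if thinking_budget == 0 else None
```

Returning `None` for non-zero budgets is deliberate: the text above only states the equivalence for a budget of 0, so the sketch does not guess at the rest.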


Note: To use the "Deploy example app" feature, you need a Google Cloud project with billing and Vertex AI API enabled.

Technical specifications
Model ID	`gemini-3-flash-preview`
Supported inputs & outputs	Inputs: Text, Code, Images, Audio, Video, PDF; Outputs: Text
Token limits	Maximum input tokens: 1,048,576; Maximum output tokens: 65,536
Capabilities	Supported: Grounding with Google Search, Code execution, System instructions, Structured output, Function calling, Count Tokens, Thinking, Implicit context caching, Explicit context caching, Vertex AI RAG Engine, Chat completions; Not supported: Tuning, Gemini Live API
Usage types	Supported: Provisioned Throughput, Dynamic shared quota, Batch prediction; Not supported: (none listed)
Images	Maximum images per prompt: 900; Maximum file size per file for inline data or direct uploads through the console: 7 MB; Maximum file size per file from Google Cloud Storage: 30 MB; Default resolution tokens: 1120; Supported MIME types: `image/png`, `image/jpeg`, `image/webp`, `image/heic`, `image/heif`
Documents	Maximum number of files per prompt: 900; Maximum number of pages per file: 900; Maximum file size per file for the API or Cloud Storage imports: 50 MB; Maximum file size per file for direct uploads through the console: 7 MB; Default resolution tokens: 560; OCR for scanned PDFs: Not used by default; Supported MIME types: `application/pdf`, `text/plain`
Video	Maximum video length (with audio): Approximately 45 minutes; Maximum video length (without audio): Approximately 1 hour; Maximum number of videos per prompt: 10; Default resolution tokens per frame: 70; Supported MIME types: `video/x-flv`, `video/quicktime`, `video/mpeg`, `video/mpegs`, `video/mpg`, `video/mp4`, `video/webm`, `video/wmv`, `video/3gpp`
Audio	Maximum audio length per prompt: Approximately 8.4 hours, or up to 1 million tokens; Maximum number of audio files per prompt: 1; Speech understanding for: Audio summarization, transcription, and translation; Supported MIME types: `audio/x-aac`, `audio/flac`, `audio/mp3`, `audio/m4a`, `audio/mpeg`, `audio/mpga`, `audio/mp4`, `audio/ogg`, `audio/pcm`, `audio/wav`, `audio/webm`
Parameter defaults	Temperature: 0.0-2.0 (default 1.0); topP: 0.0-1.0 (default 0.95); topK: 64 (fixed); candidateCount: 1-8 (default 1)
Supported regions	Model availability (includes dynamic shared quota and Provisioned Throughput): Global (`global`). See Deployments and endpoints for more information.
Knowledge cutoff date	January 2025
Versions	`gemini-3-flash-preview`; Launch stage: Public preview; Release date: December 17, 2025
Security controls	See Security controls for more information.
Supported languages	See Supported languages.
Pricing	See Pricing.
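
The limits in the specification table can be checked on the client before a request is sent. The sketch below covers only the token limits and the image limits. `ImagePart` and `check_request` are hypothetical names for this example, and treating 1 MB as 2**20 bytes is an assumption, since the table does not define the unit.

```python
from dataclasses import dataclass
from typing import List

# Limits copied from the specification table above.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536
MAX_IMAGES_PER_PROMPT = 900
MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes
MAX_INLINE_IMAGE_BYTES = 7 * MB
MAX_GCS_IMAGE_BYTES = 30 * MB
IMAGE_MIME_TYPES = {
    "image/png", "image/jpeg", "image/webp", "image/heic", "image/heif",
}


@dataclass
class ImagePart:
    """Hypothetical description of one image attached to a prompt."""
    mime_type: str
    size_bytes: int
    from_cloud_storage: bool = False


def check_request(
    images: List[ImagePart], input_tokens: int, max_output_tokens: int
) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if input_tokens > MAX_INPUT_TOKENS:
        problems.append(f"input tokens {input_tokens} exceed {MAX_INPUT_TOKENS}")
    if not 1 <= max_output_tokens <= MAX_OUTPUT_TOKENS:
        problems.append(
            f"max output tokens must be between 1 and {MAX_OUTPUT_TOKENS}"
        )
    if len(images) > MAX_IMAGES_PER_PROMPT:
        problems.append(f"{len(images)} images exceed {MAX_IMAGES_PER_PROMPT}")
    for index, image in enumerate(images):
        if image.mime_type not in IMAGE_MIME_TYPES:
            problems.append(f"image {index}: unsupported type {image.mime_type}")
        # Cloud Storage files get the larger per-file limit.
        limit = (
            MAX_GCS_IMAGE_BYTES if image.from_cloud_storage
            else MAX_INLINE_IMAGE_BYTES
        )
        if image.size_bytes > limit:
            problems.append(f"image {index}: {image.size_bytes} bytes exceed {limit}")
    return problems
```

For example, `check_request([ImagePart("image/png", 2 * MB)], 1000, 1024)` returns an empty list, while an 8 MB inline file is reported because it exceeds the 7 MB inline limit. The service remains the authority on these limits; a check like this only helps fail fast.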
