Send audio and video streams

This document describes how to send audio and video streams to the Live API for real-time, bidirectional communication with Gemini models. Learn how to configure and transmit audio and video data to build dynamic and interactive applications.

Send audio streams

Implementing real-time audio requires strict adherence to sample rate specifications and careful buffer management to ensure low latency and natural interruptibility.

The Live API supports the following audio formats:

Input audio: Raw 16-bit PCM audio at 16 kHz, little-endian
Output audio: Raw 16-bit PCM audio at 24 kHz, little-endian

The following code sample shows you how to send streaming audio data:

import asyncio # Assumes session is an active Live API session # and chunk_data contains bytes of raw 16-bit PCM audio at 16 kHz. from google.genai import types # Send audio input data in chunks await session.send_realtime_input( audio=types.Blob(data=chunk_data, mime_type="audio/pcm;rate=16000") )

The client must maintain a playback buffer. The server streams audio in chunks within server_content messages. The client's responsibility is to decode, buffer, and play the data.

The following code sample shows you how to process streaming audio data:

import asyncio # Assumes session is an active Live API session # and audio_queue is an asyncio.Queue for buffering audio for playback. import numpy as np async for msg in session.receive(): server_content = msg.server_content if server_content: # 1. Handle Interruption if server_content.interrupted: print("\n[Interrupted] Flushing buffer...") # Clear the Python queue while not audio_queue.empty(): try: audio_queue.get_nowait() except asyncio.QueueEmpty: break # Send signal to worker to reset hardware buffers if needed await audio_queue.put(None) continue # 2. Process Audio chunks if server_content.model_turn: for part in server_content.model_turn.parts: if part.inline_data: # Add PCM data to playback queue await audio_queue.put(np.frombuffer(part.inline_data.data, dtype='int16'))

Send video streams

Video streaming provides visual context. The Live API expects a sequence of discrete image frames and supports video frames input at 1 FPS. For best results, use native 768x768 resolution at 1 FPS.

The following code sample shows you how to send streaming video data:

import asyncio # Assumes session is an active Live API session # and chunk_data contains bytes of a JPEG image. from google.genai import types # Send video input data in chunks await session.send_realtime_input( media=types.Blob(data=chunk_data, mime_type="image/jpeg") )

The client implementation captures a frame from the video feed, encodes it as a JPEG blob, and transmits it using the realtime_input message structure.

import cv2 import asyncio from google.genai import types async def send_video_stream(session): # Open webcam cap = cv2.VideoCapture(0) while True: ret, frame = cap.read() if not ret: break # 1. Resize to optimal resolution (768x768 max) frame = cv2.resize(frame, (768, 768)) # 2. Encode as JPEG _, buffer = cv2.imencode('.jpg', frame,) # 3. Send as realtime input await session.send_realtime_input( media=types.Blob(data=buffer.tobytes(), mime_type="image/jpeg") ) # 4. Wait 1 second (1 FPS) await asyncio.sleep(1.0) cap.release()

Configure media resolution

You can specify the resolution for input media by setting the media_resolution field in the session configuration. Lower resolution reduces token usage and latency, while higher resolution improves detail recognition. Supported values include low, medium, and high.

config = { "response_modalities": ["audio"], "media_resolution": "low", }

Send audio and video streams Stay organized with collections Save and categorize content based on your preferences.

Send audio streams

Send video streams

Configure media resolution

What's next

Send audio and video streams