Send audio and video streams

This document describes how to send audio and video streams to the Live API for real-time, bidirectional communication with Gemini models. Learn how to configure and transmit audio and video data to build dynamic and interactive applications.

Send audio streams

Implementing real-time audio requires strict adherence to sample rate specifications and careful buffer management to ensure low latency and natural interruptibility.

The Live API supports the following audio formats:

  • Input audio: Raw 16-bit PCM audio at 16 kHz, little-endian
  • Output audio: Raw 16-bit PCM audio at 24 kHz, little-endian
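If your capture pipeline produces floating-point samples, they must be converted to this format before sending. A minimal sketch, assuming float samples in the range [-1.0, 1.0] (the function name `float_to_pcm16` is illustrative, not part of the API):

```python
import numpy as np

def float_to_pcm16(samples: np.ndarray) -> bytes:
    """Convert float samples in [-1.0, 1.0] to raw 16-bit little-endian PCM."""
    clipped = np.clip(samples, -1.0, 1.0)
    # Scale to the int16 range and force little-endian byte order ("<i2").
    return (clipped * 32767).astype("<i2").tobytes()

chunk = float_to_pcm16(np.array([0.0, 0.5, -0.5, 1.0], dtype=np.float32))
```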

The following code sample shows you how to send streaming audio data:

```python
import asyncio

from google.genai import types

# Assumes session is an active Live API session
# and chunk_data contains bytes of raw 16-bit PCM audio at 16 kHz.

# Send audio input data in chunks
await session.send_realtime_input(
    audio=types.Blob(data=chunk_data, mime_type="audio/pcm;rate=16000")
)
```
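The example above sends a single chunk. To stream a longer capture buffer, you can split it into fixed-size chunks and send each one in turn; the 100 ms chunk size below is an assumption for illustration, not an API requirement:

```python
def pcm_chunks(pcm: bytes, chunk_bytes: int = 3200):
    """Yield fixed-size chunks; 3200 bytes = 100 ms of 16 kHz 16-bit mono PCM."""
    for i in range(0, len(pcm), chunk_bytes):
        yield pcm[i:i + chunk_bytes]

# A 8000-byte buffer splits into two full chunks plus a shorter tail.
chunks = list(pcm_chunks(b"\x00" * 8000))
```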

The server streams audio in chunks within server_content messages, so the client must maintain a playback buffer. The client's responsibility is to decode, buffer, and play the incoming data.
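On the consumer side, a worker task can drain the playback queue and hand samples to the audio device. The sketch below is one possible shape, with a hypothetical `play_block` callback standing in for your audio output library:

```python
import asyncio

import numpy as np

async def playback_worker(audio_queue: asyncio.Queue, play_block) -> None:
    """Drain buffered PCM chunks; a None item signals a flush after interruption."""
    while True:
        block = await audio_queue.get()
        if block is None:
            # Interruption: the producer already cleared the queue;
            # reset any device-side buffers here if your audio library needs it.
            continue
        play_block(block)  # hand the int16 samples to the audio device

# Usage sketch: feed two chunks and a flush sentinel, then stop the worker.
async def demo():
    played = []
    q = asyncio.Queue()
    task = asyncio.create_task(playback_worker(q, played.append))
    await q.put(np.zeros(4, dtype="int16"))
    await q.put(None)
    await q.put(np.ones(4, dtype="int16"))
    await asyncio.sleep(0.05)
    task.cancel()
    return played

played = asyncio.run(demo())
```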

The following code sample shows you how to process streaming audio data:

```python
import asyncio

import numpy as np

# Assumes session is an active Live API session
# and audio_queue is an asyncio.Queue for buffering audio for playback.

async for msg in session.receive():
    server_content = msg.server_content
    if server_content:
        # 1. Handle interruption
        if server_content.interrupted:
            print("\n[Interrupted] Flushing buffer...")
            # Clear the Python queue
            while not audio_queue.empty():
                try:
                    audio_queue.get_nowait()
                except asyncio.QueueEmpty:
                    break
            # Send signal to worker to reset hardware buffers if needed
            await audio_queue.put(None)
            continue

        # 2. Process audio chunks
        if server_content.model_turn:
            for part in server_content.model_turn.parts:
                if part.inline_data:
                    # Add PCM data to the playback queue
                    await audio_queue.put(
                        np.frombuffer(part.inline_data.data, dtype="int16")
                    )
```

Send video streams

Video streaming provides visual context. The Live API expects a sequence of discrete image frames and supports video frame input at 1 FPS. For best results, use the native 768x768 resolution at 1 FPS.

The following code sample shows you how to send streaming video data:

```python
import asyncio

from google.genai import types

# Assumes session is an active Live API session
# and chunk_data contains bytes of a JPEG image.

# Send video input data in chunks
await session.send_realtime_input(
    media=types.Blob(data=chunk_data, mime_type="image/jpeg")
)
```

The client implementation captures a frame from the video feed, encodes it as a JPEG blob, and transmits it using the realtime_input message structure.

```python
import asyncio

import cv2
from google.genai import types

async def send_video_stream(session):
    # Open the webcam
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # 1. Resize to the optimal resolution (768x768)
        frame = cv2.resize(frame, (768, 768))
        # 2. Encode as JPEG
        _, buffer = cv2.imencode(".jpg", frame)
        # 3. Send as realtime input
        await session.send_realtime_input(
            media=types.Blob(data=buffer.tobytes(), mime_type="image/jpeg")
        )
        # 4. Wait 1 second (1 FPS)
        await asyncio.sleep(1.0)
    cap.release()
```
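Note that resizing a non-square webcam frame directly to 768x768 stretches the image. If distortion matters for your use case, one alternative is to pad the frame onto a square canvas instead; a NumPy-only sketch (assuming frames are H×W×3 uint8 arrays that already fit within 768x768, with `letterbox_768` as an illustrative helper name):

```python
import numpy as np

def letterbox_768(frame: np.ndarray) -> np.ndarray:
    """Center a frame on a 768x768 black canvas without distorting it.
    Assumes the frame already fits within 768x768; downscale first if not."""
    h, w = frame.shape[:2]
    canvas = np.zeros((768, 768, 3), dtype=np.uint8)
    top = (768 - h) // 2
    left = (768 - w) // 2
    canvas[top:top + h, left:left + w] = frame
    return canvas

# A 640x480 white frame ends up centered on a black square.
boxed = letterbox_768(np.full((480, 640, 3), 255, dtype=np.uint8))
```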

Configure media resolution

You can specify the resolution for input media by setting the media_resolution field in the session configuration. Lower resolution reduces token usage and latency, while higher resolution improves detail recognition. Supported values include low, medium, and high.

```python
config = {
    "response_modalities": ["audio"],
    "media_resolution": "low",
}
```

What's next