Text Chunking for TTS

Basic techniques for breaking text into chunks to reduce latency in Text-to-Speech applications.

Why Text Chunking Matters

Text chunking significantly reduces perceived latency in TTS applications by allowing audio playback to begin sooner. This is especially important for conversational AI and voice agents where responsiveness is critical.

Instead of waiting for the entire audio to be generated, chunking lets you:

  • Begin audio playback much faster
  • Create more responsive voice experiences
  • Maintain natural-sounding speech

Basic Sentence Chunking

The simplest and most effective approach is to split text at sentence boundaries. This preserves natural speech patterns while enabling faster time-to-first-byte:

# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

import re

def chunk_by_sentence(text):
    # Split text at sentence boundaries (periods, question marks, exclamation points)
    # while preserving the punctuation
    sentences = re.split(r'(?<=[.!?])\s+', text)

    # Remove any empty chunks
    return [sentence for sentence in sentences if sentence]

# Example usage
text = "Hello, welcome to Deepgram. This is an example of text chunking. How does it sound?"
chunks = chunk_by_sentence(text)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}: {chunk}")

# Output:
# Chunk 1: Hello, welcome to Deepgram.
# Chunk 2: This is an example of text chunking.
# Chunk 3: How does it sound?

Processing Streaming Text with WebSockets

When working with streaming text (like from an LLM), you need to collect tokens until you have complete sentences. Here's a simplified approach for processing text that arrives in paragraph-sized chunks:

# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

import re
import asyncio
from deepgram import AsyncDeepgramClient

def chunk_by_sentence(text):
    # Split text at sentence boundaries (periods, question marks, exclamation points)
    # while preserving the punctuation
    sentences = re.split(r'(?<=[.!?])\s+', text)

    # Remove any empty chunks
    return [sentence for sentence in sentences if sentence]

class SimpleTextChunker:
    def __init__(self, deepgram_client):
        self.queue = []  # Queue to store incoming paragraph chunks
        self.processed_sentences = set()
        self.deepgram_client = deepgram_client

    async def process_text_stream(self, paragraph):
        """Process an array of paragraph chunks, each containing 1-2 sentences"""

        # Queue paragraph as it arrives (simulating fast reception)
        self.queue.append(paragraph)
        print(f"Received and queued paragraph: {paragraph}")

        # You could preprocess paragraphs here and split them by more than just sentence boundaries

        # Process the queue
        while self.queue:
            # Get the next paragraph from the queue
            paragraph = self.queue.pop(0)

            # Split paragraph into sentences using our chunk_by_sentence function
            sentences = chunk_by_sentence(paragraph)

            # Process each sentence
            for sentence in sentences:
                if sentence and sentence not in self.processed_sentences:
                    # Send the sentence to TTS
                    print(f"Sending sentence to TTS: {sentence}")
                    audio_response = await self.deepgram_client.speak.v1.audio.generate(
                        text=sentence,
                        model="aura-2-thalia-en",
                        sample_rate=24000
                    )
                    # In a real app, you would play this audio immediately
                    self.processed_sentences.add(sentence)

# Example usage with an array of paragraph chunks
async def main():
    # This simulates text coming in as paragraph chunks from an LLM
    paragraph_chunks = [
        "Deepgram's TTS API offers low latency. It works great for voice agents.",
        "This approach simulates receiving chunks as paragraphs. Each paragraph may contain one or two sentences.",
        "Try it today! You'll be impressed with the results."
    ]

    # Set up TTS client
    deepgram = AsyncDeepgramClient()

    # Set up listeners for TTS events to handle audio data and connection status
    chunker = SimpleTextChunker(deepgram)

    # Process each paragraph sequentially
    for paragraph in paragraph_chunks:
        await chunker.process_text_stream(paragraph)

# Run the example
if __name__ == "__main__":
    asyncio.run(main())
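When text arrives token by token rather than in paragraph chunks, you can buffer tokens and emit a chunk whenever a complete sentence appears. The following is a minimal sketch of that buffering step using the same sentence-boundary regex as above; the token stream here is simulated, and `SentenceBuffer` is an illustrative helper, not part of the SDK:

```python
import re

class SentenceBuffer:
    """Accumulates streamed tokens and yields complete sentences."""

    def __init__(self):
        self.buffer = ""

    def feed(self, token):
        """Add a token; return any complete sentences now available."""
        self.buffer += token
        # Split after sentence-ending punctuation followed by whitespace
        parts = re.split(r'(?<=[.!?])\s+', self.buffer)
        # The last part may be an incomplete sentence: keep it buffered
        self.buffer = parts[-1]
        return [p for p in parts[:-1] if p]

    def flush(self):
        """Return whatever is left when the stream ends."""
        remaining = self.buffer.strip()
        self.buffer = ""
        return [remaining] if remaining else []

# Simulated LLM token stream
tokens = ["Hello", ", wel", "come. ", "This is ", "streaming. ", "Neat"]

buf = SentenceBuffer()
out = []
for t in tokens:
    out.extend(buf.feed(t))  # each complete sentence could go straight to TTS
out.extend(buf.flush())
print(out)
# ['Hello, welcome.', 'This is streaming.', 'Neat']
```

Each sentence returned by `feed` can be sent to TTS immediately, so synthesis starts before the LLM has finished generating.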

For complete details on implementing the TTS WebSocket connection, see our guide on Real-Time TTS with WebSockets.

Processing Chunked Text

After creating chunks, you have two main options for processing them:

Sequential Processing

Process each chunk in sequence, prioritizing the first chunk:

# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

async def process_chunks_sequential(chunks, tts_function):
    results = []
    for chunk in chunks:
        # You might prioritize the first chunk for faster response
        result = await tts_function(chunk)
        results.append(result)
    return results
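A common alternative is to synthesize chunks concurrently and consume the results in order. The sketch below uses asyncio.gather, which preserves input order; fake_tts is a stand-in for a real TTS call, used here only for illustration:

```python
import asyncio

async def process_chunks_concurrent(chunks, tts_function):
    # Launch all synthesis calls at once; gather returns results
    # in input order, so playback can still consume them sequentially.
    tasks = [asyncio.create_task(tts_function(chunk)) for chunk in chunks]
    return await asyncio.gather(*tasks)

# Stand-in for a real TTS call (illustrative only)
async def fake_tts(chunk):
    await asyncio.sleep(0.01)  # simulate network latency
    return f"audio({chunk})"

results = asyncio.run(process_chunks_concurrent(["One.", "Two.", "Three."], fake_tts))
print(results)
# ['audio(One.)', 'audio(Two.)', 'audio(Three.)']
```

Concurrent synthesis reduces total wall-clock time when chunks are independent, at the cost of buffering later chunks until earlier ones have played.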

Setting Chunk Size

For most applications, sentences work well as chunks. If you need finer control:

  • Voice assistants: Aim for 50-100 character chunks
  • Call center bots: Use complete sentences (most natural)
  • Long-form content: Larger chunks (200-400 characters) preserve intonation
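To target a character budget like the ones above, you can greedily pack sentences into chunks up to a size limit. This is a minimal sketch building on the chunk_by_sentence function from earlier; the max_chars values are illustrative:

```python
import re

def chunk_by_sentence(text):
    sentences = re.split(r'(?<=[.!?])\s+', text)
    return [s for s in sentences if s]

def chunk_by_size(text, max_chars=100):
    """Greedily pack whole sentences into chunks of at most max_chars.
    A single sentence longer than max_chars becomes its own chunk."""
    chunks, current = [], ""
    for sentence in chunk_by_sentence(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = "Short one. Another short one. This sentence is a little bit longer than the others."
print(chunk_by_size(text, max_chars=40))
# ['Short one. Another short one.', 'This sentence is a little bit longer than the others.']
```

Packing whole sentences keeps prosody natural while still giving you control over chunk size.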

Other Chunking Strategies

If you need more advanced chunking, consider these techniques:

  • Clause-based chunking: Splits long sentences at commas and semicolons
  • NLP-based chunking: Uses natural language processing to find semantic boundaries
  • Adaptive chunking: Adjusts chunk size based on content complexity
  • First-chunk optimization: Specially optimizes the first chunk for minimal latency
  • SSML chunking: Handles Speech Synthesis Markup Language tags when chunking
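As one illustration, clause-based chunking can be approximated by extending the sentence regex to also split after commas and semicolons, then merging fragments that are too short to sound natural. This is a simplified sketch, not a full NLP approach; the min_len threshold is an illustrative choice:

```python
import re

def chunk_by_clause(text, min_len=20):
    # Split after sentence-ending punctuation, commas, or semicolons
    pieces = [p for p in re.split(r'(?<=[.!?,;])\s+', text) if p]
    # Merge very short fragments into the previous clause to avoid choppy audio
    clauses = []
    for piece in pieces:
        if clauses and len(clauses[-1]) < min_len:
            clauses[-1] = f"{clauses[-1]} {piece}"
        else:
            clauses.append(piece)
    return clauses

text = "When the text is long, splitting at commas helps; each clause becomes a chunk."
print(chunk_by_clause(text))
# ['When the text is long,', 'splitting at commas helps;', 'each clause becomes a chunk.']
```

Splitting at clause boundaries shortens the first chunk of a long sentence, which can further reduce time to first audio.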

For WebSocket implementation details to stream the chunked audio, see our guide on Real-Time TTS with WebSockets.