Text Chunking for TTS

Basic techniques for breaking text into chunks to reduce latency in Text-to-Speech applications.

Why Text Chunking Matters

Text chunking significantly reduces perceived latency in TTS applications by allowing audio playback to begin sooner. This is especially important for conversational AI and voice agents where responsiveness is critical.

Instead of waiting for the entire audio to be generated, chunking lets you:

  • Begin audio playback much faster
  • Create more responsive voice experiences
  • Maintain natural-sounding speech

Basic Sentence Chunking

The simplest and most effective approach is to split text at sentence boundaries. This preserves natural speech patterns while enabling faster time-to-first-byte:

# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

import re

def chunk_by_sentence(text):
    # Split text at sentence boundaries (periods, question marks, exclamation points)
    # while preserving the punctuation
    sentences = re.split(r'(?<=[.!?])\s+', text)

    # Remove any empty chunks
    return [sentence for sentence in sentences if sentence]

# Example usage
text = "Hello, welcome to Deepgram. This is an example of text chunking. How does it sound?"
chunks = chunk_by_sentence(text)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}: {chunk}")

# Output:
# Chunk 1: Hello, welcome to Deepgram.
# Chunk 2: This is an example of text chunking.
# Chunk 3: How does it sound?

Processing Streaming Text with WebSockets

When working with streaming text (like from an LLM), you need to collect tokens until you have complete sentences. Here's a simplified approach for processing text that arrives in paragraph-sized chunks:

# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

import re
import asyncio
from deepgram import AsyncDeepgramClient

def chunk_by_sentence(text):
    # Split text at sentence boundaries (periods, question marks, exclamation points)
    # while preserving the punctuation
    sentences = re.split(r'(?<=[.!?])\s+', text)

    # Remove any empty chunks
    return [sentence for sentence in sentences if sentence]

class SimpleTextChunker:
    def __init__(self, deepgram_client):
        self.queue = []  # Queue to store incoming paragraph chunks
        self.processed_sentences = set()
        self.deepgram_client = deepgram_client

    async def process_text_stream(self, paragraph):
        """Process an array of paragraph chunks, each containing 1-2 sentences"""

        # Queue paragraph as it arrives (simulating fast reception)
        self.queue.append(paragraph)
        print(f"Received and queued paragraph: {paragraph}")

        # You could preprocess paragraphs here and split them by more than just sentence boundaries

        # Process the queue
        while self.queue:
            # Get the next paragraph from the queue
            paragraph = self.queue.pop(0)

            # Split paragraph into sentences using our chunk_by_sentence function
            sentences = chunk_by_sentence(paragraph)

            # Process each sentence
            for sentence in sentences:
                if sentence and sentence not in self.processed_sentences:
                    # Send the sentence to TTS
                    print(f"Sending sentence to TTS: {sentence}")
                    audio_response = await self.deepgram_client.speak.v1.audio.generate(
                        text=sentence,
                        model="aura-2-thalia-en",
                        sample_rate=24000
                    )
                    # In a real app, you would play this audio immediately
                    self.processed_sentences.add(sentence)

# Example usage with an array of paragraph chunks
async def main():
    # This simulates text coming in as paragraph chunks from an LLM
    paragraph_chunks = [
        "Deepgram's TTS API offers low latency. It works great for voice agents.",
        "This approach simulates receiving chunks as paragraphs. Each paragraph may contain one or two sentences.",
        "Try it today! You'll be impressed with the results."
    ]

    # Set up TTS client
    deepgram = AsyncDeepgramClient()

    # Set up listeners for TTS events to handle audio data and connection status
    chunker = SimpleTextChunker(deepgram)

    # Process each paragraph sequentially
    for paragraph in paragraph_chunks:
        await chunker.process_text_stream(paragraph)

# Run the example
if __name__ == "__main__":
    asyncio.run(main())
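When text arrives token by token rather than in paragraph chunks, you can buffer tokens and emit a chunk whenever a complete sentence appears. The following is a minimal sketch of that buffering step using the same sentence-boundary regex as above; the token stream here is simulated, and `SentenceBuffer` is an illustrative helper, not part of the SDK:

```python
import re

class SentenceBuffer:
    """Accumulates streamed tokens and yields complete sentences."""

    def __init__(self):
        self.buffer = ""

    def feed(self, token):
        """Add a token; return any complete sentences now available."""
        self.buffer += token
        # Split after sentence-ending punctuation followed by whitespace
        parts = re.split(r'(?<=[.!?])\s+', self.buffer)
        # The last part may be an incomplete sentence: keep it buffered
        self.buffer = parts[-1]
        return [p for p in parts[:-1] if p]

    def flush(self):
        """Return whatever is left when the stream ends."""
        remaining = self.buffer.strip()
        self.buffer = ""
        return [remaining] if remaining else []

# Simulated LLM token stream
tokens = ["Hello", ", wel", "come. ", "This is ", "streaming. ", "Neat"]

buf = SentenceBuffer()
out = []
for t in tokens:
    out.extend(buf.feed(t))  # each complete sentence could go straight to TTS
out.extend(buf.flush())
print(out)
# ['Hello, welcome.', 'This is streaming.', 'Neat']
```

Each sentence returned by `feed` can be sent to TTS immediately, so synthesis starts before the LLM has finished generating.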

For complete details on implementing the TTS WebSocket connection, see our guide on Real-Time TTS with WebSockets.

Processing Chunked Text

After creating chunks, you have two main options for processing them:

Sequential Processing

Process each chunk in sequence, prioritizing the first chunk:

# For help migrating to the new Python SDK, check out our migration guide:
# https://github.com/deepgram/deepgram-python-sdk/blob/main/docs/Migrating-v3-to-v5.md

async def process_chunks_sequential(chunks, tts_function):
    results = []
    for chunk in chunks:
        # You might prioritize the first chunk for faster response
        result = await tts_function(chunk)
        results.append(result)
    return results
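A common alternative is to synthesize chunks concurrently and consume the results in order. The sketch below uses asyncio.gather, which preserves input order; fake_tts is a stand-in for a real TTS call, used here only for illustration:

```python
import asyncio

async def process_chunks_concurrent(chunks, tts_function):
    # Launch all synthesis calls at once; gather returns results
    # in input order, so playback can still consume them sequentially.
    tasks = [asyncio.create_task(tts_function(chunk)) for chunk in chunks]
    return await asyncio.gather(*tasks)

# Stand-in for a real TTS call (illustrative only)
async def fake_tts(chunk):
    await asyncio.sleep(0.01)  # simulate network latency
    return f"audio({chunk})"

results = asyncio.run(process_chunks_concurrent(["One.", "Two.", "Three."], fake_tts))
print(results)
# ['audio(One.)', 'audio(Two.)', 'audio(Three.)']
```

Concurrent synthesis reduces total wall-clock time when chunks are independent, at the cost of buffering later chunks until earlier ones have played.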

Setting Chunk Size

For most applications, sentences work well as chunks. If you need finer control:

  • Voice assistants: Aim for 50-100 character chunks
  • Call center bots: Use complete sentences (most natural)
  • Long-form content: Larger chunks (200-400 characters) preserve intonation
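To target a character budget like the ones above, you can greedily pack sentences into chunks up to a size limit. This is a minimal sketch building on the chunk_by_sentence function from earlier; the max_chars values are illustrative:

```python
import re

def chunk_by_sentence(text):
    sentences = re.split(r'(?<=[.!?])\s+', text)
    return [s for s in sentences if s]

def chunk_by_size(text, max_chars=100):
    """Greedily pack whole sentences into chunks of at most max_chars.
    A single sentence longer than max_chars becomes its own chunk."""
    chunks, current = [], ""
    for sentence in chunk_by_sentence(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = "Short one. Another short one. This sentence is a little bit longer than the others."
print(chunk_by_size(text, max_chars=40))
# ['Short one. Another short one.', 'This sentence is a little bit longer than the others.']
```

Packing whole sentences keeps prosody natural while still giving you control over chunk size.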

Other Chunking Strategies

If you need more advanced chunking, consider these techniques:

  • Clause-based chunking: Splits long sentences at commas and semicolons
  • NLP-based chunking: Uses natural language processing to find semantic boundaries
  • Adaptive chunking: Adjusts chunk size based on content complexity
  • First-chunk optimization: Specially optimizes the first chunk for minimal latency
  • SSML chunking: Handles Speech Synthesis Markup Language tags when chunking
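As one illustration, clause-based chunking can be approximated by extending the sentence regex to also split after commas and semicolons, then merging fragments that are too short to sound natural. This is a simplified sketch, not a full NLP approach; the min_len threshold is an illustrative choice:

```python
import re

def chunk_by_clause(text, min_len=20):
    # Split after sentence-ending punctuation, commas, or semicolons
    pieces = [p for p in re.split(r'(?<=[.!?,;])\s+', text) if p]
    # Merge very short fragments into the previous clause to avoid choppy audio
    clauses = []
    for piece in pieces:
        if clauses and len(clauses[-1]) < min_len:
            clauses[-1] = f"{clauses[-1]} {piece}"
        else:
            clauses.append(piece)
    return clauses

text = "When the text is long, splitting at commas helps; each clause becomes a chunk."
print(chunk_by_clause(text))
# ['When the text is long,', 'splitting at commas helps;', 'each clause becomes a chunk.']
```

Splitting at clause boundaries shortens the first chunk of a long sentence, which can further reduce time to first audio.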

For WebSocket implementation details to stream the chunked audio, see our guide on Real-Time TTS with WebSockets.