Generate dialogue with multiple speakers

This page describes how to use Text-to-Speech to generate a dialogue with multiple speakers.

You can generate audio with multiple speakers to create a dialogue. This can be useful for interviews, interactive storytelling, video games, e-learning platforms, and accessibility solutions.

The following voice is supported for audio with multiple speakers:

  • en-US-Studio-MultiSpeaker
    • speaker: R
    • speaker: S
    • speaker: T
    • speaker: U
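
Because only the four speaker labels above are listed, it can help to check a dialogue before sending it for synthesis. The following is a minimal sketch, assuming the list above is the complete set of labels; `SUPPORTED_SPEAKERS` and `validate_speakers` are hypothetical names for illustration and are not part of the client library.

```python
# Speaker labels listed above for the multi-speaker voice (assumed complete).
SUPPORTED_SPEAKERS = frozenset({"R", "S", "T", "U"})


def validate_speakers(turns):
    """Return a sorted list of unsupported speaker labels found in turns.

    turns is an iterable of (speaker, text) pairs. An empty list means
    every turn uses one of the supported labels.
    """
    return sorted(
        {speaker for speaker, _ in turns if speaker not in SUPPORTED_SPEAKERS}
    )
```

An empty result means every turn uses a listed label; any other result names the labels that would need to be changed.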



Example of how to use multi-speaker markup

The following sample builds a dialogue as a list of turns, each with its text and a speaker label, and synthesizes it with the multi-speaker voice.

Python

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Python API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

"""Synthesizes speech for multiple speakers.
Make sure to be working in a virtual environment.
"""
from google.cloud import texttospeech_v1beta1 as texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

multi_speaker_markup = texttospeech.MultiSpeakerMarkup(
    turns=[
        texttospeech.MultiSpeakerMarkup.Turn(
            text="I've heard that the Google Cloud multi-speaker audio generation sounds amazing!",
            speaker="R",
        ),
        texttospeech.MultiSpeakerMarkup.Turn(
            text="Oh? What's so good about it?", speaker="S"
        ),
        texttospeech.MultiSpeakerMarkup.Turn(text="Well..", speaker="R"),
        texttospeech.MultiSpeakerMarkup.Turn(text="Well what?", speaker="S"),
        texttospeech.MultiSpeakerMarkup.Turn(
            text="Well, you should find it out by yourself!", speaker="R"
        ),
        texttospeech.MultiSpeakerMarkup.Turn(
            text="Alright alright, let's try it out!", speaker="S"
        ),
    ]
)

# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
    multi_speaker_markup=multi_speaker_markup
)

# Build the voice request, select the language code ('en-US') and the voice
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", name="en-US-Studio-MultiSpeaker"
)

# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary.
with open("output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
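
For longer dialogues, writing each turn by hand can get repetitive. The following is a rough sketch of one way to turn a plain-text script into (speaker, text) pairs. The `speaker: text` line format and the `parse_dialogue` helper are assumptions made for illustration; neither is defined by Text-to-Speech.

```python
def parse_dialogue(script):
    """Parse lines of the form 'R: some text' into (speaker, text) pairs.

    Blank lines are skipped. Only the first colon separates the speaker
    label from the text, so the text itself may contain colons.
    """
    turns = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        if not sep:
            raise ValueError(f"Missing 'speaker:' prefix in line: {line!r}")
        turns.append((speaker.strip(), text.strip()))
    return turns


# Hypothetical script reusing two turns from the sample above.
script = """
R: Well..
S: Well what?
"""
pairs = parse_dialogue(script)
```

Each resulting pair could then be passed to `texttospeech.MultiSpeakerMarkup.Turn(text=text, speaker=speaker)` to build the `turns` list used in the sample above.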