SpeechToTextLoader lets you transcribe audio files with the Google Cloud Speech-to-Text API and loads the transcribed text into documents.

To use it, you need the google-cloud-speech Python package installed and a Google Cloud project with the Speech-to-Text API enabled.

Installation & setup
First, install the google-cloud-speech Python package. You can find more information about it on the Speech-to-Text client libraries page.

Follow the quickstart guide in the Google Cloud documentation to create a project and enable the API.

Example
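The flow this section describes (construct the loader with a project ID and a file path, then call `load()`) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: it assumes the loader is importable from `langchain_google_community`, which may differ in your installed version; the helper names `is_gcs_uri`, `within_sync_size_limit`, and `transcribe` are hypothetical and not part of the library; and "10MB" is read as 10 * 1024 * 1024 bytes. Only the file size of local files is checked, not the 60-second duration.

```python
import os

# Assumption: "10MB" is interpreted as 10 * 1024 * 1024 bytes.
SYNC_MAX_BYTES = 10 * 1024 * 1024


def is_gcs_uri(file_path: str) -> bool:
    """Return True for Google Cloud Storage URIs (gs://...)."""
    return file_path.startswith("gs://")


def within_sync_size_limit(file_path: str) -> bool:
    """Check a local file against the size limit for synchronous requests.

    Cloud Storage URIs are not inspected here, and the 60-second duration
    limit is not checked because that would require decoding the audio.
    """
    if is_gcs_uri(file_path):
        return True
    return os.path.getsize(file_path) <= SYNC_MAX_BYTES


def transcribe(project_id: str, file_path: str):
    """Return the transcribed documents; blocks until transcription finishes."""
    # Imported lazily so the helpers above work without the package installed.
    # Assumption: the loader is exported by langchain_google_community.
    from langchain_google_community import SpeechToTextLoader

    if not within_sync_size_limit(file_path):
        raise ValueError(f"{file_path} exceeds the synchronous size limit")

    loader = SpeechToTextLoader(project_id=project_id, file_path=file_path)
    return loader.load()
```

Calling `transcribe("<PROJECT_ID>", "./audio.wav")` (both values are placeholders) would return a list of documents; the transcript of the first one would be read from `docs[0].page_content` and the raw response from `docs[0].metadata`.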
SpeechToTextLoader requires the project_id and file_path arguments. Audio files can be specified as a Google Cloud Storage URI (gs://...) or a local file path.

The loader supports only synchronous requests, which are limited to 60 seconds or 10MB per audio file.

Calling loader.load() blocks until the transcription is finished. The transcribed text is available in the page_content of each returned document, and metadata contains the full JSON response with additional information.

Recognition Config
You can specify the config argument to use different speech recognition models and enable specific features.

Refer to the Speech-to-Text recognizers documentation and the RecognizeRequest API reference for information on how to set a custom configuration.

If you don’t specify a config, the following options are selected automatically:

- Model: Chirp Universal Speech Model
- Language: en-US
- Audio Encoding: Automatically Detected
- Automatic Punctuation: Enabled
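
A custom configuration along the lines of the defaults listed above might look like the following sketch. It assumes the `RecognitionConfig`, `AutoDetectDecodingConfig`, and `RecognitionFeatures` types from `google.cloud.speech_v2`, that the Chirp model is selected with the string `"chirp"`, and that the loader is importable from `langchain_google_community`. The helper names `build_config` and `make_loader` are hypothetical and only serve the illustration.

```python
from typing import Sequence


def build_config(
    model: str = "chirp",
    language_codes: Sequence[str] = ("en-US",),
    automatic_punctuation: bool = True,
):
    """Build a recognition config; the defaults mirror the list above."""
    # Imported lazily so this module can be imported without the package.
    from google.cloud.speech_v2 import (
        AutoDetectDecodingConfig,
        RecognitionConfig,
        RecognitionFeatures,
    )

    return RecognitionConfig(
        # Let the API detect the audio encoding automatically.
        auto_decoding_config=AutoDetectDecodingConfig(),
        language_codes=list(language_codes),
        model=model,
        features=RecognitionFeatures(
            enable_automatic_punctuation=automatic_punctuation,
        ),
    )


def make_loader(project_id: str, file_path: str, **config_options):
    """Create a loader that uses a custom recognition config."""
    # Assumption: the loader is exported by langchain_google_community.
    from langchain_google_community import SpeechToTextLoader

    return SpeechToTextLoader(
        project_id=project_id,
        file_path=file_path,
        config=build_config(**config_options),
    )
```

For instance, `make_loader("<PROJECT_ID>", "./audio.wav", automatic_punctuation=False)` would keep the default model and language but turn off automatic punctuation; the project ID and path are placeholders. Which models and features are available depends on your project and location, so check the recognizers documentation before changing `model`.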