Natural-sounding text-to-speech in the terminal (and more).
This is NOT intended to be a completely free, pick-up-and-use TTS solution. In fact, it is simply a wrapper around Google's Cloud Text-to-Speech API.
You will need:
- A GCP account with billing enabled.
- Google gives you 1 million characters free every month. That's nearly 10 books a month. It's essentially free for personal use.
- Once you have a GCP account, enable the Text-to-Speech API and create a service account with a JSON key.
- Export service account credentials in your shell. You will need to do this every time you open a new shell. Add it to your shell configuration or make a script to run gosling for convenience.
  export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"
- Internet connection every time you need some text spoken to you.
- I have only tested this on Linux. Commands for playing audio will be different on other platforms.
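If you go the shell-configuration route, a snippet along these lines could work. It is a sketch, not something gosling ships: the default key location under ~/.config/gosling and the function names check_credentials and gosling_run are hypothetical. It only sets a default when GOOGLE_APPLICATION_CREDENTIALS is not already exported, and it fails early with a readable message if the key file is missing instead of letting the API call error out.

```shell
# Sketch for ~/.bashrc (or similar). The default key path below is an
# assumption; point it at wherever you saved your service-account key.
: "${GOOGLE_APPLICATION_CREDENTIALS:=$HOME/.config/gosling/service-account.json}"
export GOOGLE_APPLICATION_CREDENTIALS

# Hypothetical helper: succeed only if the key file exists and is readable.
check_credentials() {
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "cannot read service account key: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
}

# Hypothetical wrapper: check the key first, then pass every argument to gosling.
gosling_run() {
  check_credentials || return 1
  gosling "$@"
}
```

After reloading your shell, gosling_run input.txt output.mp3 behaves like gosling input.txt output.mp3, except that a missing key is reported before anything is sent to Google.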
Demo with the default settings:
defaults.mp4
Numbers and punctuation (the multiple exclamation marks are something that I have seen other TTS engines struggle with):
Welcome to gosling!!! It has options such as "Pitch adjustment" in the range -20.0 to 20.0, "Speaking rate/speed" in the range 0.25 to 4.0 and "Volume gain" (in dB) in the range -96.0 to 16.0.
numbers_punc.mp4
Kannada:
kannada.mp4
Check out the full voice list (https://cloud.google.com/text-to-speech/docs/voices); use WaveNet- or Neural2-based voices for better quality.
Go to the latest release, scroll down to "Assets" and download the correct file for your platform. Unzip the file and run the gosling binary inside:
./gosling
Alternatively, if you have a Go toolchain installed:
go install github.com/Samyak2/gosling@latest
To convert a text file to speech:
gosling input.txt output.mp3
Play the resulting output.mp3 file using your audio player.
To read the text from standard input:
echo "hello there" | gosling - output.mp3
If you have the play command, which is usually part of the sox package (sudo dnf install sox on Fedora), you can play the audio directly:
echo "hello there" | gosling - - | play -t mp3 -
If you have the ffplay command, which is part of ffmpeg:
echo "hello there" | gosling - - | ffplay -nodisp -autoexit -
gosling has a lot of configuration options for language, voice, audio, etc.
See gosling --help for all options.
Usage: gosling <input-file> <output-file>

Arguments:
  <input-file>     Text file to read from. Use - for standard input.
  <output-file>    Audio file to write to. Use - for standard output.

Flags:
  -h, --help                          Show context-sensitive help.
  -l, --language-code="en-US"         Language code to use for the synthesis. See full list at: https://cloud.google.com/text-to-speech/docs/voices
  -v, --voice-name="en-US-Wavenet-A"  Voice name to use for the synthesis. Use an empty string to let the GCP API choose. See full list at: https://cloud.google.com/text-to-speech/docs/voices
      --pitch=-3                      Pitch adjustment in the range [-20.0, 20.0]. Use a negative number to decrease the pitch. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig
  -r, --speaking-rate=1.0             Speaking rate/speed in the range [0.25, 4.0]. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig
      --volume-gain=0.0               Volume gain (in dB) in the range [-96.0, 16.0]. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig
  -s, --[no-]ssml                     Use if text has SSML. Default is plain text. See: https://cloud.google.com/text-to-speech/docs/basics#speech_synthesis_markup_language_ssml_support
      --service-endpoint=STRING       GCP Service Endpoint. You'll need to set this if you want a Neural2 voice. See: https://cloud.google.com/text-to-speech/docs/endpoints

By default, with the default language, gosling uses a WaveNet-based voice model. If you're using a different language, make sure to switch to a WaveNet-based voice for that language too, using --voice-name.
If WaveNet is not good enough, try using a Neural2 voice type (search for Neural2 in the voice list if you need other languages):
gosling input.txt output.mp3 --service-endpoint 'https://us-central1-texttospeech.googleapis.com' -v en-US-Neural2-A
TODO: this endpoint is currently timing out for all TTS requests, not sure why.
If Neural2 isn't good enough either, well... you'll have to take this up with Google.
Either:
- You did not read the Pre-requisites section.
- You forgot to export the GOOGLE_APPLICATION_CREDENTIALS environment variable in your shell.
- Something is wrong with your GCP service account. See the page linked from the error message.
The pitch and volume gain options can take negative values, and the command-line parser I use behaves weirdly with negative numbers and short flags. I have removed their short versions to avoid making this a pitfall.
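If you script around gosling, one way to sidestep the pitfall should be to always attach the value with = (the --pitch=-3 form used in the --help output) and to check the documented ranges before spending an API call. The sketch below does both; in_range and gosling_tuned are hypothetical helper names, and it assumes awk is available for the decimal comparison.

```shell
# Hypothetical helper: succeed if VALUE is a plain decimal number with
# MIN <= VALUE <= MAX. awk does the comparison because sh arithmetic is integer-only.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    if (v !~ /^-?([0-9]+\.?[0-9]*|\.[0-9]+)$/) exit 1
    exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0)
  }'
}

# Hypothetical wrapper: validate pitch and volume gain against the ranges in
# the --help output, then pass them with "=" so each negative value stays
# attached to its flag as a single argument.
gosling_tuned() {
  local input="$1" output="$2" pitch="$3" gain="$4"
  if ! in_range "$pitch" -20 20; then
    echo "pitch must be in [-20.0, 20.0], got: $pitch" >&2
    return 1
  fi
  if ! in_range "$gain" -96 16; then
    echo "volume gain must be in [-96.0, 16.0], got: $gain" >&2
    return 1
  fi
  gosling "$input" "$output" --pitch="$pitch" --volume-gain="$gain"
}
```

For example, gosling_tuned input.txt output.mp3 -5 -6.0 runs gosling input.txt output.mp3 --pitch=-5 --volume-gain=-6.0, while a pitch of 30 is rejected locally without calling gosling.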
How do I use this with Foliate?
I use this script:
#!/bin/bash
# requires gosling and sox
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"
# read the text from stdin and play it in the background
gosling - - | play -t mp3 - &
# on SIGINT, stop playback and exit cleanly
trap 'kill $!; exit 0' INT
wait
Save this to a file and make it executable with chmod +x /path/to/foliate-gosling.sh.
TODO: this only works with English text. I need to figure out a way to convert FOLIATE_TTS_LANG_LOWER to Google's format.
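One possible starting point for that conversion, assuming FOLIATE_TTS_LANG_LOWER holds a lowercase tag such as en-us: upper-case the region part to get the en-US form used in Google's voice list. This is an untested sketch; to_google_lang is a hypothetical name, it only handles language or language-region tags, and a bare language such as de may still need a region before Google accepts it.

```shell
# Hypothetical helper: turn a lowercase tag like "en-us" (or "en_us") into the
# "en-US" form used in Google's voice list. Only handles "lang" and
# "lang-region"; script subtags such as "zh-hans-cn" are not handled.
to_google_lang() {
  local tag lang region
  tag=$(printf '%s' "$1" | tr '_' '-')
  lang=$(printf '%s' "${tag%%-*}" | tr '[:upper:]' '[:lower:]')
  case "$tag" in
    *-*)
      region=$(printf '%s' "${tag#*-}" | tr '[:lower:]' '[:upper:]')
      printf '%s-%s\n' "$lang" "$region"
      ;;
    *)
      printf '%s\n' "$lang"
      ;;
  esac
}

# In the Foliate script, the gosling line could then become (untested):
#   gosling -l "$(to_google_lang "$FOLIATE_TTS_LANG_LOWER")" -v "" - - | play -t mp3 - &
# An empty --voice-name lets the GCP API pick a voice for that language.
```

Passing -v "" matters here because the default voice, en-US-Wavenet-A, is an English voice and would not match other languages.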
When I'm too lazy to read an article, I use Google Assistant's "read me this article" feature on my phone. It's extremely good, especially with text-only articles. I could not find an alternative on desktop (specifically, Linux).
Yes, there are quite a few text-to-speech apps on Linux. Most of them sound either like R2-D2 or like something from the depths of the void. The only one I found that sounds bearable uses an undocumented Google Translate API (probably a ToS violation?). There are also some pre-trained neural-network-based models, but they sound like a person speaking through a very low-bandwidth voice call, and they skip over numbers and abbreviations as if they never existed.
The only text-to-speech that sounded good was Google's. So I thought: "they must have a GCP API for this". And they did. And I hacked this together.
- speech-dispatcher support. This will allow using it in Firefox's reader mode, for example.
- Some pre-processing of raw text: remove extra/unnecessary punctuation, better formatting for numbers, etc.
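As a rough idea of what the punctuation part of that pre-processing might look like, the filter below collapses runs of ! or ? into a single mark before the text reaches gosling. It is only a sketch; normalize_punctuation is a hypothetical name, it assumes a sed that supports -E (GNU and BSD sed both do), and it deliberately leaves ellipses (...) alone.

```shell
# Hypothetical pre-processing filter: read text on stdin and collapse repeated
# "!" or "?" into one, e.g. "Really???" -> "Really?". Mixed runs like "?!" are kept.
normalize_punctuation() {
  sed -E 's/!+/!/g; s/\?+/?/g'
}

# Example pipeline (requires gosling and credentials):
#   normalize_punctuation < input.txt | gosling - output.mp3
```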
License: MIT