PyTorch/SpeechSynthesis/FastPitch/README.md (3 additions, 5 deletions)
@@ -48,17 +48,15 @@ This repository provides a script and recipe to train the FastPitch model to ach
## Model overview
-[FastPitch](https://arxiv.org/abs/2006.06873) is one of two major components in a neural, text-to-speech (TTS) system:
+[FastPitch](https://arxiv.org/abs/2006.06873) is a fully-parallel transformer architecture with prosody control over pitch and individual phoneme duration.
+It is one of two major components in a neural, text-to-speech (TTS) system:
* a mel-spectrogram generator such as [FastPitch](https://arxiv.org/abs/2006.06873) or [Tacotron 2](https://arxiv.org/abs/1712.05884), and
* a waveform synthesizer such as [WaveGlow](https://arxiv.org/abs/1811.00002) (see [NVIDIA example code](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2)).
Such two-component TTS system is able to synthesize natural sounding speech from raw transcripts.
-The FastPitch model generates mel-spectrograms and predicts a pitch contour from raw input text. It allows to exert additional control over the synthesized utterances, such as:
-* modify the pitch contour to control the prosody,
-* increase or decrease the fundamental frequency in a naturally sounding way, that preserves the perceived identity of the speaker,
-* alter the pace of speech.
+The FastPitch model generates mel-spectrograms and predicts a pitch contour from raw input text.
Some of the capabilities of FastPitch are presented on the website with [samples](https://fastpitch.github.io/).
Speech synthesized with FastPitch has state-of-the-art quality, and does not suffer from missing/repeating phrases like Tacotron2 does.