This is a restructured and rewritten version of [bshall/UniversalVocoding](https://github.com/bshall/UniversalVocoding).
The main difference here is that the model is turned into a [TorchScript](https://pytorch.org/docs/stable/jit.html) module during training and can be loaded for inference anywhere without Python dependencies.

## Generate waveforms using pretrained models
Since the pretrained models were exported as TorchScript modules, you can load a trained model anywhere.
You can also generate multiple waveforms in parallel, e.g.

```python
import torch

vocoder = torch.jit.load("vocoder.pt")

mels = [
    torch.randn(100, 80),
    torch.randn(200, 80),
    torch.randn(300, 80),
] # (length, mel_dim)

with torch.no_grad():
    wavs = vocoder.generate(mels)
```

Empirically, if you're using the default architecture, you can generate 30 samples at the same time on a GTX 1080 Ti.
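
The random tensors above are only placeholders. To vocode real audio, the mel spectrograms must be computed with the same parameters that `preprocess.py` used when the vocoder was trained. The sketch below is a minimal illustration of that workflow using torchaudio; the sample rate, FFT/hop sizes, log scaling, and file names (`speech.wav`, `vocoder.pt`, `generated.wav`) are assumptions to be replaced with the repo's actual settings.

```python
import torch
import torchaudio

# NOTE: placeholder parameters; they must match the settings used by
# preprocess.py when the vocoder was trained.
SAMPLE_RATE = 16000
N_MELS = 80

to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE,
    n_fft=1024,
    hop_length=256,
    n_mels=N_MELS,
)

vocoder = torch.jit.load("vocoder.pt")

wav, sr = torchaudio.load("speech.wav")       # (channels, samples)
wav = wav.mean(dim=0, keepdim=True)           # mix down to mono
if sr != SAMPLE_RATE:
    wav = torchaudio.transforms.Resample(sr, SAMPLE_RATE)(wav)

mel = to_mel(wav)                             # (1, n_mels, frames)
mel = torch.log(mel.clamp(min=1e-5))          # log compression; check the repo's exact scaling
mel = mel.squeeze(0).transpose(0, 1)          # (length, mel_dim), as in the example above

with torch.no_grad():
    outs = vocoder.generate([mel])

# assuming generate returns one waveform per input mel
out = torch.as_tensor(outs[0]).reshape(1, -1).float().cpu()
torchaudio.save("generated.wav", out, SAMPLE_RATE)
```

If the mel parameters do not match the training configuration, the code will still run but the generated audio will sound wrong.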

## Train from scratch

Multiple directories containing audio files can be processed at the same time, e.g.

```bash
python preprocess.py \
    VCTK-Corpus \
    LibriTTS/train-clean-100 \
    preprocessed # the output directory of preprocessed data
```

And train the model with the preprocessed data, e.g.

```bash
python train.py preprocessed
```

With the default settings, it takes around 12 hours to train to 100K steps on an RTX 2080 Ti.

## References

- [Towards achieving robust universal neural vocoding](https://arxiv.org/abs/1811.06292)