
Commit 28c699e

update README.md

1 parent e3de949

1 file changed: README.md (27 additions, 19 deletions)
This is a restructured and rewritten version of [bshall/UniversalVocoding](https://github.com/bshall/UniversalVocoding).
The main difference here is that the model is turned into a [TorchScript](https://pytorch.org/docs/stable/jit.html) module during training, so it can be loaded for inference anywhere without Python dependencies.
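For reference, exporting a model this way generally comes down to scripting and saving it; the sketch below is illustrative (a minimal stand-in module, not this repo's actual training code):

```python
import torch
import torch.nn as nn

class TinyVocoder(nn.Module):
    """Illustrative stand-in for the trained vocoder model."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(80, 1)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        return self.linear(mel)

# Compile the module to TorchScript and serialize it; the saved file can
# later be loaded (even from C++) without the Python class definition.
scripted = torch.jit.script(TinyVocoder())
scripted.save("vocoder.pt")
```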
## Generate waveforms using pretrained models

Since the pretrained models are TorchScript modules, you can load a trained model anywhere.
You can also generate multiple waveforms in parallel, e.g.

```python
import torch

vocoder = torch.jit.load("vocoder.pt")

mels = [
    torch.randn(100, 80),
    torch.randn(200, 80),
    torch.randn(300, 80),
]  # each mel spectrogram has shape (length, mel_dim)

with torch.no_grad():
    wavs = vocoder.generate(mels)
```
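To inspect the generated audio, you can write it to disk; a minimal sketch, assuming `wavs` is a list of 1-D CPU float tensors and the model's sample rate is 16 kHz (check the repo's configuration for the actual value):

```python
import soundfile as sf

# Assumptions: each element of `wavs` is a 1-D float tensor on the CPU,
# and the vocoder operates at 16 kHz; adjust to match your configuration.
for i, wav in enumerate(wavs):
    sf.write(f"sample_{i}.wav", wav.numpy(), samplerate=16000)
```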
Empirically, if you're using the default architecture, you can generate 30 samples at the same time on a GTX 1080 Ti.
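If you have more utterances than fit on the GPU at once, you can generate them in fixed-size chunks. Continuing the example above, a small sketch (the batch size of 30 reflects the observation above and should be tuned for your own hardware):

```python
# Generate waveforms chunk by chunk to stay within GPU memory.
batch_size = 30  # empirical limit on a GTX 1080 Ti; tune for your GPU
wavs = []
with torch.no_grad():
    for i in range(0, len(mels), batch_size):
        wavs.extend(vocoder.generate(mels[i:i + batch_size]))
```

This assumes `generate` accepts a list of mel spectrograms and returns the corresponding list of waveforms, as in the example above.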
## Train from scratch

Multiple directories containing audio files can be processed at the same time, e.g.

```bash
python preprocess.py \
    VCTK-Corpus \
    LibriTTS/train-clean-100 \
    preprocessed  # the output directory of preprocessed data
```

And train the model with the preprocessed data, e.g.

```bash
python train.py preprocessed
```

With the default settings, it takes around 12 hours to train to 100K steps on an RTX 2080 Ti.

## References

- [Towards achieving robust universal neural vocoding](https://arxiv.org/abs/1811.06292)
