This is a restructured and rewritten version of bshall/UniversalVocoding. The main difference is that the model is compiled into a TorchScript module during training, so it can be loaded for inference anywhere without a Python dependency.
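The TorchScript conversion mentioned above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual training code: `TinyVocoder` is a hypothetical stand-in for the real model, and the real export happens inside the training loop.

```python
import torch
import torch.nn as nn


class TinyVocoder(nn.Module):
    """Hypothetical stand-in for the actual vocoder model."""

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # Placeholder "synthesis": collapse the mel-bin dimension.
        return mel.mean(dim=-1)


model = TinyVocoder()

# torch.jit.script compiles the module so it can later be loaded
# with torch.jit.load, without the original Python class definition.
scripted = torch.jit.script(model)
scripted.save("vocoder.pt")
```

Once saved this way, the checkpoint can be deserialized from Python or C++ (via libtorch) without importing this project's code.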
Multiple directories containing audio files can be processed at the same time.
```
python preprocess.py VCTK-Corpus LibriTTS/train-clean-100 preprocessed
```

Then train the model on the preprocessed data:

```
python train.py preprocessed
```

You can load a trained model anywhere and generate multiple waveforms in parallel.
```python
import torch

vocoder = torch.jit.load("vocoder.pt")
mels = [
    torch.randn(100, 80),
    torch.randn(200, 80),
    torch.randn(300, 80),
]
with torch.no_grad():
    wavs = vocoder.generate(mels)
```

Empirically, with the default architecture you can generate 100 samples at the same time on an NVIDIA GTX 1080 Ti.