vadimkantorov/inferspeech

speech2text.py

This is an inference script for NVIDIA OpenSeq2Seq's wav2letter model, ported to PyTorch.

The pretrained model weights for English were exported from a TensorFlow checkpoint to HDF5 using a little tfcheckpoint2pytorch script that I wrote.

Limitations: not ready for production; uses float32 weights; uses a greedy decoder; does not chunk the input.
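For readers unfamiliar with the term, here is a minimal sketch of what a greedy decoder does, assuming the model emits per-frame scores over a character set with a CTC-style blank label. The `ALPHABET`, the position of the blank, and the function name `greedy_ctc_decode` are assumptions made for illustration and are not taken from speech2text.py.

```python
from itertools import groupby

# Hypothetical label set for illustration only; the real script may use a
# different alphabet and a different index for the blank label.
ALPHABET = list(" abcdefghijklmnopqrstuvwxyz'") + ["|"]
BLANK = len(ALPHABET) - 1  # assumption: the blank is the last label


def greedy_ctc_decode(scores, alphabet=ALPHABET, blank=BLANK):
    """Decode a sequence of per-frame score lists into text.

    scores: list of T frames, each a list with one score per label.
    """
    # 1. Take the highest-scoring label in every frame (no language model,
    #    no beam search).
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in scores]
    # 2. Collapse runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(best)]
    # 3. Drop the blank label and map the rest to characters.
    return "".join(alphabet[i] for i in collapsed if i != blank)
```

The blank between two identical labels is what lets a doubled letter such as "ll" survive step 2. A beam-search decoder with a language model would usually give better transcripts, at the cost of more code and more compute.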

Dependencies: pytorch (the CPU version is OK), numpy, scipy, h5py. Optional dependencies for saving the model weights in tfjs format: tensorflow v1.13.1 (install with pip3 install tensorflow==1.13.1) and tensorflowjs (install with pip3 install tensorflowjs --no-deps; otherwise it would upgrade your TensorFlow from v1 to v2 and break everything).
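A quick way to see which of these packages are importable before running the script is sketched below. It only checks that a module can be found and does not verify versions; the helper names `missing_modules` and `report` are made up for this example.

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["torch", "numpy", "scipy", "h5py"]
OPTIONAL = ["tensorflow", "tensorflowjs"]  # only needed for the tfjs export


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Print which required and optional modules are missing."""
    print("missing required:", missing_modules(REQUIRED) or "none")
    print("missing optional:", missing_modules(OPTIONAL) or "none")


if __name__ == "__main__":
    report()
```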

The credit for the original wav2letter++ model goes to the awesome Facebook AI Research scientists.

Example

    # download the pretrained model weights for English and Russian
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/w2l_plus_large_mp.h5 # English, Wav2Letter
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/checkpoint_0010_epoch_01_iter_62500.model.h5 # Russian
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/jasper10x5_LibriSpeech_nvgrad_masks.h5.part_aa # English, Jasper, part1
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/jasper10x5_LibriSpeech_nvgrad_masks.h5.part_ab # English, Jasper, part2
    cat jasper10x5_LibriSpeech_nvgrad_masks.h5.part_aa jasper10x5_LibriSpeech_nvgrad_masks.h5.part_ab > jasper10x5_LibriSpeech_nvgrad_masks.h5

    # download and transcribe a wav file (16 kHz)
    # should print: my heart doth plead that thou in him doth lie a closet never pierced with crystal eyes but the defendant doth that plea deny and says in him thy fair appearance lies
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/121-123852-0004.wav
    python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 -i 121-123852-0004.wav # use Wav2Letter model
    python3 speech2text.py --model en_w2l --weights jasper10x5_LibriSpeech_nvgrad_masks.h5 -i 121-123852-0004.wav # use Jasper model

    # transcribe some Russian wav file
    python3 speech2text.py --model ru_w2l --weights checkpoint_0010_epoch_01_iter_62500.model.h5 -i some_test.wav

    # save the model to ONNX format for inspection with https://lutzroeder.github.io/netron/
    python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 --onnx w2l_plus_large_mp.onnx

    # save the model to TensorFlow.js format
    python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 --tfjs w2l_plus_large_mp.tfjs
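The example file is a 16 kHz wav. Whether the script accepts other sample rates is not stated here, so it may be worth checking your own recordings first. The sketch below reads the header with Python's standard `wave` module; `wav_info` and `is_16khz` are hypothetical helper names and are not part of speech2text.py.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, bytes_per_sample) of a PCM wav file."""
    with wave.open(path, "rb") as f:
        return f.getframerate(), f.getnchannels(), f.getsampwidth()


def is_16khz(path):
    """True if the file's sample rate matches the 16 kHz of the example file."""
    sample_rate, _, _ = wav_info(path)
    return sample_rate == 16000
```

For instance, `wav_info("121-123852-0004.wav")` reports the sample rate, channel count and sample width of the downloaded test file. The `wave` module only reads uncompressed PCM wav files.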

Browser demo with TensorFlow.js (work in progress)

    # download and extract the exported tfjs model
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/w2l_plus_large_mp.tfjs.tar.gz
    tar -xf w2l_plus_large_mp.tfjs.tar.gz

    # serve the tfjs model and demo.html file
    python3 -m http.server

    # open the demo at http://localhost:8000/demo.html and transcribe the test file 121-123852-0004.wav
