speech2text.py

This is a PyTorch inference script for the Wav2Letter model from NVIDIA's OpenSeq2Seq, ported from TensorFlow to PyTorch.

The pretrained model weights for English were exported from a TensorFlow checkpoint to HDF5 using a little tfcheckpoint2pytorch script that I wrote.

Limitations: not ready for production; uses float32 weights; uses a greedy decoder; does not chunk the input.
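The greedy decoder mentioned above picks the most likely symbol in each output frame and then applies the usual CTC collapsing rule: merge consecutive repeats, then drop blanks. Below is a minimal numpy sketch, assuming a CTC output layer (as in OpenSeq2Seq's Wav2Letter) that produces a (time, classes) score matrix whose last class is the blank. The alphabet `ALPHABET` and the function `greedy_ctc_decode` are hypothetical and not taken from speech2text.py.

```python
import numpy as np

# Hypothetical alphabet: space, a-z, apostrophe.
# Assumption: the CTC blank is the extra class right after the last character.
ALPHABET = " abcdefghijklmnopqrstuvwxyz'"
BLANK = len(ALPHABET)

def greedy_ctc_decode(logits, alphabet=ALPHABET, blank=BLANK):
    """Greedy CTC decoding of a (time, classes) score matrix."""
    best = np.asarray(logits).argmax(axis=-1)  # most likely class per frame
    chars = []
    prev = None
    for idx in best.tolist():
        # emit only when the label changes and is not the blank;
        # a blank between two equal labels lets the label be emitted twice
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)
```

A beam-search decoder with a language model is usually more accurate; the greedy version is simply the cheapest.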

Dependencies: PyTorch (the CPU version is OK), numpy, scipy, h5py. Optional dependencies for saving the model weights to tfjs format: tensorflow v1.13.1 (install with pip3 install tensorflow==1.13.1) and tensorflowjs (install with pip3 install tensorflowjs --no-deps; otherwise pip would upgrade your TensorFlow from v1 to v2 and break everything).
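Because installing tensorflowjs without `--no-deps` can replace TensorFlow v1 with v2, it may be worth checking the installed version before exporting to tfjs. A small sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_major_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_major_version(package):
    """Return the major version of an installed distribution, or None if it is absent."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    tf_major = installed_major_version("tensorflow")
    if tf_major is None:
        print("tensorflow is not installed (only needed for the tfjs export)")
    elif tf_major != 1:
        print(f"tensorflow {tf_major}.x found; the tfjs export expects v1.13.1")
```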

The credit for the original wav2letter++ model goes to awesome Facebook AI Research scientists.

Example

    # download the pretrained model weights for English and Russian
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/w2l_plus_large_mp.h5 # English, Wav2Letter
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/checkpoint_0010_epoch_01_iter_62500.model.h5 # Russian
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/jasper10x5_LibriSpeech_nvgrad_masks.h5.part_aa # English, Jasper, part1
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/jasper10x5_LibriSpeech_nvgrad_masks.h5.part_ab # English, Jasper, part2
    cat jasper10x5_LibriSpeech_nvgrad_masks.h5.part_aa jasper10x5_LibriSpeech_nvgrad_masks.h5.part_ab > jasper10x5_LibriSpeech_nvgrad_masks.h5

    # download and transcribe a wav file (16 kHz)
    # should print: my heart doth plead that thou in him doth lie a closet never pierced with crystal eyes but the defendant doth that plea deny and says in him thy fair appearance lies
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/121-123852-0004.wav
    python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 -i 121-123852-0004.wav # use Wav2Letter model
    python3 speech2text.py --model en_w2l --weights jasper10x5_LibriSpeech_nvgrad_masks.h5 -i 121-123852-0004.wav # use Jasper model

    # transcribe some Russian wav file
    python3 speech2text.py --model ru_w2l --weights checkpoint_0010_epoch_01_iter_62500.model.h5 -i some_test.wav

    # save the model to ONNX format for inspection with https://lutzroeder.github.io/netron/
    python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 --onnx w2l_plus_large_mp.onnx

    # save the model to TensorFlow.js format
    python3 speech2text.py --model en_w2l --weights w2l_plus_large_mp.h5 --tfjs w2l_plus_large_mp.tfjs
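The example expects a 16 kHz wav file, so checking the format of your own recordings before transcribing them can save a confusing result. Below is a minimal sketch using the standard `wave` module and numpy. The function name `read_wav_16k` is hypothetical, and it additionally assumes mono 16-bit PCM input, which this README does not state; it is not the loader used inside speech2text.py.

```python
import wave

import numpy as np

EXPECTED_RATE = 16000  # the example notes the input wav should be 16 kHz

def read_wav_16k(path, expected_rate=EXPECTED_RATE):
    """Read a mono 16-bit PCM wav file and return float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as f:
        rate = f.getframerate()
        if rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz; resample first")
        # assumption: mono, 2 bytes per sample
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        frames = f.readframes(f.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32)
    return samples / np.float32(32768.0)  # scale int16 range to [-1, 1)
```

Files at other sample rates can be resampled first (for example with scipy, which is already a dependency) rather than passed to the model as-is.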

Browser demo with TensorFlow.js (work in progress)

    # download and extract the exported tfjs model
    wget https://github.com/vadimkantorov/inferspeech/releases/download/pretrained/w2l_plus_large_mp.tfjs.tar.gz
    tar -xf w2l_plus_large_mp.tfjs.tar.gz

    # serve the tfjs model and demo.html file
    python3 -m http.server

    # open the demo at http://localhost:8000/demo.html and transcribe the test file 121-123852-0004.wav
