DEV Community

Dilek Karasoy for Picovoice

End-to-End Speech Recognition with Python

Let's start with why you should use the Picovoice Python SDK when there are alternative libraries and in-depth tutorials on speech recognition with Python.

  1. Private - processes voice data on the device
  2. Cross-platform - Linux, macOS, Windows, Raspberry Pi, …
  3. Real-time - no network round trip, since audio is processed locally

I guess I don't need to add "accurate": I haven't seen any vendor claim mediocre accuracy 🙃

Now, let's get started!

1 — Install Picovoice

pip3 install picovoice 

2 — Create a Picovoice Instance
The Picovoice SDK combines Porcupine Wake Word, which enables custom hotwords, and Rhino Speech-to-Intent, which enables custom voice commands. Together they enable hands-free experiences. Take this utterance as an example:
"Porcupine, set an alarm for 1 hour and 13 seconds."
Porcupine detects the hotword "Porcupine", then Rhino captures the user’s intent and provides the intent and its details, as seen below:

{ is_understood: true, intent: setAlarm, slots: { hours: 1, seconds: 13 } } 
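To act on an inference like this one, the slot values have to be turned into something usable. Here is a minimal sketch that converts the slots into a duration in seconds. It assumes the slots arrive as a dictionary of numeric strings keyed by `hours`, `minutes`, and `seconds`. The `minutes` key and the helper name `slots_to_seconds` are illustrative and not part of the SDK.

```python
# Hypothetical helper, not part of the Picovoice SDK.
# Assumption: slot values are numeric strings such as "1" or "13".
UNIT_SECONDS = {"hours": 3600, "minutes": 60, "seconds": 1}


def slots_to_seconds(slots):
    """Return the total duration in seconds described by the slots."""
    total = 0
    for unit, factor in UNIT_SECONDS.items():
        # Units that were not spoken count as zero.
        total += int(slots.get(unit, 0)) * factor
    return total


print(slots_to_seconds({"hours": "1", "seconds": "13"}))  # 3613
```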

To create a Picovoice instance, we need a Porcupine model and a Rhino model (passed as file paths), plus callbacks for hotword detection and inference completion. For simplicity, we'll use pre-trained Porcupine and Rhino models, but you can train custom ones on the Picovoice Console. While exploring the Picovoice Console, grab your AccessKey, too! Signing up for Picovoice Console is free, and no credit card is required.

from picovoice import Picovoice

keyword_path = ...  # path to Porcupine wake word file (.PPN)

def wake_word_callback():
    # Called when the wake word is detected.
    pass

context_path = ...  # path to Rhino context file (.RHN)

def inference_callback(inference):
    # Called when Rhino finishes inferring the intent.
    print(inference.is_understood)
    if inference.is_understood:
        print(inference.intent)
        for k, v in inference.slots.items():
            print(f"{k} : {v}")

pv = Picovoice(
    access_key="${YOUR_ACCESS_KEY}",  # AccessKey from Picovoice Console
    keyword_path=keyword_path,
    wake_word_callback=wake_word_callback,
    context_path=context_path,
    inference_callback=inference_callback)

Don't forget to replace the model path and AccessKey placeholders.

3 — Process Audio with Picovoice
Pass frames of audio to the engine:

pv.process(audio_frame) 
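The engine consumes audio one fixed-size frame at a time. As a rough sketch of what "frames" means here, the helper below splits a longer buffer of samples into equal frames. The frame length of 512 samples is an assumption for illustration (in real code, read it from `pv.frame_length`), and `split_into_frames` is a hypothetical helper.

```python
def split_into_frames(pcm, frame_length):
    """Yield consecutive frames of exactly frame_length samples.

    A trailing partial frame is dropped, since this sketch assumes the
    engine only accepts full frames.
    """
    for start in range(0, len(pcm) - frame_length + 1, frame_length):
        yield pcm[start:start + frame_length]


# Stand-in for recorded audio: 1300 silent samples.
samples = [0] * 1300
frames = list(split_into_frames(samples, frame_length=512))
print(len(frames))  # 2
```

Each yielded frame could then be handed to `pv.process(...)` in turn.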

4 — Read audio from the Microphone
Install [pvrecorder](https://pypi.org/project/pvrecorder/) with pip3 install pvrecorder, then start recording:

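from pvrecorder import PvRecorder

# `-1` is the default input audio device.
# frame_length is the number of samples per frame that the engine expects.
recorder = PvRecorder(device_index=-1, frame_length=pv.frame_length)
recorder.start()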

Read audio frames from the recorder and pass them to the .process method:

pcm = recorder.read()
pv.process(pcm)
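In practice these two lines run in a loop until the application stops. The sketch below shows one way to structure that loop. To stay runnable without a microphone, it uses stand-in classes (`FakeRecorder` and `FakeEngine`, both hypothetical) in place of the real recorder and `pv`. The only methods it relies on are `read`, `stop`, and `process`.

```python
import threading


def run_audio_loop(recorder, engine, stop_event):
    """Feed frames from recorder to engine until stop_event is set."""
    try:
        while not stop_event.is_set():
            engine.process(recorder.read())
    finally:
        # Always stop recording, even if processing raised.
        recorder.stop()


class FakeRecorder:
    """Hypothetical stand-in that serves three silent frames."""

    def __init__(self, stop_event):
        self.remaining = 3
        self.stopped = False
        self.stop_event = stop_event

    def read(self):
        self.remaining -= 1
        if self.remaining == 0:
            self.stop_event.set()  # ask the loop to finish
        return [0] * 512

    def stop(self):
        self.stopped = True


class FakeEngine:
    """Hypothetical stand-in that counts processed frames."""

    def __init__(self):
        self.frames_seen = 0

    def process(self, frame):
        self.frames_seen += 1


stop = threading.Event()
recorder = FakeRecorder(stop)
engine = FakeEngine()
run_audio_loop(recorder, engine, stop)
print(engine.frames_seen, recorder.stopped)  # 3 True
```

With the real objects, the call would look like `run_audio_loop(recorder, pv, stop)`, most likely on a background thread so that it does not block the rest of the program.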

5 — Create a GUI with Tkinter
Tkinter is the standard GUI framework shipped with Python. Create a window, add a label that shows the remaining time, then launch the main loop:

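import tkinter as tk

window = tk.Tk()
time_label = tk.Label(window, text='00 : 00 : 00')
time_label.pack()

def on_close():
    # Placeholder handler: stop audio processing here, then close the window.
    window.destroy()

window.protocol('WM_DELETE_WINDOW', on_close)
window.mainloop()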
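The label text has to be refreshed as the alarm counts down. A minimal sketch of the formatting step follows. `format_remaining` is a hypothetical helper that produces the same `HH : MM : SS` layout as the label's initial text.

```python
def format_remaining(total_seconds):
    """Format a non-negative number of seconds as 'HH : MM : SS'."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d} : {minutes:02d} : {seconds:02d}"


print(format_remaining(3613))  # 01 : 00 : 13
```

In the GUI, the result could be applied with `time_label.configure(text=format_remaining(seconds_left))`, where `seconds_left` is an assumed counter, and rescheduled once per second with `window.after(1000, ...)`.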

Some resources:
Source code for the tutorial
Original Medium Article
Picovoice SDK
Picovoice Console