Let's start with why you should use the Picovoice Python SDK when there are already alternative libraries and in-depth tutorials on speech recognition with Python:
- Private - processes voice data on the device
- Cross-platform - Linux, macOS, Windows, Raspberry Pi, …
- Real-time - minimal latency, since audio is processed locally with no round trip to the cloud
- Accurate - I don't think I need to elaborate. I haven't seen any vendor claiming mediocre accuracy 🙃
Now, let's get started!
1 — Install Picovoice
```bash
pip3 install picovoice
```
2 — Create a Picovoice Instance
The Picovoice SDK consists of Porcupine Wake Word, which enables custom hotwords, and Rhino Speech-to-Intent, which enables custom voice commands. Together they enable hands-free experiences. For example, a user might say:
Porcupine, set an alarm for 1 hour and 13 seconds.
Porcupine detects the hotword "Porcupine", then Rhino infers the user's intent and returns the intent along with its details (slots), as shown below:
{ is_understood: true, intent: setAlarm, slots: { hours: 1, seconds: 13 } }
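An inference like this is straightforward to turn into something an alarm app can act on. The sketch below is a minimal illustration, assuming slot values arrive as numeric strings such as "1" and "13" and that the context may also define a `minutes` slot (both are assumptions, not confirmed above). `FakeInference` and `alarm_duration_seconds` are hypothetical names; `FakeInference` only mirrors the `is_understood`, `intent`, and `slots` fields so the example runs without the SDK:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeInference:
    """Hypothetical stand-in with the same fields as a Rhino inference."""
    is_understood: bool
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


# Assumed slot names; "hours" and "seconds" appear in the example above.
SECONDS_PER_UNIT = {"hours": 3600, "minutes": 60, "seconds": 1}


def alarm_duration_seconds(inference) -> Optional[int]:
    """Return the requested alarm length in seconds, or None if not a setAlarm."""
    if not inference.is_understood or inference.intent != "setAlarm":
        return None
    total = 0
    for slot, value in inference.slots.items():
        # Assumes numeric string values; unknown slot names are ignored.
        if slot in SECONDS_PER_UNIT:
            total += int(value) * SECONDS_PER_UNIT[slot]
    return total
```

For the command above, `hours: 1` and `seconds: 13` add up to 3613 seconds.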
To create a Picovoice instance, we need an AccessKey, paths to the Porcupine wake word file and the Rhino context file, and callbacks for wake word detection and inference completion. For simplicity, we'll use pre-trained Porcupine and Rhino models; however, you can train custom ones on the Picovoice Console. While exploring the Picovoice Console, grab your AccessKey, too! Signing up for Picovoice Console is free, no credit card required.
```python
from picovoice import Picovoice

access_key = "${YOUR_ACCESS_KEY}"  # AccessKey from Picovoice Console

keyword_path = ...  # path to Porcupine wake word file (.ppn)
context_path = ...  # path to Rhino context file (.rhn)


def wake_word_callback():
    # Called when Porcupine detects the wake word.
    pass


def inference_callback(inference):
    # Called when Rhino finishes inferring the follow-up command.
    print(inference.is_understood)
    if inference.is_understood:
        print(inference.intent)
        for k, v in inference.slots.items():
            print(f"{k} : {v}")


pv = Picovoice(
    access_key=access_key,
    keyword_path=keyword_path,
    wake_word_callback=wake_word_callback,
    context_path=context_path,
    inference_callback=inference_callback)
```
Do not forget to replace the model path and AccessKey placeholders.
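Rather than pasting the AccessKey into the source file, you may prefer to read it from an environment variable. A small sketch; the variable name `PICOVOICE_ACCESS_KEY` and the helper `load_access_key` are hypothetical conventions, and the SDK does not read this variable on its own:

```python
import os


def load_access_key(var_name: str = "PICOVOICE_ACCESS_KEY") -> str:
    """Read the Picovoice AccessKey from an environment variable.

    The default variable name is a hypothetical convention for this app.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of passing an empty key on.
        raise RuntimeError(f"Set {var_name} to the AccessKey from Picovoice Console")
    return key
```

You would then pass `access_key=load_access_key()` when constructing `Picovoice(...)`.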
3 — Process Audio with Picovoice
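Pass frames of audio to the engine. Each frame must contain `pv.frame_length` samples of 16-bit, single-channel PCM audio sampled at `pv.sample_rate`:
```python
pv.process(audio_frame)
```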
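To try the engine without a microphone, you can feed it a recorded file instead. A sketch using only the standard library, assuming a 16-bit mono WAV file recorded at 16 kHz; `read_pcm16_mono` and `split_into_frames` are hypothetical helpers, and a trailing partial frame is simply dropped because the engine consumes fixed-size frames:

```python
import struct
import wave
from typing import BinaryIO, Iterator, List, Sequence, Union


def read_pcm16_mono(source: Union[str, BinaryIO], expected_rate: int = 16000) -> List[int]:
    """Read a 16-bit mono WAV file (path or binary file object) into int samples."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        n = wav.getnframes()
        data = wav.readframes(n)
    # '<' = little-endian, 'h' = signed 16-bit: the layout of PCM WAV samples.
    return list(struct.unpack(f"<{n}h", data))


def split_into_frames(pcm: Sequence[int], frame_length: int) -> Iterator[Sequence[int]]:
    """Yield consecutive, non-overlapping frames of exactly frame_length samples."""
    for start in range(0, len(pcm) - frame_length + 1, frame_length):
        yield pcm[start:start + frame_length]
```

With a hypothetical `command.wav`, you would iterate over `split_into_frames(read_pcm16_mono("command.wav", pv.sample_rate), pv.frame_length)` and pass each frame to `pv.process`.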
4 — Read audio from the Microphone
Install [pvrecorder](https://pypi.org/project/pvrecorder/) and start recording:
```python
from pvrecorder import PvRecorder

# `-1` is the default input audio device; frames match what Picovoice expects.
recorder = PvRecorder(device_index=-1, frame_length=pv.frame_length)
recorder.start()
```
Read audio frames from the recorder in a loop and pass each one to the `.process` method:
```python
while True:
    pcm = recorder.read()
    pv.process(pcm)
```
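Tkinter's `mainloop()` blocks the main thread, so the read/process loop cannot simply run there alongside the GUI. A minimal sketch, assuming you move the loop onto a background thread and stop it with a `threading.Event`; `start_audio_thread` is a hypothetical helper, not part of pvrecorder or Picovoice, and in the app `read_frame` would be `recorder.read` and `process_frame` would be `pv.process`:

```python
import threading
from typing import Callable, Sequence


def start_audio_thread(
    read_frame: Callable[[], Sequence[int]],
    process_frame: Callable[[Sequence[int]], None],
    stop_event: threading.Event,
) -> threading.Thread:
    """Run the read/process loop on a daemon thread until stop_event is set."""

    def loop() -> None:
        # Checked once per frame, so shutdown waits for at most one read.
        while not stop_event.is_set():
            process_frame(read_frame())

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

On shutdown, set the event and `join()` the thread before calling `recorder.stop()`, `recorder.delete()`, and `pv.delete()`, so the engine is never released while a frame is still being processed. Note that the Picovoice callbacks then run on this background thread.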
5 — Create a GUI with Tkinter
Tkinter is the standard GUI toolkit shipped with Python. Create a window, add a label showing the remaining time, then start the main loop:
```python
import tkinter as tk


def on_close():
    # Minimal handler; also release the recorder and Picovoice here.
    window.destroy()


window = tk.Tk()
time_label = tk.Label(window, text='00 : 00 : 00')
time_label.pack()
window.protocol('WM_DELETE_WINDOW', on_close)
window.mainloop()
```
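The label only becomes useful once something updates it. A minimal sketch, assuming the inference callback hands alarm durations (in seconds) to the GUI over a `queue.Queue`; `format_remaining` and `Countdown` are hypothetical names. Tkinter widgets should only be touched from the thread running `mainloop()`, so the GUI side polls the queue with `after()` rather than the callback updating the label directly:

```python
import queue


def format_remaining(total_seconds: int) -> str:
    """Format seconds as 'HH : MM : SS', matching the label's initial text."""
    hours, rest = divmod(max(total_seconds, 0), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d} : {minutes:02d} : {seconds:02d}"


class Countdown:
    """Updates a Tk label once per tick; runs only on the GUI thread."""

    def __init__(self, window, label, alarms: "queue.Queue[int]") -> None:
        self.window = window
        self.label = label
        self.alarms = alarms
        self.remaining = 0

    def tick(self) -> None:
        # Pick up a newly requested alarm, if any, without blocking the GUI.
        try:
            self.remaining = self.alarms.get_nowait()
        except queue.Empty:
            if self.remaining > 0:
                self.remaining -= 1
        self.label.config(text=format_remaining(self.remaining))
        # Schedule the next tick roughly one second from now.
        self.window.after(1000, self.tick)
```

Create `alarms = queue.Queue()`, call `Countdown(window, time_label, alarms).tick()` once before `window.mainloop()`, and have the inference callback call `alarms.put(...)` with the requested duration. Since `after(1000, ...)` only approximates one-second intervals, a long countdown can drift slightly; what happens when it reaches zero is left out of this sketch.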
Some resources:
- Source code for the tutorial
- Original Medium Article
- Picovoice SDK
- Picovoice Console