On day 31, we'll work with SwiftUI to build a voice assistant that gets Siri to pour coffee, at least visually. Of course, you could take it one step further and connect it to an actual coffee maker.
First, let's handle the speech recognition part.
Apple's Speech framework offers cloud-dependent transcription, even for simple requests like ordering coffee. Rhino Speech-to-Intent is the better choice here: context-aware spoken language understanding that runs entirely on-device.
**Let's Start with a Simple Graphical User Interface**
Thankfully, SwiftUI has made creating visually appealing, stateful UIs really easy. In about half an hour you can mock up a GUI with a coffee maker image, some text prompts, and a collection of stateful buttons similar to this:
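As a starting point, here's a minimal sketch of such a layout. Everything in it is a hypothetical placeholder (the `coffee_maker` asset name, the labels, the hard-coded sizes); the real, stateful version driven by a ViewModel comes later in this post:

```swift
import SwiftUI

// Static mock-up: a coffee maker image, a text prompt, and capsule-style buttons.
// "coffee_maker" is a hypothetical asset name; real state arrives with the ViewModel later.
struct MockupView: View {
    let sizes = ["Small", "Medium", "Large"]

    var body: some View {
        VStack(spacing: 16) {
            Image("coffee_maker")
                .resizable()
                .scaledToFit()
            Text("Say 'Hey Barista'!")
                .font(.headline)
            HStack {
                ForEach(sizes, id: \.self) { size in
                    Text(size)
                        .padding(.horizontal, 12)
                        .padding(.vertical, 6)
                        .background(Capsule().stroke())
                }
            }
        }
        .padding()
    }
}
```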
**Add the Picovoice CocoaPod**
CocoaPods is a dependency manager for iOS that makes integrating third-party libraries into your app straightforward.
To install the Picovoice pod using CocoaPods, add the following to your Podfile:
```ruby
source 'https://cdn.cocoapods.org/'
# ...
pod 'Picovoice-iOS'
```
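Then run `pod install` and open the generated `.xcworkspace` instead of the `.xcodeproj`.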
**Let's Dive into Voice AI**
The Picovoice Platform SDK combines two speech recognition engines: Porcupine Wake Word and Rhino Speech-to-Intent. Together, they enable voice interactions akin to Alexa and Siri, while keeping all voice processing on-device. For example, in a command like

> Hey Siri, could I have a medium coffee?

"Hey Siri" is detected by Porcupine, and the rest of the phrase is inferred by Rhino through a specialized context, without ever transcribing it to text.
When Rhino infers the utterance, it returns an instance of an `Inference` struct; for the sample phrase above, the struct will look like this:

```
IsUnderstood: true,
Intent: 'orderBeverage',
Slots: {
    size: 'medium',
    beverage: 'coffee'
}
```
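In Swift, these fields surface as `isUnderstood`, `intent`, and `slots` on the inference object, as we'll use in the ViewModel later in this post. A minimal sketch of a handler (the `handle` function name is a hypothetical placeholder):

```swift
import Picovoice

// Sketch: inspecting a Rhino Inference result.
// The property names mirror those used in the ViewModel below.
func handle(inference: Inference) {
    if inference.isUnderstood {
        print("Intent: \(inference.intent)")      // e.g. "orderBeverage"
        for (slot, value) in inference.slots {    // e.g. size -> medium
            print("\(slot): \(value)")
        }
    } else {
        print("Command not understood")
    }
}
```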
In order to initialize the voice AI, we'll need both Porcupine (`.ppn`) and Rhino (`.rhn`) model files. Picovoice has made several pre-trained Porcupine and pre-trained Rhino models available on the Picovoice GitHub repositories. For this Barista app, we're going to use the trigger phrase *Hey Barista* and the *Coffee Maker* context.
- Download the `hey barista_ios.ppn` and `coffee_maker_ios.rhn` models.
- Add them to the iOS project as bundled resources.
- Get your Picovoice AccessKey from the Picovoice Console for free, if you haven't already.
Now, we can load models at runtime. Let's initialize the Picovoice Platform:
```swift
import Picovoice

let accessKey = "..." // your Picovoice AccessKey
let contextPath = Bundle.main.path(forResource: "coffee_maker_ios", ofType: "rhn")
let keywordPath = Bundle.main.path(forResource: "hey barista_ios", ofType: "ppn")

var picovoiceManager: PicovoiceManager!

init() {
    do {
        picovoiceManager = PicovoiceManager(
            accessKey: accessKey,
            keywordPath: keywordPath!,
            onWakeWordDetection: {
                // wake word detected
            },
            contextPath: contextPath!,
            onInference: { inference in
                // inference result
            })
        try picovoiceManager.start()
    } catch {
        print("\(error)")
    }
}
```
The `picovoiceManager.start()` method starts audio capture and passes the audio stream to the engines.
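When voice control is no longer needed, there's a matching `picovoiceManager.stop()` that stops audio capture and releases the microphone; check the Picovoice SDK docs for the full lifecycle API.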
To capture microphone audio, we must add the permission request to the `Info.plist`:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>To recognize voice commands</string>
```
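iOS shows the permission prompt automatically the first time the app touches the microphone. If you want to handle a denied state gracefully, here's a minimal sketch using the standard `AVAudioSession` API (this is plain AVFoundation, not part of the Picovoice SDK; the function name is hypothetical):

```swift
import AVFoundation

// Optionally check microphone permission before starting the PicovoiceManager,
// so the UI can explain why voice commands are unavailable if access was denied.
func startVoiceControlIfPermitted(_ start: @escaping () -> Void) {
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        DispatchQueue.main.async {
            if granted {
                start() // e.g. try? picovoiceManager.start()
            } else {
                print("Microphone access denied; voice commands are unavailable")
            }
        }
    }
}
```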
**Integrate Voice Controls**
To drive the SwiftUI view programmatically, we'll create a ViewModel and have the UI observe it. The required UI controls are straightforward: 1) indicate that the wake word has been detected, and 2) show the drink order. Create a struct to represent the buttons, plus state variables that show and hide text; the UI will be bound to these properties because they are marked `@Published`. The `ViewModel` will look like this:
```swift
import SwiftUI
import Picovoice

struct CapsuleSelection: Codable, Identifiable {
    var title: String
    var id: String
    var isSelected: Bool

    init(title: String) {
        self.title = title
        self.id = title.lowercased()
        self.isSelected = false
    }
}

class ViewModel: ObservableObject {
    @Published var sizeSel = [
        CapsuleSelection(title: "Small"),
        CapsuleSelection(title: "Medium"),
        CapsuleSelection(title: "Large")
    ]
    @Published var shotSel = [
        CapsuleSelection(title: "Single Shot"),
        CapsuleSelection(title: "Double Shot"),
        CapsuleSelection(title: "Triple Shot")
    ]
    @Published var bevSel = [
        CapsuleSelection(title: "Americano"),
        CapsuleSelection(title: "Cappuccino"),
        CapsuleSelection(title: "Coffee"),
        CapsuleSelection(title: "Espresso"),
        CapsuleSelection(title: "Latte"),
        CapsuleSelection(title: "Mocha")
    ]

    @Published var isListening = false
    @Published var missedCommand = false

    let accessKey = "..." // your Picovoice AccessKey
    let contextPath = Bundle.main.path(forResource: "coffee_maker_ios", ofType: "rhn")
    let keywordPath = Bundle.main.path(forResource: "hey barista_ios", ofType: "ppn")

    var picovoiceManager: PicovoiceManager!

    init() {
        do {
            picovoiceManager = PicovoiceManager(
                accessKey: accessKey,
                keywordPath: keywordPath!,
                onWakeWordDetection: {
                    DispatchQueue.main.async {
                        self.isListening = true
                        self.missedCommand = false
                    }
                },
                contextPath: contextPath!,
                onInference: { inference in
                    DispatchQueue.main.async {
                        if inference.isUnderstood {
                            if inference.intent == "orderBeverage" {
                                // parse size
                                if let size = inference.slots["size"] {
                                    if let i = self.sizeSel.firstIndex(
                                        where: { $0.id == size }) {
                                        self.sizeSel[i].isSelected = true
                                    }
                                }
                                // repeat for 'numShots' and 'beverage'...
                            }
                        } else {
                            self.missedCommand = true
                        }
                        self.isListening = false
                    }
                })
            try picovoiceManager.start()
        } catch {
            print("\(error)")
        }
    }
}
```
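One design note on the code above: the `onWakeWordDetection` and `onInference` callbacks are wrapped in `DispatchQueue.main.async` because they aren't guaranteed to arrive on the main thread, and `@Published` properties that drive SwiftUI must be mutated on the main thread. In the view, bind to the model with `@StateObject var viewModel = ViewModel()` (or `@ObservedObject` on older SwiftUI versions), and the capsule buttons will update as the slots are parsed.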
Finally, Siri understands how you want your coffee, without connecting to the internet!
Below are some useful resources:

- Open-source code
- Picovoice Platform SDK
- Picovoice website