This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
I created a real-time voice-controlled spell casting system that transforms spoken Harry Potter spells into instant keyboard commands. This project addresses the Real-Time Performance category by achieving ultra-low latency voice recognition for gaming applications where every millisecond matters.
The system recognizes over 30 different spells (like "Lumos", "Wingardium Leviosa", "Stupefy") and instantly triggers corresponding game actions through keyboard shortcuts. It features advanced fuzzy matching to handle pronunciation variations and partial transcript processing for immediate response - perfect for immersive gaming experiences.
Demo
🎥 YouTube Demo Video - Watch the spell casting in action!
Key features demonstrated:
- ⚡ Sub-300ms response time from speech to action
- 🎯 Accurate recognition of complex spell names
- 🔄 Handles pronunciation variations and partial words
- 🛡️ Smart spam prevention for rapid casting
- 🎮 Seamless integration with game controls
GitHub Repository
⚡ Hogwarts Spell Caster
A real-time voice-controlled spell casting system that transforms spoken Harry Potter spells into instant keyboard commands using AssemblyAI's Ultra-Fast Universal-Streaming technology. Cast spells with your voice and watch them trigger game actions in under 300ms!
🎯 Features
- ⚡ Ultra-Low Latency: Sub-300ms response time from speech to action
- 🎭 30+ Harry Potter Spells: Complete spell repertoire from the wizarding world
- 🧠 Intelligent Recognition: Advanced fuzzy matching handles pronunciation variations
- 🚀 Partial Processing: Acts on incomplete words for instant response
- 🛡️ Spam Prevention: Smart cooldowns prevent accidental rapid-fire casting
- 🎮 Gaming Ready: Direct keyboard integration for seamless game control
- 🔧 Optimized Performance: Pre-computed variations and early-exit logic
🎬 Demo
Click to watch the magic in action!
🚀 Quick Start
Prerequisites
- Python 3.8 or higher
- Microphone access
- AssemblyAI API key (free tier includes $50 in credits)
Installation
-
Clone the…
Technical Implementation & AssemblyAI Integration
Core Architecture
The system leverages AssemblyAI's Universal-Streaming technology with aggressive optimization for minimal latency:
# Optimized streaming parameters for ultra-low latency
client.connect(
    StreamingParameters(
        sample_rate=16000,
        format_turns=True,
        # Aggressive turn detection for faster response
        end_of_turn_confidence_threshold=0.5,  # Lower threshold for faster detection
        min_end_of_turn_silence_when_confident=100,  # Reduced from 160ms
        max_turn_silence=1500,  # Reduced from 2400ms
    )
)
Real-Time Processing Innovation
The key innovation is dual-layer processing that handles both partial and complete transcripts:
def on_turn(self: Type[StreamingClient], event: TurnEvent):
    transcript = event.transcript
    is_partial = not event.end_of_turn

    if is_partial:
        # Process partial transcript for immediate response (min 4 characters)
        if len(transcript) >= 4:
            print(f"👂 Partial: {transcript}")
            if process_transcript(transcript, confidence_threshold=0.8, is_partial=True):
                print("✨ Spell cast from partial transcript!")
    else:
        # Process complete transcript with lower threshold
        print(f"🗣️ Complete: {transcript}")
        process_transcript(transcript, confidence_threshold=0.6, is_partial=False)
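The body of process_transcript is not shown in the post. As a rough sketch of why the two thresholds differ, the version below assumes the confidence threshold is a string-similarity cutoff and uses the standard library's difflib.get_close_matches. The SPELL_KEYS bindings, the process_transcript_sketch name, and the injectable press_key callable are all hypothetical. A real version would presumably send the key through a keyboard automation library instead of print.

```python
from difflib import get_close_matches

# Hypothetical spell-to-key bindings; the project's real mapping is not shown.
SPELL_KEYS = {"lumos": "1", "stupefy": "2", "wingardium leviosa": "3"}


def process_transcript_sketch(transcript, confidence_threshold, is_partial,
                              press_key=print):
    """Return True if a spell was matched and its key was 'pressed'."""
    text = transcript.lower().strip()

    # Mirror the handler's rule: very short partials are too ambiguous to act on.
    if is_partial and len(text) < 4:
        return False

    # Keep only the single best candidate at or above the similarity cutoff.
    matches = get_close_matches(text, list(SPELL_KEYS), n=1,
                                cutoff=confidence_threshold)
    if not matches:
        return False

    press_key(SPELL_KEYS[matches[0]])
    return True
```

Under these assumptions, a mispronounced "stoopify" is rejected at the stricter partial cutoff of 0.8 but accepted at the end-of-turn cutoff of 0.6, while a clean partial such as "stupe" is close enough to fire early.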
Intelligent Spell Matching
I implemented an optimized fuzzy matching system that prioritizes speed:
def optimized_fuzzy_match(text, spell_list, threshold=0.6):
    text = text.lower().strip()

    # First, try exact matches in pre-computed variations
    if text in SPELL_VARIATIONS:
        return SPELL_VARIATIONS[text]

    # Quick substring check for common patterns
    for spell in spell_list:
        if spell in text or text in spell:
            if len(text) >= len(spell) * 0.7:  # At least 70% of spell length
                return spell

    # Fallback to SequenceMatcher only when needed
    # ... fuzzy matching logic
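The fallback itself is elided in the snippet above. One way such a SequenceMatcher fallback could look is sketched here, assuming a simple best-score-wins strategy. The fuzzy_fallback name and the sample spell list are illustrative, not the project's actual code.

```python
from difflib import SequenceMatcher


def fuzzy_fallback(text, spell_list, threshold=0.6):
    """Return the spell most similar to `text`, or None if nothing clears the threshold."""
    text = text.lower().strip()
    best_spell, best_score = None, 0.0

    for spell in spell_list:
        # ratio() is 2 * matched_chars / total_chars, a value in [0, 1].
        score = SequenceMatcher(None, text, spell).ratio()
        if score > best_score:
            best_spell, best_score = spell, score

    return best_spell if best_score >= threshold else None


spells = ["lumos", "stupefy", "wingardium leviosa"]
print(fuzzy_fallback("stupify", spells))
print(fuzzy_fallback("xyz", spells))
```

With the 0.6 threshold, a near miss such as "stupify" still resolves to "stupefy", while unrelated input returns None. Because this loop compares against every spell, it makes sense to reach it only after the cheaper dictionary and substring checks have failed.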
Performance Optimizations
- Pre-computed Spell Variations: Common spell variations are cached for instant lookup
- Spam Prevention: Prevents accidental rapid-fire casting with time-based cooldowns
- Early Exit Logic: Avoids expensive fuzzy matching when exact matches are found
- Partial Processing: Acts on partial transcripts for sub-300ms response times
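The cooldown logic is not shown in the post. A minimal sketch of time-based spam prevention follows, assuming a per-spell cooldown. The SpellCooldown class, its one-second default, and the injectable clock are hypothetical choices made for illustration.

```python
import time


class SpellCooldown:
    """Allow each spell to fire at most once per cooldown window."""

    def __init__(self, cooldown_seconds=1.0, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock      # injectable so the logic can be tested without sleeping
        self._last_cast = {}     # spell name -> time of the last accepted cast

    def try_cast(self, spell):
        """Return True and record the cast if the spell is off cooldown."""
        now = self._clock()
        last = self._last_cast.get(spell)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_cast[spell] = now
        return True


cooldown = SpellCooldown(cooldown_seconds=1.0)
if cooldown.try_cast("lumos"):
    print("casting lumos")
```

A guard like this matters when acting on partial transcripts, since the same utterance can arrive several times as the transcript grows and would otherwise trigger the same key press repeatedly. time.monotonic is used here because it is unaffected by system clock adjustments.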
AssemblyAI Features Utilized
- Universal-Streaming: Core real-time transcription with roughly 300ms latency
- Turn Detection: Intelligent endpointing for natural speech flow
- High Accuracy: Handles complex fantasy terminology and pronunciation variations
- Partial Transcripts: Enables immediate response without waiting for complete utterances
Results
The system consistently achieves sub-300ms latency from speech input to game action, making spell casting feel truly magical and responsive. The combination of AssemblyAI's ultra-fast streaming with optimized processing creates an immersive gaming experience where voice commands feel as natural as pressing keys.
Perfect for Harry Potter games, VR experiences, or any application requiring instant voice command recognition! 🧙‍♂️✨
Top comments (30)
Incredible demo!
thaaaanks <3
happy to see so many positive comments
It was an unexpectedly, refreshing awesome HP project with a captivating demo, and coz i am a potterhead ig. Hence, i like it 😁
I played HP on Nintendo.
Had to buy game in Steam to showcase this tool I made :D
But I made Slytherin character on PC now. I heard you get avada kedavra sooner when playing Slytherin :DD
Slytherins are the coolest imo, followed closely by the Ravens.
didn't know) will try it out.
Do you follow the HP 2 release?
many say gonna be only in 2027 :((((
It has been lovely childhood memories for me. I saw the new cast, and also, it would be too late ig.
I won't be following it 😅
What will you try out btw?
the Ravens in Hogwarts Legacy game.
So far played with Gryf and Slyth.
But for every house they have different quests from what I know
Sounds really interesting.
Never played it myself though 😅, so I have no idea about it.
Man the Thumbnail looks DOPE!!!!
It feels crazy!!
Gotta agree here🔥. What tool was used to create it?
I mean, no actual tool. Just python and assemblyAi))
Oh no, I was referring to the thumbnail😅
oh sorry, I had like 4 hour sleep in the last 48 hours. haha.
Didn't pay enough attention.
I actually have a telegram bot that creates thumbnails. but in short the IP here is just good prompting for gpt-image model :)
Oh that's awesome✨✨. Nice one.
sorry man. I was replying when i had like 0 sleep in some 24h+
thought you were referring to project. haha.
But yeah, the thumbnail is also cool!
Awesome! 🔥🔥
I always love your video demos!
Thanks, friend! You would do a big favour to me if subscribe on my YouTube channel!
great demo
I can already imagine yelling, Stupefy! at my screen xD. Gotta try this. Is it open source?
Ahahah, I can't wait to Unlock Avada Kedavra really!!!
I love how you combined real-time voice recognition with game commands. Did you test it on any specific games or is it adaptable to multiple ones?
As you can see in Demo video it is for Harry potter game Hogwarts Legacy, but obviously commands can do whatever you like in any game.
This looks awesome✨. So, to execute the commands in the game, are you running the script concurrently while playing it? Or did you integrate it somehow with the game?
Thank you!) 🙏
Yes you just run it concurrently. Didn't complicate it with extra rules and code. Just run it when already in the game
Awesome👍