
I have been experimenting with AI agents for a while now, but this time I wanted to build a Voice AI Agent. I won't lie, it does feel intimidating ...
Anmol Baranwal on July 12, 2025
Incredible as usual.
I gotta try this out soon.
Thank you
Appreciate you reading this Divya. You can make one for free with the credits you get. I'm checking out a few other platforms as well, may write about it soon :)
I am planning on it.
Thank you for this article.
Looking forward to those articles, if you write them.
This is amazing Anmol! Thanks for sharing
thanks for reading Ndeye! If you end up creating an agent, you should definitely write about it.
Great job! This is super helpful!
means a lot. thanks for reading!
Nice writeup. Breaking the stack into STT, NLU, dialog, and TTS is exactly how I got over the “voice agents feel intimidating” hump too. Your examples map nicely to real workflows where voice makes more sense than Slack pings or forms.
A few things that helped me in production: set an explicit latency budget per turn (roughly STT partials <150 ms, model think time <300 ms, TTS <200 ms), and tune barge-in so the agent ducks or cuts TTS as soon as the user speaks. Voicemail detection matters for outbound - quick classification on the first 2-3 seconds saves a ton of wasted minutes. Also track turn-level metrics like first-token latency, interruption rate, and no-speech timeouts; they reveal most issues faster than raw WER. If you go WebRTC with LiveKit, VAD endpointing and audio normalization (AGC off, consistent gain) make a big difference when users are on PSTN bridges. For lead gen, guardrails around consent windows and DNC scrubbing are worth baking in early.
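To make the per-turn latency budget concrete, here is a minimal Python sketch. The budget numbers come from the comment above; the stage names (`stt_partial`, `model`, `tts`), the `interrupted` flag, and both helper functions are assumptions made for illustration, not part of Retell, LiveKit, or any other platform's API.

```python
# Per-turn latency budget in milliseconds, taken from the comment above.
# The stage names are hypothetical labels, not a platform schema.
BUDGET_MS = {"stt_partial": 150, "model": 300, "tts": 200}


def over_budget(timings_ms):
    """Return the stages of one turn that met or exceeded their budget."""
    return {
        stage: ms
        for stage, ms in timings_ms.items()
        if stage in BUDGET_MS and ms >= BUDGET_MS[stage]
    }


def interruption_rate(turns):
    """Fraction of turns in which the user barged in (0.0 for no turns)."""
    if not turns:
        return 0.0
    interrupted = sum(1 for turn in turns if turn.get("interrupted"))
    return interrupted / len(turns)


# Example turn records; the shape is an assumption for this sketch.
turns = [
    {"stt_partial": 120, "model": 410, "tts": 180, "interrupted": True},
    {"stt_partial": 90, "model": 250, "tts": 150, "interrupted": False},
]
for turn in turns:
    print(over_budget(turn))
print(interruption_rate(turns))
```

Logging something like this per turn makes it easier to see which stage is eating the budget before digging into transcripts.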
At Fluents we build voice agents across inbound and outbound, and BYOK on STT/TTS has been handy for accent-heavy markets or industry vocab - swapping Deepgram vs Whisper or different TTS voices per use case without changing the rest of the stack. We also found a simple slot-filling layer for critical fields (name, date, address) reduces loops and weird detours from the LLM.
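A slot-filling layer like the one described above could look roughly like this in Python. This is only a sketch under stated assumptions: the slot names follow the fields listed in the comment, while `REQUIRED_SLOTS`, `next_missing_slot`, and `fill` are hypothetical names, and a real system would validate each value (for example, parse the date) instead of only checking that it is non-empty.

```python
# Critical fields the agent must collect before handing control back to the LLM.
REQUIRED_SLOTS = ("name", "date", "address")


def next_missing_slot(slots):
    """Return the first required slot that is still empty, or None if all are set."""
    for key in REQUIRED_SLOTS:
        if not slots.get(key):
            return key
    return None


def fill(slots, key, value):
    """Return a copy of slots with key set, ignoring blank answers."""
    value = value.strip()
    if value:
        return {**slots, key: value}
    return slots


# Usage: ask for whatever is missing, one field at a time.
slots = {}
slots = fill(slots, "name", " Ana ")
print(next_missing_slot(slots))
```

Because the layer decides what to ask next, the LLM only has to phrase the question, which is what cuts down on loops and detours.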
Curious which STT/TTS combo you landed on with Retell, and how you handled barge-in and voicemail detection. Also, did you end up simulating noisy environments during testing or just iterating from real calls? Would love a follow-up post with the testing setup and metrics you track.
Nice
I am a beginner, without any prior knowledge can I complete this???
I didn't have much knowledge in this space which is why I wrote this.. so others can understand the fundamental concepts. After reading this, I'm sure you can do it easily. And if you want to build something more advanced, just refer to the docs.
thanksss
Amazing, can it take info from call and push into my db ?
Yeah, you can use the Webhook node to collect the data and send it to your endpoint. Then parse the JSON payload & use any ORM to write data in your DB.
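As a rough illustration of the "parse the JSON payload and write it to your DB" step, here is a small Python sketch. It uses the standard library's `sqlite3` instead of an ORM to stay self-contained. The payload fields (`call_id`, `caller_name`, `transcript`) and the table layout are assumptions; check the platform's webhook docs for the real payload shape.

```python
import json
import sqlite3


def init_db(conn):
    """Create the table that stores one row per call (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls ("
        "call_id TEXT PRIMARY KEY, caller_name TEXT, transcript TEXT)"
    )


def store_call(conn, raw_body):
    """Parse a webhook body (JSON text) and upsert it into the calls table."""
    payload = json.loads(raw_body)
    conn.execute(
        "INSERT OR REPLACE INTO calls (call_id, caller_name, transcript) "
        "VALUES (?, ?, ?)",
        (
            payload["call_id"],               # assumed field name
            payload.get("caller_name"),       # assumed field name
            payload.get("transcript", ""),    # assumed field name
        ),
    )
    conn.commit()
    return payload["call_id"]


# Usage with an in-memory database and a made-up payload.
conn = sqlite3.connect(":memory:")
init_db(conn)
body = json.dumps({"call_id": "abc", "caller_name": "Divya", "transcript": "Hi"})
stored_id = store_call(conn, body)
```

In a real setup, `store_call` would be called from whatever HTTP endpoint receives the webhook request.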
Can it be integrated with n8n?
I don't think there is a direct integration with n8n (as per the docs) but we might be able to do it indirectly using a webhook.
Please can you try and let me know if it's possible?
good articles, love airport video!
thanks. I'm so happy you noticed that :D
It was a shorts video so it wasn't embedding properly which is probably why most people missed it.
Love how clearly you broke everything down too...inspiring stuff
Feels great to hear that Parag! I spent a lot of time learning everything (since this was my first time too) and tried my best to explain the stuff.
Noiceee
thanks
Love it! Have to try this at some point. Tried something similar 3 years ago, but the technology just wasn't there to make it useful.
Yeah, I was researching and found some crazy useful platforms out there. Some are a bit technical, others are easier so the barrier to entry in this space is dropping really fast.
A mind-blowing model with multiple applications and use cases. I will try to build a new one of my own from my idea. A game-changing idea. Thanks a lot Atish
yeah the use cases are really cool and I'm also learning/trying new stuff.
This is amazing brother!!
Appreciate you saying that Ali. I'm also looking into 11ai (recently launched by ElevenLabs).
Nice!!
Amazing resource for my current project, thanks a lot dude...
go build something cool :)
Awesome