This is a submission for the AssemblyAI Voice Agents Challenge, in the Business Automation Voice Agent and Domain Expert Voice Agent categories.
📝 What I Built
As a software engineer at the healthcare startup CareSetu and a 3rd-year B.Tech student in Mathematics and Computing at an Institute of National Importance, I've seen firsthand how technology can solve critical real-world problems. This voice-based web app lets users schedule medical appointments, get answers to health questions like 'What precautions should I take for diabetes?', and manage their healthcare needs seamlessly. It’s designed to feel like you're having a conversation with a trusted health assistant, making healthcare more accessible for everyone.
From a business perspective, this directly benefits CareSetu and other businesses by automating front-desk tasks, reducing operational costs, and ensuring a steady flow of scheduled appointments, which is vital for the financial health of our partner clinics.
As the demo shows, the appointment was scheduled exactly as requested, which means the AssemblyAI speech-to-text transcription was accurate and should be reliable for your other tasks as well.
Tech Stack Used:
Backend Tech Stack
Core Framework & Runtime:
✅ Python 3.11.9 - Main backend language
✅ LiveKit Agents Framework - Real-time voice/video communication platform
✅AsyncIO - Asynchronous programming for handling concurrent operations
AI & Machine Learning:
✅Gemini Flash - LLM integration for conversational AI
✅Cartesia/TTS - Text-to-speech service
✅AssemblyAI - STT service with business optimizations
✅ElevenLabs - Premium text-to-speech service, used as a fallback
✅Google Cloud Speech - Additional TTS provider, used as a fallback
✅Transformers/HuggingFace - ML model handling
✅PDFMiner/PDFPlumber/PyPDF2 - PDF document processing
✅NumPy/SciPy - Scientific computing
✅Scikit-learn - Machine learning utilities
Web Framework & APIs:
✅LiveKit Agents Framework - Real-time communication platform
✅ Python HTTP Server - Simple token server for frontend integration
✅ AIOHTTP - HTTP client library (for outbound requests)
Integrations:
✅Google Calendar API - Appointment scheduling
✅Google Cloud APIs - Various Google services
Frontend Tech Stack
Core Framework:
✅React 19.1.0 - Modern React with latest features
✅Vite 7.0.4 - Fast build tool and dev server
✅TypeScript - Type-safe JavaScript development
UI & Styling:
✅Tailwind CSS 4.1.11 - Utility-first CSS framework
✅PostCSS - CSS processing
Real-time Communication:
✅LiveKit Client - WebRTC client for voice/video
✅@livekit/components-react - Pre-built React components for LiveKit
Testing:
✅Vitest - Fast unit testing framework
✅Testing Library - React component testing utilities
✅jsdom - DOM simulation for testing
Development Tools:
✅ESLint - Code linting
✅ Terser - JavaScript minification
🔍 STEP-BY-STEP DETAILED BREAKDOWN
1. User Voice → Microphone → Web Audio API → LiveKit Stream
2. Audio Stream → AssemblyAI → Text Transcript
3. Text Query → Query Processing → Knowledge Search → Context Building
4. Enhanced Context → Google Gemini → AI Response
5. Appointment Intent → Google Calendar API → Booking Result
6. AI Response → Cartesia/ElevenLabs/Google → Audio Stream
7. Audio Stream → Web Audio API → Speaker Output
8. Complete Interaction → Analysis → Knowledge Update
9. User continues → Loop to Step 1 | Timeout/Disconnect → End session
10. Error handling:
    STT Error → Show error → Retry → Text input fallback
    LLM Error → Show error → RAG-only response → Retry
    TTS Error → Try next service → Text response fallback
    Calendar Error → Show error → Manual booking → Retry
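The "TTS Error → Try next service → Text response fallback" rule can be sketched as an ordered provider chain. This is a minimal illustration, not the code from the repository: `synthesize_with_fallback`, the `fake_*` coroutines, and the 5-second timeout are all hypothetical stand-ins for the real Cartesia, ElevenLabs, and Google clients.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence, Tuple

# A provider is any async callable that turns text into audio bytes.
Synthesizer = Callable[[str], Awaitable[bytes]]


async def synthesize_with_fallback(
    text: str,
    providers: Sequence[Tuple[str, Synthesizer]],
    timeout_s: float = 5.0,  # assumed value, tune per provider
) -> Tuple[Optional[str], Optional[bytes]]:
    """Try each provider in order and return (provider_name, audio).

    Returns (None, None) when every provider fails, which tells the
    caller to fall back to a plain text response.
    """
    for name, synth in providers:
        try:
            audio = await asyncio.wait_for(synth(text), timeout=timeout_s)
            return name, audio
        except Exception as exc:  # also covers timeouts
            print(f"TTS provider {name!r} failed: {exc!r}")
    return None, None


# Hypothetical stand-ins for the real TTS clients.
async def fake_cartesia(text: str) -> bytes:
    raise ConnectionError("simulated outage")


async def fake_elevenlabs(text: str) -> bytes:
    return b"audio:" + text.encode("utf-8")


async def fake_google(text: str) -> bytes:
    return b"google:" + text.encode("utf-8")


PROVIDERS = [
    ("cartesia", fake_cartesia),
    ("elevenlabs", fake_elevenlabs),
    ("google", fake_google),
]
```

Calling `asyncio.run(synthesize_with_fallback("hello", PROVIDERS))` skips the failing first provider and returns the second one's audio. Keeping the chain as plain data makes it easy to reorder providers without touching the retry logic.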
Note: This model currently supports the Appointment Intent and Query Intent (such as providing information based on FAQs, the Privacy Policy of CareSetu, health insurance details, various departments of CareSetu, and general modern scientific tips along with homemade remedies related to healthcare).
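As a rough illustration of how a transcript could be routed to one of these two intents, here is a keyword-based sketch. It is an assumption made for clarity, not the project's logic: the real agent may well let the LLM decide, and `classify_intent` and the keyword list are hypothetical.

```python
import re

# Assumed trigger words for the Appointment Intent; everything else is
# treated as a Query Intent and answered from the knowledge base.
APPOINTMENT_PATTERN = re.compile(
    r"\b(book|schedule|reschedule|cancel|appointment|slot)\b",
    re.IGNORECASE,
)


def classify_intent(transcript: str) -> str:
    """Return "appointment" or "query" for a single user utterance."""
    if APPOINTMENT_PATTERN.search(transcript):
        return "appointment"
    return "query"
```

A rule like this is cheap and predictable, but it misses paraphrases ("can I see a doctor on Monday?"), which is one reason an LLM-based decision is often preferred in practice.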
💻Demo
Explanation Video
👉About Myself, Working Project and Repository explanation
Note: At timestamp 7:07 the agent tells me my name, which shows that it remembered my name during the conversation.
👉Pure Backend Explanation
👉Pure Frontend Explanation
The application is live at:
👉Live Link
👉Backend is hosted on an AWS EC2 instance with Nginx as a reverse proxy.
👉Frontend is hosted on Vercel.
📁 GitHub Repository
Code Snippets and Their Results
Source: caresetuAgent_3.0 (Backend)
Code Snippet for Building a Voice Agent with AssemblyAI and LiveKit
Code Snippet for RAG Integration
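To give a feel for the retrieval step (Knowledge Search → Context Building), here is a minimal, dependency-free sketch. It is not the code from the repository: it ranks documents by bag-of-words cosine similarity rather than learned embeddings, and `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are hypothetical names filled with placeholder text, not CareSetu content.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return up to k (score, document) pairs with a non-zero score."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pair for pair in scored[:k] if pair[0] > 0]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Prepend the retrieved snippets to the question as LLM context."""
    context = "\n".join(f"- {doc}" for _, doc in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Placeholder knowledge base, invented for this example.
KNOWLEDGE_BASE = [
    "Diabetes precautions: monitor blood sugar, eat balanced meals, exercise regularly.",
    "Clinic hours are 9 AM to 6 PM, Monday through Saturday.",
    "Privacy policy: patient data is shared only with the treating clinic.",
]
```

The string returned by `build_prompt` is what would be sent to the LLM, so the model answers from the retrieved snippets instead of from memory alone.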
Calendar Integration
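As an illustration of the booking step, the sketch below builds an event body in the shape the Google Calendar API expects (`summary`, plus `start` and `end` with `dateTime` and `timeZone`) and checks a proposed slot against busy intervals. The helper names, the 30-minute default, and the `Asia/Kolkata` time zone are assumptions for this example, not values taken from the project.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, Tuple


def build_event(
    patient_name: str,
    start: datetime,
    duration_min: int = 30,          # assumed slot length
    time_zone: str = "Asia/Kolkata",  # assumed clinic time zone
) -> Dict[str, Any]:
    """Build a Calendar event body for one appointment."""
    end = start + timedelta(minutes=duration_min)
    return {
        "summary": f"Appointment: {patient_name}",
        "start": {"dateTime": start.isoformat(), "timeZone": time_zone},
        "end": {"dateTime": end.isoformat(), "timeZone": time_zone},
    }


def overlaps(
    start: datetime,
    end: datetime,
    busy: Iterable[Tuple[datetime, datetime]],
) -> bool:
    """True if [start, end) intersects any busy (start, end) interval."""
    return any(start < b_end and b_start < end for b_start, b_end in busy)
```

With the `google-api-python-client` library, a dictionary like this would typically be passed as `body` to `service.events().insert(calendarId=..., body=event).execute()`, after `overlaps` confirms the slot is free.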
🧘🏻‍♂️ Conclusion
AssemblyAI played a crucial role in helping me complete this challenge. From the start of the CareSetu agent project, the AssemblyAI team provided responsive support and guidance, answering my questions about technical requirements, deployment options, and permissible ways to share my project publicly. Whether it was clarifying best practices for publishing my work, assisting with integration details, or offering encouragement at each milestone, the team was available whenever I needed help, as shown by my direct conversations with team members like Lee Vaughn, Dan Ince, Amanda DiNoto, and Ryan Seams. Their willingness to address issues and their interest in my progress boosted my confidence and ensured that technical obstacles never became roadblocks. This support let me focus fully on building an impactful, reliable voice agent for healthcare automation and customer support, and it demonstrates AssemblyAI’s genuine commitment to the success of developers using their platform.
Comment your thoughts, and follow me!
🔗 Connect with Me
Medium: Profile Link
Twitter/X: Profile Link
LinkedIn: Profile Link