This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
I recently built a small side project: a voice-based chatbot that answers sociology questions using a domain-specific RAG agent. It’s called Sociopal.
It’s powered by LangGraph, does corrective RAG, and can also search the web when it doesn’t have the answer. AssemblyAI handles speech-to-text, and ElevenLabs takes care of the speech output.
You ask a sociology-related question using your voice. The app transcribes your voice to text, queries a backend agent trained on sociology docs, and gives a response. If the answer isn’t found in the vector DB, it falls back to web search and tries again.
The final response is both displayed and spoken aloud using ElevenLabs.
Demo
Not deployed yet, but here’s a short demo video:
GitHub Repository
⭐ GitHub 👇
Sociopal
A domain expert AI voice agent for sociology.
Learn more about Sociopal
Sociopal is a Corrective RAG (CRAG) agent powered by a vector DB of curated sociology information, with web search as a fallback. It is designed to answer questions and provide detailed explanations related to sociology.
Technology Stack
Frontend:
- Next.js
- AssemblyAI (speech-to-text)
- ElevenLabs (text-to-speech)
Backend:
- FastAPI
- LangGraph
- Groq
- ChromaDB
- DuckDuckGo (web search)
Getting Started
1. Clone the Repository
git clone https://github.com/k0msenapati/sociopal.git
2. Navigate to the Project Directory
cd sociopal
Frontend Setup
cd ui
bun i
cp .env.example .env.local
Fill in your ElevenLabs and AssemblyAI API keys in .env.local.
Start the development server:
bun dev
Backend Setup
cd ../agent-py
uv sync
source .venv/bin/activate
cp .env.example .env
Fill in your Groq API key in .env.
Index the Data
uv run --active -m sociology_agent.index
Run the Server
uv run --active uvicorn sociology_agent.server:app --reload
Installation steps are included in the README.
Technical Implementation
AssemblyAI Integration
I’m using AssemblyAI’s Universal-Streaming API to handle real-time voice input. Here’s the rough flow:
1. Getting a Temporary Token
There's an API route (/api/token) that fetches a temporary token:
const url = `https://streaming.assemblyai.com/v3/token?expires_in_seconds=60`
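For reference, here’s a minimal sketch of what that route handler could look like in the Next.js App Router. It assumes the AssemblyAI key lives in an ASSEMBLYAI_API_KEY environment variable and that the token endpoint returns a JSON body with a token field; the names here are illustrative, not copied from the repo:

```typescript
// app/api/token/route.ts — minimal sketch, not the exact code from the repo
import { NextResponse } from "next/server";

export async function GET() {
  // The API key stays on the server; the browser only ever sees the short-lived token.
  const url = "https://streaming.assemblyai.com/v3/token?expires_in_seconds=60";

  const res = await fetch(url, {
    headers: { Authorization: process.env.ASSEMBLYAI_API_KEY! },
  });

  if (!res.ok) {
    return NextResponse.json({ error: "Failed to fetch token" }, { status: 500 });
  }

  // Assumes the response body contains a `token` field.
  const data = await res.json();
  return NextResponse.json({ token: data.token });
}
```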
2. Connecting via WebSocket
Once the token is ready, a WebSocket connection is opened to stream audio:
wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&formatted_finals=true&token=${token}
On the frontend, I use getUserMedia() to access the mic, convert the audio to 16-bit PCM, and send it over the socket. AssemblyAI returns transcripts in real time, which I display as the user speaks.
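Roughly, the capture-and-send loop can look like the sketch below. It’s a simplified version (a ScriptProcessorNode keeps it short; an AudioWorklet is the more modern route), and the transcript field name is an assumption about the message shape rather than code lifted from the repo:

```typescript
// Simplified sketch of the mic → 16-bit PCM → WebSocket pipeline (not the exact repo code)
async function startStreaming(token: string) {
  const ws = new WebSocket(
    `wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&formatted_finals=true&token=${token}`
  );

  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const source = audioContext.createMediaStreamSource(stream);

  // ScriptProcessorNode keeps the example short; an AudioWorklet is the modern option.
  const processor = audioContext.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (event) => {
    if (ws.readyState !== WebSocket.OPEN) return;

    // Convert Float32 samples in [-1, 1] to 16-bit signed PCM.
    const input = event.inputBuffer.getChannelData(0);
    const pcm = new Int16Array(input.length);
    for (let i = 0; i < input.length; i++) {
      const s = Math.max(-1, Math.min(1, input[i]));
      pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    ws.send(pcm.buffer);
  };

  source.connect(processor);
  processor.connect(audioContext.destination);

  // Transcripts arrive as JSON messages; the exact shape depends on the API version.
  ws.onmessage = (msg) => {
    const data = JSON.parse(msg.data);
    if (data.transcript) console.log(data.transcript);
  };
}
```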
It works smoothly with low latency, and transcripts are surprisingly accurate even with casual speech.
Backend Agent
The backend runs a FastAPI app with a /query route. It accepts user queries, passes them to the LangGraph agent, and returns the response.
The agent uses corrective RAG, so if the first answer is incomplete or irrelevant, it will retry with a refined query. It’s also hooked up to a web search tool in case the answer isn’t in the vectorDB.
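From the frontend’s point of view, the contract is a simple POST: send the transcript, get back an answer to display and speak. Here’s a hedged sketch of how the UI might call it; the URL, port, and question/answer field names are assumptions, not the actual schema from the repo:

```typescript
// Sketch of the frontend call to the FastAPI backend; field names and URL are assumptions.
async function askSociopal(question: string): Promise<string> {
  const res = await fetch("http://localhost:8000/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });

  if (!res.ok) {
    throw new Error(`Backend returned ${res.status}`);
  }

  const data = await res.json();
  return data.answer; // displayed in the UI and passed to ElevenLabs for TTS
}
```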
Final Thoughts
Building this was a fun way to explore how voice can enhance AI agents. Using real-time transcription with AssemblyAI and natural-sounding speech with ElevenLabs made the voice interface smooth to implement.
While this one is built on sociology data, the setup is actually domain-agnostic: swap the vector database contents for any other domain-specific corpus and the agent will work just as well.
Definitely worth trying if you're into voice UIs or building smarter assistants.
Thanks for reading, and I look forward to connecting with you again soon!
Follow me for more content like this!