K Om Senapati
Real-Time Voice Meets RAG: Building a Domain-Specific AI Chatbot

AssemblyAI Voice Agents Challenge: Domain Expert

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

I recently built a small side project: a voice-based chatbot that answers sociology questions using a domain-trained RAG agent. It's called Sociopal.

It’s powered by LangGraph, does corrective RAG, and can also search the web when it doesn’t have the answer. AssemblyAI handles speech-to-text, and ElevenLabs takes care of the speech output.

You ask a sociology-related question using your voice. The app transcribes your speech to text, queries a backend agent trained on sociology docs, and returns a response. If the answer isn't found in the vector DB, the agent falls back to web search and tries again.

The final response is both displayed and spoken aloud using ElevenLabs.

Demo

Not deployed yet, but here’s a short demo video:

GitHub Repository

⭐ GitHub 👇

Sociopal

A domain expert AI voice agent for sociology.

Learn more about Sociopal

Sociopal is a Corrective RAG (CRAG) agent backed by a vector DB of curated sociology information, with web search as a fallback. It is designed to answer questions and provide detailed explanations related to sociology.

Technology Stack

Frontend:

  • Next.js
  • AssemblyAI (speech-to-text)
  • ElevenLabs (text-to-speech)

Backend:

  • FastAPI
  • LangGraph
  • Groq
  • ChromaDB
  • DuckDuckGo (web search)

Getting Started

1. Clone the Repository

git clone https://github.com/k0msenapati/sociopal.git

2. Navigate to the Project Directory

cd sociopal

Frontend Setup

cd ui
bun i
cp .env.example .env.local

Fill in your ElevenLabs and AssemblyAI API keys in .env.local.

Start the development server:

bun dev

Backend Setup

cd ../agent-py
uv sync
source .venv/bin/activate
cp .env.example .env

Fill in your Groq API key in .env.

Index the Data

uv run --active -m sociology_agent.index

Run the Server

uv run --active uvicorn sociology_agent.server:app --reload



Installation steps are included in the README.

Technical Implementation

AssemblyAI Integration

I’m using AssemblyAI’s Universal-Streaming API to handle real-time voice input. Here’s the rough flow:

1. Getting a Temporary Token

There's an API route (/api/token) that fetches a temporary token:

const url = `https://streaming.assemblyai.com/v3/token?expires_in_seconds=60` 
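A server-side route like this keeps the AssemblyAI API key off the client and hands the browser only a short-lived token. This is a minimal sketch, not the repo's actual handler: the raw-API-key `Authorization` header and the `{ token }` response shape are assumptions about the AssemblyAI token endpoint.

```typescript
// Builds the temporary-token URL (endpoint and expires_in_seconds
// parameter taken from the snippet above).
export function tokenUrl(expiresInSeconds: number): string {
  return `https://streaming.assemblyai.com/v3/token?expires_in_seconds=${expiresInSeconds}`;
}

// app/api/token/route.ts — hypothetical Next.js route handler.
// Assumes the API key is accepted as a raw Authorization header and
// that the endpoint replies with { token }.
export async function GET(): Promise<Response> {
  const res = await fetch(tokenUrl(60), {
    headers: { Authorization: process.env.ASSEMBLYAI_API_KEY ?? "" },
  });
  if (!res.ok) {
    return Response.json({ error: "token request failed" }, { status: 500 });
  }
  const { token } = await res.json();
  return Response.json({ token });
}
```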

2. Connecting via WebSocket

Once the token is ready, a WebSocket connection is opened to stream audio:

wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&formatted_finals=true&token=${token} 
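Opening the socket and surfacing transcripts might look like the sketch below. The URL parameters come from the post; the shape of the incoming message (a JSON payload with a `transcript` field) is an assumption about the v3 streaming protocol, so check the actual payloads in practice.

```typescript
// Builds the streaming WebSocket URL shown above.
export function wsUrl(token: string, sampleRate = 16000): string {
  return `wss://streaming.assemblyai.com/v3/ws?sample_rate=${sampleRate}&formatted_finals=true&token=${token}`;
}

// Hypothetical browser-side wiring: parse each message and pass any
// transcript text to a callback for display.
export function connectTranscriber(
  token: string,
  onTranscript: (text: string) => void,
): WebSocket {
  const ws = new WebSocket(wsUrl(token));
  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data as string);
    if (typeof msg.transcript === "string") onTranscript(msg.transcript);
  };
  return ws;
}
```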

On the frontend, I use getUserMedia() to access the mic, then convert the audio to 16-bit PCM and send it over the socket. AssemblyAI returns transcripts in real time, which I display as the user speaks.
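The Float32-to-16-bit conversion step is the one piece of this that is pure math: Web Audio gives samples in [-1, 1], and the socket expects signed 16-bit PCM. A small helper (illustrative, not the repo's code) could look like:

```typescript
// Converts Float32 samples from the Web Audio API ([-1, 1]) to the
// signed 16-bit PCM frames sent over the WebSocket.
export function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples don't wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```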

It works smoothly with low latency, and transcripts are surprisingly accurate even with casual speech.

Backend Agent

The backend runs a FastAPI app with a /query route. It accepts user queries, passes them to the LangGraph agent, and returns the response.
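From the frontend's side, calling that route is a plain JSON POST. This is a hedged sketch: the `{ query }` request body and `{ response }` reply field are assumptions about the route's schema, not taken from the repo.

```typescript
// Serializes the request body for the /query route (field name is an
// assumption about the backend's schema).
export function queryPayload(query: string): string {
  return JSON.stringify({ query });
}

// Hypothetical client call to the FastAPI backend.
export async function askAgent(
  query: string,
  baseUrl = "http://localhost:8000",
): Promise<string> {
  const res = await fetch(`${baseUrl}/query`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: queryPayload(query),
  });
  if (!res.ok) throw new Error(`agent request failed: ${res.status}`);
  const data = await res.json();
  return data.response;
}
```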

The agent uses corrective RAG: if the retrieved context is graded as incomplete or irrelevant, it retries with a refined query. It's also hooked up to a web search tool in case the answer isn't in the vector DB.
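The control flow above can be sketched language-agnostically. The real agent is a Python LangGraph graph; every function name below is illustrative, with the retrieve/grade/refine/search/generate steps injected so the routing logic is visible on its own.

```typescript
// Sketch of the corrective-RAG routing described above. All dependency
// names are hypothetical stand-ins for the LangGraph nodes.
export type Grade = "relevant" | "irrelevant";

export interface CragDeps {
  retrieve: (q: string) => string[];            // vector-DB lookup
  grade: (q: string, docs: string[]) => Grade;  // LLM relevance check
  refine: (q: string) => string;                // query rewriting
  webSearch: (q: string) => string[];           // fallback tool
  generate: (q: string, docs: string[]) => string;
}

export function answer(query: string, deps: CragDeps): string {
  let docs = deps.retrieve(query);
  if (deps.grade(query, docs) === "irrelevant") {
    // Corrective step: rewrite the query and fall back to web search.
    docs = deps.webSearch(deps.refine(query));
  }
  return deps.generate(query, docs);
}
```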

Final Thoughts

Building this was a fun way to explore how voice can enhance AI agents. Using real-time transcription with AssemblyAI and natural-sounding speech with ElevenLabs made the voice interface smooth to implement.

While this one is trained on sociology data, the setup is actually domain-agnostic. You can swap the vector database's contents for any other domain-specific corpus, and the agent will work just as well.

Definitely worth trying if you're into voice UIs or building smarter assistants.


Thanks for reading, and I look forward to connecting with you again soon!

Follow me for more content like this!

Twitter | GitHub | YouTube

Bye

Top comments (6)

Rohan Sharma

Great job, man!

Abhinav

Amazing 🫡

Ayush Jhawar

Great Work 🎉

Pheonix Coder 🐦‍🔥

Voice addition is just 🫰

Tuhin Banerjee

cool stuff...
