Shola Jegede

I Wanted to Learn Faster — So I Built a Voice AI Tutor with GPT-4 in a Weekend

A few months ago, I noticed something frustrating.

I’d spend hours trying to learn new concepts — watching videos, reading articles, or chatting with ChatGPT. But it still felt slow. Clunky. Passive. Typing questions felt like homework. Scrolling for the “right” explanation was exhausting.

So I asked myself:

What if learning felt more like a conversation?

That question turned into Learnflow AI — a voice-powered learning assistant you can talk to, like a personal tutor on demand.

In this series, I’ll show you exactly how I built it — from zero to a real-time, voice-enabled GPT-4 app using Vapi, Next.js, and OpenAI.

What is Learnflow AI?

Learnflow AI is a voice-first learning interface — think ChatGPT, but you don’t type. You talk, and the AI talks back in real time.

It uses Vapi.ai for streaming voice interaction and GPT-4 for intelligent answers. This combo creates an incredibly natural tutoring experience — no UI clutter, just press a button and speak.

You can use this same stack to build:

  • AI tutors that speak and listen
  • Voice companions and productivity bots
  • Assistants for hands-free learning

What We’re Building in This Part

Goal: A production-grade MVP that lets you speak to GPT-4 and get real-time spoken answers.

Here’s what’s included in Part 1:

  • Voice assistant built with Vapi.ai
  • GPT-4 for reasoning
  • Next.js frontend using the App Router
  • Tailwind, Radix, and Shadcn for styling and components
  • No user auth, database, memory, or credits (yet — that’s Part 2)

Why Voice-First?

Typing is slow. Scrolling is overwhelming. Voice changes everything.

When you ask questions out loud:

  • You process faster (no need to structure typed prompts)
  • It feels more natural and intuitive
  • It mimics real learning conversations with a tutor

Talking feels like learning — typing feels like searching.

My Tech Stack

Layer           | Tech                 | Why It Was Chosen
----------------|----------------------|------------------------------------------
Voice Interface | Vapi.ai              | Real-time audio streaming + OpenAI-ready
LLM Provider    | OpenAI GPT-4         | High-quality answers, fast inference
Frontend        | Next.js (App Router) | Scalable file-based routing
Styling         | Tailwind CSS         | Fast to iterate, responsive
Components      | Radix UI + Shadcn    | Accessible, low-level UI primitives
Language        | TypeScript           | DX + type safety
Hosting         | Vercel               | Instant deploys for Next.js

File Structure (Voice MVP)

This part of the app is intentionally simple — its purpose is to quickly show the voice assistant in action.

learnflow-ai/
├── app/
│   ├── globals.css
│   ├── layout.tsx
│   └── page.tsx
├── constants/
│   ├── index.ts
│   └── soundwaves.json
└── lib/
    ├── utils.ts
    └── vapi.sdk.ts

We focus purely on getting the voice flow working before layering on state, auth, db, or personalization.

Step-by-Step: Setting Up the Voice Assistant

This part assumes you have already set up your Next.js App Router codebase and installed shadcn.

Step 1: Setup your App layout

import type { Metadata } from "next";
import { Bricolage_Grotesque } from "next/font/google";
import "./globals.css";

const bricolage = Bricolage_Grotesque({
  variable: "--font-bricolage",
  subsets: ["latin"],
});

export const metadata: Metadata = {
  title: "Learnflow AI",
  description: "A voice-only learning platform for developers",
};

export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <html lang="en">
      <body className={`${bricolage.variable} antialiased`}>
        {children}
      </body>
    </html>
  );
}

Step 2: Create a constants folder, add a soundwaves.json file, and paste in this JSON (constants/soundwaves.json)

{"nm":"Render","ddd":0,"h":250,"w":250,"meta":{"g":"LottieFiles AE 3.1.1"},"layers":[{"ty":4,"nm":"Arrow Outlines 4","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[100.5,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,8.103],[12.471,18.868]]}],"t":30},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,8.324],[12.471,19.235]]}],"t":45},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.515,4.206],[12.515,25.853]]}],"t":70},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,8.985],[12.471,17.912]]}],"t":83},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.515,5.309],[12.544,24.088]]}],"t":97},{"o":{"x":0.333,"y":0},"i":{"x":1,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,7.588],[12.456,20.044]]}],"t":109},{"o":{"x":0.333,"y":0},"i":{"x":1,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,6.265],[12.456,21]]}],"t":121},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":128.000005213547}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":1},{"ty":4,"nm":"Arrow Outlines 3","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[146.75,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 
1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.973,"y":0},"i":{"x":0.581,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,13.765],[12.441,18.353]]}],"t":30},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,13.721]]}],"t":45},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.529,7.074],[12.5,15.191]]}],"t":70},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,5.529],[12.441,16.735]]}],"t":83},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.544,7.147],[12.515,15.044]]}],"t":97},{"o":{"x":0.973,"y":0},"i":{"x":0.592,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,3.103],[12.471,21.809]]}],"t":109},{"o":{"x":0.973,"y":0},"i":{"x":0.893,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":122},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":129.000005254278}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":2},{"ty":4,"nm":"Arrow Outlines 2","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[116.75,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.333,"y":0},"i":{"x":0,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.333,"y":0},"i":{"x":0,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":24},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,4.647],[12.441,25.706]]}],"t":41},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":55},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,6.118],[12.471,20.412]]}],"t":70},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,1.926],[12.456,22.838]]}],"t":87},{"o":{"x":0.973,"y":0},"i":{"x":0,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,7.735],[12.471,20.265]]}],"t":101},{"o":{"x":0.333,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":115},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":134.000005457932}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 
1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":3},{"ty":4,"nm":"Arrow Outlines","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[131.75,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.973,"y":0},"i":{"x":0.581,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.5,2],[12.5,28.5]]}],"t":30},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":45},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.5,2],[12.5,28.5]]}],"t":70},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":83},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.5,2],[12.5,28.5]]}],"t":97},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,5.75],[12.471,21.809]]}],"t":109},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":125},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":132.00000537647}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":4},{"ty":4,"nm":"cir 1","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[0,0,0],"ix":1},"s":{"a":1,"k":[{"o":{"x":0.775,"y":0},"i":{"x":0.206,"y":1},"s":[111.8,111.8,100],"t":0},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[120,120,100],"t":30},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[111.8,111.8,100],"t":60},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[120,120,100],"t":71.289},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[111.8,111.8,100],"t":82.576},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[120,120,100],"t":97.628},{"o":{"x":0.167,"y":0},"i":{"x":0.667,"y":1},"s":[111.8,111.8,100],"t":107.661},{"s":[111.8,111.8,100],"t":134.000005457932}],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[126.539,126.166,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"gr","bm":0,"hd":false,"mn":"ADBE Vector Group","nm":"Ellipse 1","ix":1,"cix":2,"np":3,"it":[{"ty":"el","bm":0,"hd":false,"mn":"ADBE Vector Shape - Ellipse","nm":"Ellipse Path 
1","d":1,"p":{"a":0,"k":[0,0],"ix":3},"s":{"a":0,"k":[99.367,99.367],"ix":2}},{"ty":"fl","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Fill","nm":"Fill 1","c":{"a":0,"k":[0.1255,0.1529,0.1725,1],"ix":4},"r":1,"o":{"a":0,"k":100,"ix":5}},{"ty":"tr","a":{"a":0,"k":[0,0],"ix":1},"s":{"a":0,"k":[100,100],"ix":3},"sk":{"a":0,"k":0,"ix":4},"p":{"a":0,"k":[-1.539,-1.166],"ix":2},"r":{"a":0,"k":0,"ix":6},"sa":{"a":0,"k":0,"ix":5},"o":{"a":0,"k":100,"ix":7}}]}],"ind":5},{"ty":4,"nm":"cir 2","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[0,0,0],"ix":1},"s":{"a":1,"k":[{"o":{"x":0.775,"y":0},"i":{"x":0.206,"y":1},"s":[110,110,100],"t":0},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[194.775,194.775,100],"t":34},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[150,150,100],"t":60},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[194.775,194.775,100],"t":71.289},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[150,150,100],"t":82.576},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[194.775,194.775,100],"t":97.628},{"o":{"x":0.167,"y":0},"i":{"x":0.274,"y":1},"s":[150,150,100],"t":117},{"s":[110,110,100],"t":123.966255049249}],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[126.539,126.166,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"gr","bm":0,"hd":false,"mn":"ADBE Vector Group","nm":"Ellipse 1","ix":1,"cix":2,"np":3,"it":[{"ty":"el","bm":0,"hd":false,"mn":"ADBE Vector Shape - Ellipse","nm":"Ellipse Path 1","d":1,"p":{"a":0,"k":[0,0],"ix":3},"s":{"a":0,"k":[99.367,99.367],"ix":2}},{"ty":"fl","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Fill","nm":"Fill 1","c":{"a":0,"k":[0.1255,0.1529,0.1725,1],"ix":4},"r":1,"o":{"a":0,"k":10,"ix":5}},{"ty":"tr","a":{"a":0,"k":[0,0],"ix":1},"s":{"a":0,"k":[100,100],"ix":3},"sk":{"a":0,"k":0,"ix":4},"p":{"a":0,"k":[-1.539,-1.166],"ix":2},"r":{"a":0,"k":0,"ix":6},"sa":{"a":0,"k":0,"ix":5},"o":{"a":0,"k":100,"ix":7}}]}],"ind":6},{"ty":4,"nm":"cir 3","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[0,0,0],"ix":1},"s":{"a":1,"k":[{"o":{"x":0.775,"y":0},"i":{"x":0.206,"y":1},"s":[110,110,100],"t":0},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[230,230,100],"t":34},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[190,190,100],"t":60},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[230,230,100],"t":71.289},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[190,190,100],"t":82.576},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[230,230,100],"t":97.628},{"o":{"x":0.167,"y":0},"i":{"x":0.274,"y":1},"s":[190,190,100],"t":118},{"s":[110,110,100],"t":134.000005457932}],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[126.539,126.166,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"gr","bm":0,"hd":false,"mn":"ADBE Vector Group","nm":"Ellipse 1","ix":1,"cix":2,"np":3,"it":[{"ty":"el","bm":0,"hd":false,"mn":"ADBE Vector Shape - Ellipse","nm":"Ellipse Path 1","d":1,"p":{"a":0,"k":[0,0],"ix":3},"s":{"a":0,"k":[99.367,99.367],"ix":2}},{"ty":"fl","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Fill","nm":"Fill 
1","c":{"a":0,"k":[0.1255,0.1529,0.1725,1],"ix":4},"r":1,"o":{"a":0,"k":5,"ix":5}},{"ty":"tr","a":{"a":0,"k":[0,0],"ix":1},"s":{"a":0,"k":[100,100],"ix":3},"sk":{"a":0,"k":0,"ix":4},"p":{"a":0,"k":[-1.539,-1.166],"ix":2},"r":{"a":0,"k":0,"ix":6},"sa":{"a":0,"k":0,"ix":5},"o":{"a":0,"k":100,"ix":7}}]}],"ind":7}],"v":"4.8.0","fr":29.9700012207031,"op":135.000005498663,"ip":0,"assets":[]} 

Step 3: Create another file in your constants folder (constants/index.ts) and paste this code

export const subjects = [
  "javascript",
  "python",
  "html",
  "css",
  "algorithms",
  "databases",
];

export const subjectsColors = {
  javascript: "#FFD166",
  python: "#9BE7FF",
  html: "#FF9AA2",
  css: "#B5EAD7",
  algorithms: "#CBAACB",
  databases: "#FFDAC1",
};

export const voices = {
  male: { casual: "2BJW5coyhAzSr8STdHbE", formal: "c6SfcYrb2t09NHXiT80T" },
  female: { casual: "ZIlrSGI4jZqobxRKprJz", formal: "sarah" },
};

export const recentSessions = [
  {
    id: "1",
    subject: "javascript",
    name: "Codey the JS Debugger",
    topic: "Understanding Closures",
    duration: 40,
    color: "#FFD166",
  },
  {
    id: "2",
    subject: "python",
    name: "Snakey the Python Guru",
    topic: "List Comprehensions & Lambdas",
    duration: 35,
    color: "#9BE7FF",
  },
  {
    id: "3",
    subject: "html",
    name: "Structo the Markup Architect",
    topic: "Semantic Tags & Accessibility",
    duration: 25,
    color: "#FF9AA2",
  },
  {
    id: "4",
    subject: "css",
    name: "Stylo the Flexbox Wizard",
    topic: "Flexbox vs Grid Layouts",
    duration: 30,
    color: "#B5EAD7",
  },
  {
    id: "5",
    subject: "algorithms",
    name: "Algo the Problem Solver",
    topic: "Binary Search Explained Visually",
    duration: 45,
    color: "#CBAACB",
  },
  {
    id: "6",
    subject: "databases",
    name: "Query the Data Whisperer",
    topic: "SQL Joins: Inner vs Outer",
    duration: 20,
    color: "#FFDAC1",
  },
];

Step 4: Install lottie-react package

npm install lottie-react 
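
An optional way to sanity-check that the JSON pasted correctly is to render the animation in isolation. A minimal sketch (SoundwavePreview is just a throwaway component, not part of the app):

"use client";

import Lottie from "lottie-react";
import soundwaves from "@/constants/soundwaves.json";

// Renders the soundwave animation on a loop, handy for verifying the JSON loads.
export default function SoundwavePreview() {
  return <Lottie animationData={soundwaves} loop autoplay className="w-40" />;
}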

Step 5: Install the Vapi SDK

npm install @vapi-ai/web 

You’ll need a free Vapi account to get your API key.

Step 6: Initialize the Vapi Client (lib/vapi.sdk.ts)

This sets up the Vapi SDK with your API key, allowing your app to connect to Vapi’s voice infrastructure:

import Vapi from "@vapi-ai/web";

export const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN!);

It initializes the core Vapi client that handles real-time audio, streaming, and connection to your AI assistant. Every voice interaction starts here.
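
If you'd prefer to fail fast when the token is missing, an optional, slightly more defensive variant of the same file looks like this:

import Vapi from "@vapi-ai/web";

// Read the public token once and fail loudly at startup if it isn't configured.
const token = process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN;
if (!token) {
  throw new Error("Missing NEXT_PUBLIC_VAPI_WEB_TOKEN - add it to .env.local");
}

export const vapi = new Vapi(token);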

Step 7: Create a lib/utils.ts file and paste this code

import { clsx, type ClassValue } from "clsx";
import { twMerge } from "tailwind-merge";
import { subjectsColors, voices } from "@/constants";
import { CreateAssistantDTO } from "@vapi-ai/web/dist/api";

export function cn(...inputs: ClassValue[]) {
  return twMerge(clsx(inputs));
}

export const getSubjectColor = (subject: string) => {
  return subjectsColors[subject as keyof typeof subjectsColors];
};

export const configureAssistant = (voice: string, style: string) => {
  const voiceId =
    voices[voice as keyof typeof voices][
      style as keyof (typeof voices)[keyof typeof voices]
    ] || "sarah";

  const vapiAssistant: CreateAssistantDTO = {
    name: "Companion",
    firstMessage:
      "Hello, let's start the session. Today we'll be talking about {{topic}}.",
    transcriber: {
      provider: "deepgram",
      model: "nova-3",
      language: "en",
    },
    voice: {
      provider: "11labs",
      voiceId: voiceId,
      stability: 0.4,
      similarityBoost: 0.8,
      speed: 1,
      style: 0.5,
      useSpeakerBoost: true,
    },
    model: {
      provider: "openai",
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content: `You are a highly knowledgeable tutor teaching a real-time voice session with a student. Your goal is to teach the student about the topic and subject.

Tutor Guidelines:
Stick to the given topic - {{ topic }} and subject - {{ subject }} and teach the student about it.
Keep the conversation flowing smoothly while maintaining control.
From time to time make sure that the student is following you and understands you.
Break down the topic into smaller parts and teach the student one part at a time.
Keep your style of conversation {{ style }}.
Keep your responses short, like in a real voice conversation.
Do not include any special characters in your responses - this is a voice conversation.`,
        },
      ],
    },
    clientMessages: [],
    serverMessages: [],
  };
  return vapiAssistant;
};
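
For intuition, here's what these helpers return (the values come straight from the constants above; this snippet isn't part of the app):

import { cn, configureAssistant, getSubjectColor } from "@/lib/utils";

getSubjectColor("python"); // "#9BE7FF", looked up in subjectsColors
cn("p-2", "p-4");          // "p-4" (tailwind-merge resolves the conflict)

// Builds a CreateAssistantDTO wired to the ElevenLabs "casual male" voice
const assistant = configureAssistant("male", "casual");
// assistant.voice.voiceId === "2BJW5coyhAzSr8STdHbE"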

Step 8: Create your assistant

Paste this code into your app/page.tsx file:

'use client';

import { useEffect, useRef, useState } from 'react';
import { cn, configureAssistant, getSubjectColor } from '@/lib/utils';
import { vapi } from '@/lib/vapi.sdk';
import Image from 'next/image';
import Lottie, { LottieRefCurrentProps } from 'lottie-react';
import soundwaves from '@/constants/soundwaves.json';

enum CallStatus {
  INACTIVE = 'INACTIVE',
  CONNECTING = 'CONNECTING',
  ACTIVE = 'ACTIVE',
  FINISHED = 'FINISHED',
}

// Minimal shape for the transcript messages we keep in state
type SavedMessage = {
  role: 'user' | 'system' | 'assistant';
  content: string;
};

const Page = () => {
  // Demo details (hardcoded for this Part 1 MVP)
  const subject = 'javascript';
  const topic = 'React and Typescript';
  const name = 'Better Call Saul';
  const style = 'casual';
  const voice = 'male';
  const userName = 'Shola - student';
  const userImage = '/images/me.png';

  const [callStatus, setCallStatus] = useState<CallStatus>(CallStatus.INACTIVE);
  const [isSpeaking, setIsSpeaking] = useState(false);
  const [isMuted, setIsMuted] = useState(false);
  const [messages, setMessages] = useState<SavedMessage[]>([]);

  const lottieRef = useRef<LottieRefCurrentProps>(null);

  // Play the soundwave animation only while the assistant is speaking
  useEffect(() => {
    if (lottieRef.current) {
      if (isSpeaking) {
        lottieRef.current.play();
      } else {
        lottieRef.current.stop();
      }
    }
  }, [isSpeaking]);

  useEffect(() => {
    const onCallStart = () => setCallStatus(CallStatus.ACTIVE);
    const onCallEnd = () => setCallStatus(CallStatus.FINISHED);

    // Keep only finalized transcript chunks, newest message first
    const onMessage = (message: any) => {
      if (message.type === 'transcript' && message.transcriptType === 'final') {
        const newMessage = { role: message.role, content: message.transcript };
        setMessages((prev) => [newMessage, ...prev]);
      }
    };

    const onSpeechStart = () => setIsSpeaking(true);
    const onSpeechEnd = () => setIsSpeaking(false);
    const onError = (error: Error) => console.log('Error', error);

    vapi.on('call-start', onCallStart);
    vapi.on('call-end', onCallEnd);
    vapi.on('message', onMessage);
    vapi.on('error', onError);
    vapi.on('speech-start', onSpeechStart);
    vapi.on('speech-end', onSpeechEnd);

    return () => {
      vapi.off('call-start', onCallStart);
      vapi.off('call-end', onCallEnd);
      vapi.off('message', onMessage);
      vapi.off('error', onError);
      vapi.off('speech-start', onSpeechStart);
      vapi.off('speech-end', onSpeechEnd);
    };
  }, []);

  const toggleMicrophone = () => {
    const muted = vapi.isMuted();
    vapi.setMuted(!muted);
    setIsMuted(!muted);
  };

  const handleCall = async () => {
    setCallStatus(CallStatus.CONNECTING);

    const assistantOverrides = {
      variableValues: { subject, topic, style },
      clientMessages: ['transcript'],
      serverMessages: [],
    };

    // @ts-expect-error - The configureAssistant function's return type doesn't match the expected type, but it works at runtime
    vapi.start(configureAssistant(voice, style), assistantOverrides);
  };

  const handleDisconnect = () => {
    setCallStatus(CallStatus.FINISHED);
    vapi.stop();
  };

  return (
    <section className="flex flex-col h-[70vh]">
      <section className="flex gap-8 max-sm:flex-col">
        <div className="companion-section">
          <div
            className="companion-avatar"
            style={{ backgroundColor: getSubjectColor(subject) }}
          >
            <div
              className={cn(
                'absolute transition-opacity duration-1000',
                callStatus === CallStatus.FINISHED || callStatus === CallStatus.INACTIVE
                  ? 'opacity-100'
                  : 'opacity-0',
                callStatus === CallStatus.CONNECTING && 'opacity-100 animate-pulse'
              )}
            >
              <Image
                src={`/icons/${subject}.svg`}
                alt={subject}
                width={150}
                height={150}
                className="max-sm:w-fit"
              />
            </div>

            <div
              className={cn(
                'absolute transition-opacity duration-1000',
                callStatus === CallStatus.ACTIVE ? 'opacity-100' : 'opacity-0'
              )}
            >
              <Lottie
                lottieRef={lottieRef}
                animationData={soundwaves}
                autoplay={false}
                className="companion-lottie"
              />
            </div>
          </div>
          <p className="font-bold text-2xl">{name}</p>
        </div>

        <div className="user-section">
          <div className="user-avatar">
            <Image
              src={userImage}
              alt={userName}
              width={130}
              height={130}
              className="rounded-lg"
            />
            <p className="font-bold text-2xl">{userName}</p>
          </div>

          <button
            className="btn-mic"
            onClick={toggleMicrophone}
            disabled={callStatus !== CallStatus.ACTIVE}
          >
            <Image
              src={isMuted ? '/icons/mic-off.svg' : '/icons/mic-on.svg'}
              alt="mic"
              width={36}
              height={36}
            />
            <p className="max-sm:hidden">
              {isMuted ? 'Turn on microphone' : 'Turn off microphone'}
            </p>
          </button>

          <button
            className={cn(
              'rounded-lg py-2 cursor-pointer transition-colors w-full text-white',
              callStatus === CallStatus.ACTIVE ? 'bg-red-700' : 'bg-primary',
              callStatus === CallStatus.CONNECTING && 'animate-pulse'
            )}
            onClick={callStatus === CallStatus.ACTIVE ? handleDisconnect : handleCall}
          >
            {callStatus === CallStatus.ACTIVE
              ? 'End Session'
              : callStatus === CallStatus.CONNECTING
              ? 'Connecting'
              : 'Start Session'}
          </button>
        </div>
      </section>

      <section className="transcript">
        <div className="transcript-message no-scrollbar">
          {messages.map((message, index) => {
            if (message.role === 'assistant') {
              return (
                <p key={index} className="max-sm:text-sm">
                  {name.split(' ')[0].replace(/[.,]/g, '')}: {message.content}
                </p>
              );
            }
            return (
              <p key={index} className="text-primary max-sm:text-sm">
                {userName}: {message.content}
              </p>
            );
          })}
        </div>
        <div className="transcript-fade" />
      </section>
    </section>
  );
};

export default Page;
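
One detail worth calling out: the {{topic}}, {{subject}}, and {{style}} placeholders inside configureAssistant's prompts are not filled in by our code. Vapi substitutes them at call time from the variableValues we pass in assistantOverrides:

const assistantOverrides = {
  // Replaces {{subject}}, {{topic}} and {{style}} in firstMessage and the system prompt
  variableValues: { subject, topic, style },
  clientMessages: ["transcript"],
  serverMessages: [],
};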

🧠 Why Vapi?

Without Vapi, you’d need to manage:

  • WebSockets
  • STT (Speech-to-Text)
  • TTS (Text-to-Speech)
  • Voice playback and stream sync

Vapi handles all of this with just a few lines of code (see the sketch after the flow in the next section). It's like magic for voice-first AI apps.

How the Voice Assistant Works (Step-by-Step)

  1. User clicks the Call Button
  2. Vapi opens a live voice stream
  3. User speaks a question
  4. Vapi transcribes and sends it to OpenAI (GPT-4)
  5. OpenAI returns a response
  6. Vapi turns the response into speech
  7. Browser plays the response to the user
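
Mapped onto the SDK, the whole loop really is the "few lines of code" claimed above. A rough sketch, for illustration only (page.tsx already wires all of this up):

import { vapi } from "@/lib/vapi.sdk";
import { configureAssistant } from "@/lib/utils";

// 3-5: user speech and GPT-4 replies both stream back as 'message' transcript events
vapi.on("message", (message: any) => {
  if (message.type === "transcript" && message.transcriptType === "final") {
    console.log(`${message.role}: ${message.transcript}`);
  }
});

// 6-7: these events bracket the assistant's spoken reply playing in the browser
vapi.on("speech-start", () => console.log("assistant speaking"));
vapi.on("speech-end", () => console.log("assistant done"));

// 1-2: clicking the call button runs this, opening the live voice stream
// @ts-expect-error - same return-type mismatch as in page.tsx; works at runtime
vapi.start(configureAssistant("male", "casual"));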

Local Setup

Prerequisites:

  • Node.js 18+
  • Vapi API key

.env.local file:

NEXT_PUBLIC_VAPI_WEB_TOKEN=your_vapi_web_token
VAPI_SECRET_KEY=your_vapi_secret_key

Run It:

npm install
npm run dev

Open http://localhost:3000 and press the button to start talking to GPT-4.

Takeaways

Learnflow AI proves one thing:

Talking to an AI is way more fluid than typing to one.

The Vapi + GPT-4 combo lets you build powerful assistants with:

  • Real-time spoken conversations
  • Zero friction UI
  • High retention and comprehension

And you can build the whole MVP in a weekend.

What’s Coming in Part 2

Next, we’ll go deeper and make it personal:

  • Auth and login with Kinde
  • Protected routes and dashboards
  • Convex for backend state and real-time updates
  • Usage limits with credit tracking

Try the MVP or Build Your Own

GitHub: github.com/sholajegede/learnflow_ai

If you want to set up Kinde Auth before Part 2, check out this post.
