In virtual learning and streaming platforms, real-time transcription improves accessibility for students and viewers with hearing impairments, enhances comprehension, and provides accurate lesson records.
In this tutorial, you’ll learn how to build a virtual classroom app with real-time transcription in React Native, using the Stream Video and Audio SDK for livestreaming and AssemblyAI for transcription.
Demo video:
Prerequisites
To follow along with this tutorial, you should have:
- Node.js and npm installed
- A Stream account (create a free account or log in to your dashboard)
- An AssemblyAI account (create a free account or log in to your dashboard)

Setting up the Project
To get started, create a new React Native project. Open your terminal, navigate to your desired directory, and run the command below.
```bash
npx @react-native-community/cli@latest init virtual_class_app
```

Installing Stream SDK
To enable the application to interact with Stream services, install the Stream SDK using the command below.
```bash
npm install @stream-io/video-react-native-sdk @stream-io/react-native-webrtc
npm install react-native-svg @react-native-community/netinfo @notifee/react-native

# Install pods for iOS
npx pod-install
```

Next, you need to configure the application’s access to the camera and microphone. To do this, add the following permissions to the AndroidManifest.xml file:
```xml
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-feature android:name="android.hardware.camera" />
  <uses-feature android:name="android.hardware.camera.autofocus" />
  <uses-feature android:name="android.hardware.audio.output" />
  <uses-feature android:name="android.hardware.microphone" />

  <uses-permission android:name="android.permission.CAMERA" />
  <uses-permission android:name="android.permission.RECORD_AUDIO" />
  <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
  <uses-permission android:name="android.permission.CHANGE_NETWORK_STATE" />
  <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
  <uses-permission android:name="android.permission.INTERNET" />
  ...
  <application ...>
    ...
  </application>
</manifest>
```

Also, add the following keys and values to the Info.plist file under the <dict> tag.
```xml
<plist version="1.0">
<dict>
  ...
  <key>CFBundleName</key>
  <string>$(PRODUCT_NAME)</string>
  <key>NSCameraUsageDescription</key>
  <string>$(PRODUCT_NAME) needs camera access for broadcasting</string>
  <key>NSMicrophoneUsageDescription</key>
  <string>$(PRODUCT_NAME) requires microphone access in order to capture and transmit audio</string>
  ...
</dict>
</plist>
```

How to Retrieve Stream API Keys
To access your Stream API key and secret, log in to your Stream dashboard and create a new app as shown below.
After creating the app, on the project dashboard, click API keys in the top menu, then copy the Stream API key and Secret key, as shown below.
How to Retrieve AssemblyAI API Key
You will need an AssemblyAI API key to transcribe the livestream. Log in to your AssemblyAI dashboard and copy your key from the API key page, as shown in the screenshot below.
Creating the Application Backend
Next, you’ll create an application server to handle Stream token generation and WebSocket connections for livestream transcription. To do this, create a new folder named backend outside the React Native folder, then run the following command to set it up and install the required packages.
```bash
npm init -y
npm install express cors ws node-fetch @stream-io/node-sdk
```

Now, create an API endpoint to handle Stream token generation and livestream transcription using WebSockets. Inside the backend folder, create a new file named app.js and add the following code.
```js
const express = require('express');
const cors = require('cors');
const WebSocket = require('ws');
const fetch = (...args) => import('node-fetch').then(({ default: fetch }) => fetch(...args));
const { StreamClient } = require('@stream-io/node-sdk');

const app = express();
app.use(express.json());
app.use(cors());

const apiKey = '<stream_key>';
const apiSecret = '<stream_secret>';
const ASSEMBLYAI_API_KEY = '<assemblyai_apikey>';

const serverClient = new StreamClient(apiKey, apiSecret);

// Generates a Stream user token for the requested user_id and call_id
app.get('/token', async (req, res) => {
  const userId = String(req.query.user_id || '').trim();
  const callId = String(req.query.call_id || '').trim();

  if (!userId || !callId) {
    return res.status(400).json({ error: 'Missing user_id or call_id' });
  }

  try {
    const role = userId.startsWith('host') ? 'admin' : 'user';
    await serverClient.upsertUsers([{ id: userId, role }]);
    const token = serverClient.generateUserToken({ user_id: userId });
    res.json({ token, callId });
  } catch (err) {
    res.status(500).json({ error: 'Could not generate token' });
  }
});

const server = app.listen(3000, () => {
  console.log('Server running on port 3000');
});

// /audio receives raw audio from the host; /captions pushes transcripts to viewers
const audioWSS = new WebSocket.Server({ noServer: true });
const viewerWSS = new WebSocket.Server({ noServer: true });
let aaiSocket = null;

server.on('upgrade', (req, socket, head) => {
  if (req.url === '/audio') {
    audioWSS.handleUpgrade(req, socket, head, (ws) => {
      audioWSS.emit('connection', ws, req);
    });
  } else if (req.url === '/captions') {
    viewerWSS.handleUpgrade(req, socket, head, (ws) => {
      viewerWSS.emit('connection', ws, req);
    });
  } else {
    socket.destroy();
  }
});

async function connectAssemblyAI() {
  try {
    const tokenResp = await fetch('https://api.assemblyai.com/v2/realtime/token', {
      method: 'POST',
      headers: { authorization: ASSEMBLYAI_API_KEY },
    });
    const { token } = await tokenResp.json();

    aaiSocket = new WebSocket(`wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token=${token}`);

    aaiSocket.on('open', () => console.log('Connected to AssemblyAI'));

    aaiSocket.on('message', (msg) => {
      try {
        const data = JSON.parse(msg);
        if (data.text) {
          console.log('Transcript:', data.text);
          // Broadcast the latest transcript to every connected viewer
          viewerWSS.clients.forEach((client) => {
            if (client.readyState === WebSocket.OPEN) {
              client.send(JSON.stringify({ text: data.text }));
            }
          });
        }
      } catch (err) {
        console.error('Error parsing AssemblyAI message:', err);
      }
    });

    aaiSocket.on('close', () => {
      setTimeout(connectAssemblyAI, 5000);
    });
  } catch (err) {
    setTimeout(connectAssemblyAI, 5000);
  }
}

connectAssemblyAI();

// Forward incoming audio chunks from the host to AssemblyAI
audioWSS.on('connection', (ws) => {
  ws.on('message', (message) => {
    if (aaiSocket && aaiSocket.readyState === WebSocket.OPEN) {
      aaiSocket.send(JSON.stringify({ audio_data: message.toString() }));
    }
  });

  ws.on('close', () => console.log('Audio client disconnected'));
});

viewerWSS.on('connection', (ws) => {
  ws.send(JSON.stringify({ text: 'Connected to captions server' }));
});
```

In the code above, the /token endpoint authenticates users by generating Stream tokens with @stream-io/node-sdk. The /audio and /captions WebSocket routes handle real-time transcription: audio received on /audio is forwarded to AssemblyAI, and the resulting transcripts are broadcast to every client connected to /captions. Replace the <stream_key>, <stream_secret>, and <assemblyai_apikey> placeholders with your Stream API key, Stream secret, and AssemblyAI API key.
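Before wiring up the mobile app, you can start the backend with node app.js and sanity-check the token endpoint. The snippet below is an optional check, not part of the original tutorial; the user_id and call_id values are just examples, and it assumes Node 18+ so the global fetch is available.

```js
// token-check.js — optional sanity check for the /token endpoint (Node 18+)
// Assumes the backend from app.js is running locally on port 3000.
const userId = 'host-user';   // example value
const callId = 'demo-class';  // example value

fetch(`http://localhost:3000/token?user_id=${userId}&call_id=${callId}`)
  .then((res) => res.json())
  .then((data) => console.log('Token response:', data)) // expect { token, callId }
  .catch((err) => console.error('Token request failed:', err));
```

If the response contains a token, your Stream credentials are valid and the app will be able to authenticate against this server.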
Creating the App Home Screen
Next, you’ll create the home screen that lets users create or join a call using a call ID. To do this, open the App.tsx file and replace its code with the following:
```tsx
import React, { useState, useEffect } from 'react';
import { View, Text, Button, StyleSheet, TextInput } from 'react-native';
import HostScreen from './src/HostScreen';
import ViewerScreen from './src/ViewerScreen';

const API_KEY = '<stream_key>';

export default function App() {
  const [role, setRole] = useState("");
  const [token, setToken] = useState<string | null>(null);
  const [callId, setCallId] = useState("");
  const [inputCallId, setInputCallId] = useState("");

  useEffect(() => {
    if (!role || !callId) return;
    // Note: on a physical device or Android emulator, replace localhost with
    // your machine's IP address (or 10.0.2.2 on the Android emulator).
    fetch(`http://localhost:3000/token?user_id=${role}-user&call_id=${callId}`)
      .then(res => res.json())
      .then(data => setToken(data.token))
      .catch(err => console.error("Token fetch error:", err));
  }, [role, callId]);

  if (!role) {
    return (
      <View style={styles.container}>
        <Text style={styles.title}>Enter Call ID:</Text>
        <TextInput
          style={styles.input}
          placeholder="Enter call ID"
          value={inputCallId}
          onChangeText={setInputCallId}
        />
        <Button
          title="Join as Host"
          onPress={() => {
            if (inputCallId.trim()) {
              setRole('host');
              setCallId(inputCallId.trim());
            }
          }}
        />
        <View style={{ height: 10 }} />
        <Button
          title="Join as Viewer"
          onPress={() => {
            if (inputCallId.trim()) {
              setRole('viewer');
              setCallId(inputCallId.trim());
            }
          }}
        />
      </View>
    );
  }

  if (!token) {
    return (
      <View style={styles.container}>
        <Text>Loading token...</Text>
      </View>
    );
  }

  if (role === 'host') {
    return (
      <HostScreen
        apiKey={API_KEY}
        user={{ id: 'host-user' }}
        token={token}
        callId={callId}
      />
    );
  }

  return (
    <ViewerScreen
      apiKey={API_KEY}
      user={{ id: 'viewer-user' }}
      token={token}
      callId={callId}
    />
  );
}

const styles = StyleSheet.create({
  container: { flex: 1, justifyContent: 'center', alignItems: 'center', padding: 20 },
  title: { fontSize: 20, marginBottom: 20 },
  input: { borderWidth: 1, borderColor: '#ccc', padding: 10, width: '80%', marginBottom: 20, borderRadius: 5 }
});
```

Creating the Caption Component
Next, you need to create a Captions component that connects to the backend via a WebSocket for real-time transcription. To do this, create a new file named Captions.js in the src folder and add the following code.
```jsx
import React, { useEffect, useState, useRef } from 'react';
import { View, Text, StyleSheet } from 'react-native';

export default function Captions() {
  const [transcript, setTranscript] = useState('');
  const wsRef = useRef(null);

  useEffect(() => {
    const ws = new WebSocket('ws://localhost:3000/captions');
    wsRef.current = ws;

    ws.onopen = () => console.log('Connected to captions server');

    ws.onmessage = (event) => {
      try {
        const data = JSON.parse(event.data);
        if (data.text) {
          setTranscript(data.text);
        }
      } catch (err) {
        console.error('Error parsing caption data:', err);
      }
    };

    ws.onerror = (err) => console.error('Caption WS error:', err);
    ws.onclose = () => console.log('Caption connection closed');

    return () => {
      if (ws.readyState === WebSocket.OPEN) {
        ws.close();
      }
    };
  }, []);

  if (!transcript) return null;

  return (
    <View style={styles.container}>
      <Text style={styles.text}>{transcript}</Text>
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    position: 'absolute',
    bottom: '20%',
    width: '100%',
    alignItems: 'center',
    justifyContent: 'center',
    paddingHorizontal: 20,
  },
  text: {
    color: '#fff',
    fontSize: 16,
    textAlign: 'center',
    textShadowColor: '#000',
    textShadowRadius: 4,
  },
});
```

How to Create the Host Screen
Let’s create the host screen, which lets the host start a live class for a given call ID with video and audio enabled. To do this, create a src/HostScreen.js file and add the following code:
```jsx
import React, { useEffect } from 'react';
import { StreamVideo, StreamVideoClient, StreamCall, HostLivestream } from '@stream-io/video-react-native-sdk';

export default function HostScreen({ apiKey, user, token, callId }) {
  const client = new StreamVideoClient({ apiKey, user, token });
  const call = client.call('livestream', callId);

  useEffect(() => {
    call.join({ create: true }).catch(console.error);
  }, [call]);

  return (
    <StreamVideo client={client}>
      <StreamCall call={call}>
        <HostLivestream />
      </StreamCall>
    </StreamVideo>
  );
}
```

How to Create the Viewer Screen
Next, create a viewer screen that lets students join a live class using a call ID, with real-time captions enabled. To do this, create a src/ViewerScreen.js file and add the following code.
```jsx
import React, { useEffect, useState } from 'react';
import { View, Text, StyleSheet } from 'react-native';
import { StreamVideo, StreamVideoClient, StreamCall, LivestreamPlayer } from '@stream-io/video-react-native-sdk';
import Captions from './Captions';

export default function ViewerScreen({ apiKey, user, token, callId }) {
  const [client, setClient] = useState(null);
  const [call, setCall] = useState(null);

  useEffect(() => {
    const videoClient = new StreamVideoClient({ apiKey, user, token });
    const videoCall = videoClient.call('livestream', callId);

    videoCall.join().then(() => {
      console.log('Joined call as viewer');
    }).catch(err => {
      console.error('Error joining call:', err);
    });

    setClient(videoClient);
    setCall(videoCall);

    return () => {
      videoCall.leave().catch(console.error);
    };
  }, [apiKey, user, token, callId]);

  if (!client || !call) {
    return (
      <View style={styles.container}>
        <Text>Connecting to stream...</Text>
      </View>
    );
  }

  return (
    <StreamVideo client={client}>
      <StreamCall call={call}>
        <LivestreamPlayer callType="livestream" callId={callId} />
        <Captions call={call} />
      </StreamCall>
    </StreamVideo>
  );
}

const styles = StyleSheet.create({
  container: { flex: 1, justifyContent: 'center', alignItems: 'center' }
});
```

In your terminal, run the command below to start the React Native app:
```bash
npm run ios
```

Or:

```bash
npm run android
```

Testing the Application
Open the app, enter a call ID, and tap “Join as Host” to start the virtual class. Then open the app on another device, enter the same call ID, and tap “Join as Viewer.” Once you have joined as a viewer, the host’s audio is transcribed in real time, as shown in the image below.
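A practical note on the audio path: the backend transcribes whatever arrives on its /audio WebSocket, while the host screen shown earlier publishes audio to Stream only, so you may need an extra step to forward the host’s microphone audio to the backend. Below is a minimal sketch of one way to do that, assuming the react-native-live-audio-stream package (an extra dependency that is not part of the original code) to capture base64-encoded 16 kHz PCM chunks; the HostAudioForwarder name is just for illustration.

```js
// HostAudioForwarder.js — a sketch, not part of the original tutorial.
// Assumes react-native-live-audio-stream is installed and linked.
import { useEffect } from 'react';
import LiveAudioStream from 'react-native-live-audio-stream';

export default function HostAudioForwarder() {
  useEffect(() => {
    // 16 kHz mono 16-bit PCM matches the sample_rate the backend passes to AssemblyAI
    LiveAudioStream.init({
      sampleRate: 16000,
      channels: 1,
      bitsPerSample: 16,
      audioSource: 6, // Android voice-recognition source
      bufferSize: 4096,
    });

    // Replace localhost with your machine's IP when testing on a device or emulator
    const ws = new WebSocket('ws://localhost:3000/audio');

    // Chunks arrive base64-encoded, which is what the backend forwards as audio_data
    LiveAudioStream.on('data', (chunk) => {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(chunk);
      }
    });

    ws.onopen = () => LiveAudioStream.start();

    return () => {
      LiveAudioStream.stop();
      ws.close();
    };
  }, []);

  return null; // renders nothing; it only streams audio
}
```

You could render <HostAudioForwarder /> alongside <HostLivestream /> inside the host’s StreamCall. Treat it as a starting point: on some devices, opening the microphone twice (once for Stream, once for transcription) can conflict, in which case tapping the audio on the server side is the cleaner option.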
Conclusion
In this tutorial, you learned how to build a virtual classroom app with real-time transcription using React Native, Stream’s Video and Audio SDK, and AssemblyAI. This setup allows teachers to host livestream classes with live captions, enhancing accessibility and comprehension for all students.
Want to keep building? Consider adding:
- Group chat with moderation
- An activity feed for class updates
- An AI assistant to respond to class questions
Happy coding! 🛠️




