This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
EmpathyAI is a real-time voice-powered mental health support application that provides compassionate AI-driven conversations for individuals experiencing emotional distress. The system processes spoken input through advanced speech recognition, analyzes emotional content using AI, and responds with empathetic voice-based support.
Demo
GitHub Repository
React frontend app
https://github.com/vpjigin/EmpathyAIReact.git
Spring Boot backend
https://github.com/vpjigin/EmpathyAISpringBoot.git
AssemblyAI Universal-Streaming Technology
This application demonstrates advanced real-time audio processing powered by AssemblyAI’s Universal-Streaming API. The system delivers low-latency, turn-based, secure transcription, enabling emotionally intelligent AI conversations.
Core Architecture
The architecture follows a multi-layered streaming pipeline:
Client Audio → WebSocket Handler → AssemblyAI Streaming → AI Processing → Response
AssemblyAI Streaming Implementation
- Real-time WebSocket Connection

The backend creates a persistent WebSocket connection to AssemblyAI’s streaming endpoint:
```java
private static final String ASSEMBLYAI_STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws";

public CompletableFuture<StreamingSession> createStreamingSession(String sessionId, TranscriptCallback callback) {
    String connectionUrl = ASSEMBLYAI_STREAMING_URL + "?sample_rate=16000&format_turns=true";
    Map<String, String> headers = new HashMap<>();
    headers.put("Authorization", apiKey);

    URI serverUri = URI.create(connectionUrl);
    WebSocketClient client = new WebSocketClient(serverUri, headers) {
        @Override
        public void onMessage(String message) {
            try {
                JsonNode jsonMessage = objectMapper.readTree(message);
                String messageType = jsonMessage.get("type").asText();
                if ("Turn".equals(messageType)) {
                    String transcript = jsonMessage.get("transcript").asText();
                    boolean isFormatted = jsonMessage.get("turn_is_formatted").asBoolean();
                    if (isFormatted) {
                        // Only deliver fully formatted turns to the callback
                        callback.onTranscript(transcript, true);
                    }
                }
            } catch (JsonProcessingException e) {
                log.error("Failed to parse AssemblyAI message", e);
            }
        }
        // onOpen, onClose, onError omitted for brevity
    };
    // ...connect and wrap the live session in a CompletableFuture
}
```
- Audio Streaming Handler

The AudioStreamingWebSocketHandler component bridges client-side audio to the AssemblyAI session:
```java
@Component
public class AudioStreamingWebSocketHandler implements WebSocketHandler {

    @Autowired
    private AssemblyAIStreamingServiceV2 assemblyAIStreamingService;

    private void handleBinaryMessage(WebSocketSession session, BinaryMessage message) {
        StreamingSessionV2 assemblySession = assemblyAISessions.get(session.getId());
        if (assemblySession != null) {
            // Copy the raw PCM bytes out of the buffer and forward them to AssemblyAI
            ByteBuffer audioData = message.getPayload();
            byte[] audioBytes = new byte[audioData.remaining()];
            audioData.get(audioBytes);
            assemblySession.sendAudioData(audioBytes);
        }
    }

    private void startStreaming(WebSocketSession session, String conversationUuid) {
        assemblyAIStreamingService.createStreamingSession(session.getId(), new TranscriptCallback() {
            @Override
            public void onTranscript(String text, boolean isFinal) {
                if (isFinal) {
                    handleFinalTranscript(session, conversationUuid, text);
                }
            }
        });
    }
}
```
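The handler above forwards raw 16-bit PCM bytes. In this project that conversion from the browser’s Float32 samples happens client-side in the React app; as a rough Java sketch of the same transform (names are illustrative, not from the repo):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmConverter {
    /** Converts [-1.0, 1.0] float samples to 16-bit signed little-endian PCM bytes. */
    public static byte[] floatToPcm16(float[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * 2)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float sample : samples) {
            // Clamp to the valid range before scaling to the 16-bit range
            float clamped = Math.max(-1.0f, Math.min(1.0f, sample));
            buffer.putShort((short) (clamped * 32767));
        }
        return buffer.array();
    }
}
```

Each float becomes two bytes, so a 16kHz stream produces 32,000 bytes per second of audio sent over the WebSocket.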
- Advanced Features Utilized
- Turn-based Transcription: `format_turns=true` enables a natural, human-like conversational flow
- 16kHz Audio: `sample_rate=16000` ensures clear, accurate transcription
- TLS/SSL Security: Connections secured over `wss://` with valid certificates
- Concurrent Streaming: Supports multiple simultaneous sessions
- Message Type Handling: Handles the "Begin", "Turn", and "Termination" message types
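The message-type dispatch can be sketched as follows. This is a minimal, self-contained illustration rather than the repo’s code: the real handler parses JSON with Jackson, while this version pulls out the type field with a regex so it runs standalone:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MessageDispatcher {
    // Naive extraction of the "type" field; the real code uses a JSON parser
    private static final Pattern TYPE = Pattern.compile("\"type\"\\s*:\\s*\"(\\w+)\"");

    /** Maps each AssemblyAI streaming message type to the action taken. */
    public static String dispatch(String json) {
        Matcher m = TYPE.matcher(json);
        if (!m.find()) return "ignored";
        switch (m.group(1)) {
            case "Begin":       return "session-started";
            case "Turn":        return "transcript-received";
            case "Termination": return "session-closed";
            default:            return "ignored";
        }
    }
}
```

In the application, "Turn" messages drive the empathy pipeline, while "Begin" and "Termination" bracket the session lifecycle.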
Dual Implementation Strategy
I implemented two parallel streaming strategies:
- AssemblyAIStreamingService: uses the Java-WebSocket library for low-level WebSocket handling
- AssemblyAIStreamingServiceV2: uses Spring’s StandardWebSocketClient for seamless Spring Boot integration
```java
// Spring-based implementation
public CompletableFuture<StreamingSessionV2> createStreamingSession(String sessionId, TranscriptCallback callback) {
    StandardWebSocketClient client = new StandardWebSocketClient();
    WebSocketHttpHeaders headers = new WebSocketHttpHeaders();
    headers.add("Authorization", apiKey);

    WebSocketHandler handler = new WebSocketHandler() {
        @Override
        public void handleMessage(WebSocketSession session, WebSocketMessage<?> message) {
            // Handle messages using the Spring WebSocket framework
        }
        // afterConnectionEstablished, handleTransportError, etc. omitted for brevity
    };

    WebSocketSession session = client.doHandshake(handler, headers, serverUri).get();
    // ...wrap the session in a StreamingSessionV2 and complete the future
}
```
Technical Capabilities Leveraged
1. Real-time Binary Audio Streaming
2. Low-latency (<1s) Transcription
3. Turn-based Conversation Context
4. Error Recovery & Retry Mechanism
5. Scalable Concurrent Sessions
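The error-recovery idea from the list above can be sketched as a generic retry helper with exponential backoff. This is an illustrative sketch of the pattern, not the repo’s actual implementation; names are hypothetical:

```java
import java.util.concurrent.Callable;

public class RetryHelper {
    /** Runs the task, retrying up to maxAttempts times with a doubling delay. */
    public static <T> T withRetry(Callable<T> task, int maxAttempts, long baseDelayMs)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff between attempts
                }
            }
        }
        throw last; // all attempts exhausted
    }
}
```

Wrapping the WebSocket connection attempt in a helper like this lets a dropped AssemblyAI session reconnect transparently without surfacing transient failures to the user.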
Project Structure (Brief)
```
├── controller/
├── service/
├── websocket/
├── model/
└── config/
```