Jigin Vp

Empathy AI-Your AI Help.

AssemblyAI Voice Agents Challenge: Real-Time

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

EmpathyAI is a real-time voice-powered mental health support application that provides compassionate AI-driven conversations for individuals experiencing emotional distress. The system processes spoken input through advanced speech recognition, analyzes emotional content using AI, and responds with empathetic voice-based support.


Demo

Demo image of website


GitHub Repository

React frontend app
https://github.com/vpjigin/EmpathyAIReact.git
Spring-boot backend
https://github.com/vpjigin/EmpathyAISpringBoot.git


AssemblyAI Universal-Streaming Technology

This application demonstrates real-time audio processing powered by AssemblyAI’s Universal-Streaming API. The API provides low-latency, turn-based transcription over a secure connection, which is what makes emotionally intelligent AI conversations possible.

Core Architecture

The architecture follows a multi-layered streaming pipeline:
Client Audio → WebSocket Handler → AssemblyAI Streaming → AI Processing → Response
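
The flow above can be pictured as a chain of stages. The following is a minimal sketch rather than the project's code: `PipelineSketch`, `transcribe`, and `respond` are hypothetical names, and the speech-to-text and AI stages are replaced by stubs so the example runs without network access.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch: each stage of the pipeline is a function and the
// pipeline is their composition. Real stages would call AssemblyAI and an
// AI model; these stubs only illustrate the direction of the data flow.
class PipelineSketch {
    // Stub for "AssemblyAI Streaming": pretends the audio bytes are UTF-8 text.
    static final Function<byte[], String> transcribe =
            audio -> new String(audio, StandardCharsets.UTF_8);

    // Stub for "AI Processing": wraps the transcript in a canned reply.
    static final Function<String, String> respond =
            transcript -> "I hear you: " + transcript;

    // Client Audio -> transcript -> response
    static final Function<byte[], String> pipeline = transcribe.andThen(respond);

    public static void main(String[] args) {
        byte[] fakeAudio = "I feel anxious".getBytes(StandardCharsets.UTF_8);
        System.out.println(pipeline.apply(fakeAudio));
    }
}
```

Keeping each stage behind a small function boundary makes it possible to test the handler logic without a live streaming connection.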

AssemblyAI Streaming Implementation

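  1. Real-time WebSocket Connection
    The backend opens a persistent WebSocket connection to AssemblyAI’s streaming endpoint:

private static final String ASSEMBLYAI_STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws";

public CompletableFuture<StreamingSession> createStreamingSession(String sessionId, TranscriptCallback callback) {
    String connectionUrl = ASSEMBLYAI_STREAMING_URL + "?sample_rate=16000&format_turns=true";
    URI serverUri = URI.create(connectionUrl);

    Map<String, String> headers = new HashMap<>();
    headers.put("Authorization", apiKey);

    WebSocketClient client = new WebSocketClient(serverUri, headers) {
        @Override
        public void onMessage(String message) {
            try {
                JsonNode jsonMessage = objectMapper.readTree(message);
                // Message type ("Begin", "Turn", "Termination"), assumed to be in the "type" field
                String messageType = jsonMessage.path("type").asText();
                if ("Turn".equals(messageType)) {
                    String transcript = jsonMessage.get("transcript").asText();
                    boolean isFormatted = jsonMessage.get("turn_is_formatted").asBoolean();
                    // Forward only the formatted version of each turn
                    if (isFormatted) {
                        callback.onTranscript(transcript, true);
                    }
                }
            } catch (JsonProcessingException e) {
                // readTree throws a checked exception on malformed JSON
                onError(e);
            }
        }
        // onOpen, onClose, and onError are omitted from this excerpt
    };
    // Connecting the client and completing the returned future are omitted from this excerpt
}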
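
The connection URL above is built by string concatenation. As a hedged alternative, the sketch below assembles the same query string from a parameter map, which is easier to extend when more options are needed. `StreamingUrlSketch`, `buildUri`, and `defaultUri` are hypothetical names; only the endpoint and the two parameters come from the post.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the streaming URL from a map of query parameters.
class StreamingUrlSketch {
    static final String BASE_URL = "wss://streaming.assemblyai.com/v3/ws";

    // Joins the parameters as key=value pairs separated by '&', URL-encoding both sides.
    static URI buildUri(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return URI.create(BASE_URL + "?" + query);
    }

    // The two parameters used in the post; LinkedHashMap keeps them in insertion order.
    static URI defaultUri() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("sample_rate", "16000");
        params.put("format_turns", "true");
        return buildUri(params);
    }

    public static void main(String[] args) {
        System.out.println(defaultUri());
    }
}
```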
  2. Audio Streaming Handler
    The AudioStreamingWebSocketHandler component bridges client-side audio to the AssemblyAI session:

@Component
public class AudioStreamingWebSocketHandler implements WebSocketHandler {

    @Autowired
    private AssemblyAIStreamingServiceV2 assemblyAIStreamingService;

    // Forward each binary audio frame from the client to the matching AssemblyAI session
    private void handleBinaryMessage(WebSocketSession session, BinaryMessage message) {
        StreamingSessionV2 assemblySession = assemblyAISessions.get(session.getId());
        if (assemblySession != null) {
            ByteBuffer audioData = message.getPayload();
            byte[] audioBytes = new byte[audioData.remaining()];
            audioData.get(audioBytes);
            assemblySession.sendAudioData(audioBytes);
        }
    }

    // Open an AssemblyAI session for this client and react to final transcripts
    private void startStreaming(WebSocketSession session, String conversationUuid) {
        assemblyAIStreamingService.createStreamingSession(session.getId(), new TranscriptCallback() {
            @Override
            public void onTranscript(String text, boolean isFinal) {
                if (isFinal) {
                    handleFinalTranscript(session, conversation, text);
                }
            }
        });
    }
    // The assemblyAISessions map, the conversation lookup, and the remaining
    // WebSocketHandler methods are omitted from this excerpt
}
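
The handler above looks up a per-client session and copies each binary payload into a byte array before forwarding it. The sketch below isolates that bridging step using only the JDK. `AudioBridgeSketch` and `AudioSink` are hypothetical stand-ins for the handler and for `StreamingSessionV2`, and the registry is assumed to be a concurrent map keyed by client session ID.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the client-to-AssemblyAI audio bridge.
class AudioBridgeSketch {
    // Stand-in for the streaming session; only the send method is modeled.
    interface AudioSink {
        void sendAudioData(byte[] audio);
    }

    // One sink per connected client; ConcurrentHashMap allows concurrent sessions.
    private final Map<String, AudioSink> sinks = new ConcurrentHashMap<>();

    void register(String clientSessionId, AudioSink sink) {
        sinks.put(clientSessionId, sink);
    }

    void unregister(String clientSessionId) {
        sinks.remove(clientSessionId);
    }

    // Copies the payload and forwards it; returns false when no session is registered.
    boolean forward(String clientSessionId, ByteBuffer payload) {
        AudioSink sink = sinks.get(clientSessionId);
        if (sink == null) {
            return false;
        }
        byte[] audio = new byte[payload.remaining()];
        // duplicate() leaves the caller's buffer position untouched
        payload.duplicate().get(audio);
        sink.sendAudioData(audio);
        return true;
    }
}
```

Removing the sink when the client disconnects keeps the map from growing as sessions come and go.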
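  3. Advanced Features Utilized

  • Turn-based Transcription: format_turns=true returns each turn as formatted text, which keeps the conversation flow natural
  • 16 kHz Audio: sample_rate=16000 tells AssemblyAI the sample rate of the incoming audio
  • TLS/SSL Security: connections are secured with valid certificates (the AssemblyAI endpoint uses wss://)
  • Concurrent Streaming: multiple sessions can stream at the same time
  • Message Type Handling: supports the "Begin", "Turn", and "Termination" message types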
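
The message types listed above can be routed with a small classifier. This is a sketch under stated assumptions: `MessageDispatchSketch`, `Kind`, `classify`, and `shouldForward` are hypothetical names, and the type string is assumed to have already been extracted from the incoming JSON message.

```java
// Hypothetical sketch: classify streaming messages by their type string and
// decide whether a transcript should be forwarded to the conversation logic.
class MessageDispatchSketch {
    enum Kind { BEGIN, TURN, TERMINATION, UNKNOWN }

    // Maps the raw type string to a Kind; anything unrecognized is UNKNOWN.
    static Kind classify(String type) {
        if (type == null) {
            return Kind.UNKNOWN;
        }
        switch (type) {
            case "Begin":
                return Kind.BEGIN;
            case "Turn":
                return Kind.TURN;
            case "Termination":
                return Kind.TERMINATION;
            default:
                return Kind.UNKNOWN;
        }
    }

    // Mirrors the check in the post: only formatted "Turn" messages are forwarded.
    static boolean shouldForward(String type, boolean turnIsFormatted) {
        return classify(type) == Kind.TURN && turnIsFormatted;
    }

    public static void main(String[] args) {
        System.out.println(classify("Begin"));
        System.out.println(shouldForward("Turn", true));
    }
}
```

Treating unrecognized types as `UNKNOWN` instead of throwing lets the handler ignore message types it does not know about.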

  4. Dual Implementation Strategy
    I implemented two parallel streaming strategies:

  • AssemblyAIStreamingService: Uses Java-WebSocket for low-level WebSocket handling

  • AssemblyAIStreamingServiceV2: Uses Spring’s StandardWebSocketClient for seamless Spring Boot integration

// Spring-based implementation
public CompletableFuture<StreamingSessionV2> createStreamingSession(String sessionId, TranscriptCallback callback) {
    StandardWebSocketClient client = new StandardWebSocketClient();

    WebSocketHttpHeaders headers = new WebSocketHttpHeaders();
    headers.add("Authorization", apiKey);

    WebSocketHandler handler = new WebSocketHandler() {
        @Override
        public void handleMessage(WebSocketSession session, WebSocketMessage<?> message) {
            // Handle messages using Spring WebSocket framework
        }
        // Remaining WebSocketHandler methods are omitted from this excerpt
    };

    // Blocks until the handshake completes; doHandshake is deprecated in
    // Spring Framework 6 in favor of execute
    client.doHandshake(handler, headers, serverUri).get();
    // Building and returning the session future is omitted from this excerpt
}
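
One way to keep two streaming implementations interchangeable is to put them behind a shared interface and choose one from configuration. The post does not say how the two services are selected, so everything in this sketch is an assumption: `TranscriptStreamer`, `LowLevelStreamer`, `SpringStreamer`, and `select` are hypothetical names, and the implementations are stubs that only report which transport was chosen.

```java
import java.util.function.Consumer;

// Hypothetical sketch: two streaming backends behind one interface.
class DualStrategySketch {
    // Common contract that both services could satisfy.
    interface TranscriptStreamer {
        String transport();
        void open(String sessionId, Consumer<String> onTranscript);
    }

    // Stand-in for the Java-WebSocket based service.
    static class LowLevelStreamer implements TranscriptStreamer {
        public String transport() {
            return "java-websocket";
        }
        public void open(String sessionId, Consumer<String> onTranscript) {
            onTranscript.accept("[" + transport() + "] session " + sessionId + " opened");
        }
    }

    // Stand-in for the Spring StandardWebSocketClient based service.
    static class SpringStreamer implements TranscriptStreamer {
        public String transport() {
            return "spring";
        }
        public void open(String sessionId, Consumer<String> onTranscript) {
            onTranscript.accept("[" + transport() + "] session " + sessionId + " opened");
        }
    }

    // Picks an implementation from a configuration value (hypothetical setting).
    static TranscriptStreamer select(String configured) {
        return "spring".equalsIgnoreCase(configured) ? new SpringStreamer() : new LowLevelStreamer();
    }

    public static void main(String[] args) {
        select("spring").open("demo", System.out::println);
    }
}
```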

Technical Capabilities Leveraged

1. Real-time Binary Audio Streaming
2. Low-latency (<1s) Transcription
3. Turn-based Conversation Context
4. Error Recovery & Retry Mechanism
5. Scalable Concurrent Sessions
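
The post does not show how error recovery and retry are implemented. As an illustration only, the sketch below retries a failing action with exponential backoff, a common pattern for reconnecting a dropped streaming session. `RetrySketch` and `withRetry` are hypothetical names, and the attempt count and delays are arbitrary example values.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: retry an action with exponential backoff.
class RetrySketch {
    // Runs the action up to maxAttempts times, doubling the delay after each
    // failure. Rethrows the last exception if every attempt fails.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;          // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated connection failure");
            }
            return "connected";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```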


Project Structure (Brief)

├── controller/
├── service/
├── websocket/
├── model/
└── config/
