Divya

Posted on Jul 28

🧠🎤 FluentMate - Your Smart Fluency Friend & 24×7 Mentor 💬🤖

#devchallenge #assemblyaichallenge #ai #api

AssemblyAI Voice Agents Challenge: Domain Expert

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

Speak Reflect Improve is your always-available, intelligent English fluency mentor. It's a real-time voice agent that analyzes your spoken English and offers detailed feedback on pronunciation, grammar, fluency, vocabulary, and more.

Whether you're preparing for job interviews, public speaking, or simply building confidence, Speak Reflect Improve helps you self-reflect and speak better every day.

It fits both of these prompt categories in the challenge:

Business Automation - because it offers scalable, 24×7 personalized fluency coaching, helpful for professional development, onboarding, or upskilling.
🎓 Domain Expert - since it understands the nuances of English communication and acts as an expert tutor, prompted to assess fluency, idiom usage, and CEFR levels.

🎯 Why I Built It

I’m an introvert. In real life, I don’t talk to people much. But life doesn’t pause for shyness. Interviews, group discussions, pitches - they all demand communication excellence.

And I had no one to practice with. No teacher on demand. No fluent partner to correct me at 2 AM.

So I built Speak Reflect Improve - my intelligent, judgment-free, 24×7 speaking companion that listens, scores, corrects, and uplifts. It has become my silent teacher, and I hope it becomes yours too.

💡 Features

🎙️ Real-Time Speaking Assessments
- Record your voice on any topic or pick a random prompt.
- Choose from short (2 min) up to long (7 min) assessments.
📈 AI-Powered Fluency Feedback
- Pronunciation, vocabulary, grammar, fluency, pauses, fillers, coherence, idiom usage - it covers it all.
- Scores out of 10 with CEFR level mapping (A1 to C2).
🎯 Custom Coaching from a Fluency Expert Persona
- Personalized suggestions to reach the next level.
- Actionable tips, goals, and motivational feedback.
🧠 Built for Learning
- Designed to improve real communication for students, developers, job-seekers, and global speakers.

🌐 Demo

🔗 Live Site:

Check out my live site here 👇 (please give it a minute to load 🥹🥹, or watch the demo in the meantime 😅)

Speak Reflect Improve

🎥 Video Demo:

Check out this video where I'm showcasing (or audiocasing 🤔🤔) my project:

💻 GitHub Repository

Check out my GitHub repository below. Maybe you wanna browse the code, dive into it, clone it, fork it, or contribute 😁!

Divya4879 / Language-Fluency-Coach

🎙️ Speak Reflect Improve

Speak naturally and let our AI analyze your English proficiency across all areas

Check out my project's snapshots here:

[Screenshot of the app running locally]

Check it out here (live version): Speak Reflect Improve

Features

🎤 Speech Analysis

Pronunciation Assessment: Detailed analysis of pronunciation clarity, accent, and intonation
Vocabulary Evaluation: Assessment of vocabulary range, sophistication, and appropriateness
Grammar Analysis: Grammar accuracy, sentence structure, and complexity evaluation
Fluency Assessment: Speech flow, hesitation patterns, and filler word detection
Coherence & Organization: Logical flow and idea connection analysis
Idioms & Phrases: Natural expression and idiomatic usage evaluation

📊 Proficiency Grading

CEFR Level Assessment: A1, A2, B1, B2, C1, C2 level determination
Overall Grade: 1-10 scoring system with detailed explanations
Actionable Feedback: Specific improvement recommendations

🤖 AI-Powered Coaching

Real-time Feedback: Immediate corrections and suggestions
Personalized Tips: Customized improvement strategies
Cultural Context: Natural expressions and…


⚙️ Tech Stack

Technology	Purpose
Flask	Python backend for routing and logic
AssemblyAI	Speech-to-text transcription
Groq API	Fast LLM-based English analysis
JavaScript	UI interactions and audio logic
HTML/CSS	Responsive frontend with dark theme
Web APIs	Microphone access (getUserMedia) and recording (MediaRecorder)
HTTPS	Required for microphone access in the browser (localhost excepted)

🧠 Technical Implementation & AssemblyAI Integration

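The heart of the app lies in transcription with AssemblyAI and analysis with Groq's Llama 3 model. Note that the snippets below use AssemblyAI's pre-recorded (asynchronous) transcription rather than the Universal-Streaming API: each recording is uploaded after the speaker finishes, and the completed transcript is then analyzed.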

Here are the code snippets demonstrating AssemblyAI integration in the English Fluency Coach project:

🎯 1. AssemblyAI Initialization & Configuration

# utils/voice_manager.py - AssemblyAI setup
from typing import Dict

try:
    import assemblyai as aai
    ASSEMBLYAI_AVAILABLE = True
except ImportError:
    ASSEMBLYAI_AVAILABLE = False


class VoiceManager:
    def __init__(self, api_keys: Dict[str, str]):
        self.api_keys = api_keys
        self.assemblyai_available = False
        self._init_assemblyai()

    def _init_assemblyai(self):
        if ASSEMBLYAI_AVAILABLE and self.api_keys.get('ASSEMBLYAI_API_KEY'):
            try:
                # Only stores the key locally; no request is sent to AssemblyAI here
                aai.settings.api_key = self.api_keys['ASSEMBLYAI_API_KEY']
                # Mirrors the options used in transcribe_audio()
                test_config = aai.TranscriptionConfig(
                    language_detection=True,
                    punctuate=True,
                    format_text=True,
                    speaker_labels=False,   # single speaker, no diarization
                    auto_highlights=False,
                )
                self.assemblyai_available = True
                print("✅ AssemblyAI initialized successfully")
            except Exception as e:
                print(f"❌ AssemblyAI initialization failed: {e}")
                self.assemblyai_available = False

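This code configures the AssemblyAI SDK for English fluency assessment. It sets the API key and defines the transcription options used throughout the app: automatic language detection, punctuation, and text formatting are on; speaker labels (diarization) are off because each recording has a single speaker, and auto highlights are off. Any exception is caught and reported, and `assemblyai_available` records whether the SDK path can be used. Two caveats: setting `aai.settings.api_key` does not contact AssemblyAI, so an invalid key only surfaces on the first transcription; and AssemblyAI removes filler words such as "um" and "uh" from transcripts by default (the `disfluencies` option keeps them), which matters for filler-word detection.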
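`VoiceManager` expects an `api_keys` dictionary. A minimal sketch of building it from environment variables follows; the helper name `load_api_keys` and the `GROQ_API_KEY` variable name are assumptions for illustration, not taken from the repository.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical helper. Only ASSEMBLYAI_API_KEY appears in the original snippets;
# GROQ_API_KEY is an assumed name for the Groq credential.
KEY_NAMES = ("ASSEMBLYAI_API_KEY", "GROQ_API_KEY")


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the non-empty API keys found in `env` (defaults to os.environ)."""
    source = os.environ if env is None else env
    keys: Dict[str, str] = {}
    for name in KEY_NAMES:
        value = source.get(name, "").strip()
        if value:  # skip unset or blank variables
            keys[name] = value
    return keys

# Usage, assuming the VoiceManager class above:
# voice_manager = VoiceManager(load_api_keys())
```

Skipping blank values matters because `_init_assemblyai` checks `self.api_keys.get('ASSEMBLYAI_API_KEY')` for truthiness, so an empty variable then behaves like a missing key.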

🎤 2. Core Audio Transcription with Dual-Mode Approach

# utils/voice_manager.py - main transcription function
import os

def transcribe_audio(self, audio_file_path: str) -> str:
    if not os.path.exists(audio_file_path):
        return "❌ Audio file not found"

    # Path 1: AssemblyAI Python SDK
    if self.assemblyai_available:
        try:
            print("🔄 Trying AssemblyAI SDK...")
            config = aai.TranscriptionConfig(
                language_detection=True,
                punctuate=True,
                format_text=True,
                speaker_labels=False,
                auto_highlights=False,
            )
            transcriber = aai.Transcriber(config=config)
            # Blocks until the transcript has completed or errored
            transcript = transcriber.transcribe(audio_file_path)
            if transcript.status == "completed":
                print("✅ AssemblyAI SDK transcription successful")
                return self._clean_transcription(transcript.text)
            elif transcript.status == "error":
                print(f"❌ AssemblyAI SDK error: {transcript.error}")
                return f"❌ Transcription error: {transcript.error}"
        except Exception as e:
            print(f"❌ AssemblyAI SDK error: {e}")

    # Path 2: fall back to the REST API if the SDK is unavailable or raised
    if self.api_keys.get('ASSEMBLYAI_API_KEY'):
        try:
            print("🔄 Trying AssemblyAI Direct API...")
            result = self._transcribe_with_api(audio_file_path)
            if result and not result.startswith("❌"):
                print("✅ AssemblyAI API transcription successful")
                return self._clean_transcription(result)
        except Exception as e:
            print(f"❌ AssemblyAI API error: {e}")

    return "❌ Transcription failed. Please check API configuration."

This function tries two transcription paths in order. The primary path uses the SDK's `Transcriber` with the same options as above. If the SDK raises an exception, or was never initialized, the function falls back to calling the REST API directly. An explicit `error` status from the SDK is returned immediately rather than retried. Errors are signalled by strings that begin with "❌", which callers check with `startswith("❌")`.
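The same try-in-order pattern can be written generically. This is an illustrative sketch, not the project's code: `transcribe_with_fallback` and the stub strategies are hypothetical, and it keeps the article's convention that error strings start with "❌".

```python
from typing import Callable, List

Strategy = Callable[[str], str]


def transcribe_with_fallback(audio_path: str, strategies: List[Strategy]) -> str:
    """Try each transcription strategy in order and return the first success.

    A strategy fails if it raises or returns a string starting with "❌".
    """
    errors = []
    for strategy in strategies:
        try:
            result = strategy(audio_path)
        except Exception as exc:  # e.g. SDK or network errors
            errors.append(f"{strategy.__name__}: {exc}")
            continue
        if result and not result.startswith("❌"):
            return result
        errors.append(f"{strategy.__name__}: {result}")
    return "❌ Transcription failed: " + "; ".join(errors)


# Hypothetical stubs standing in for the SDK and REST API paths
def sdk_path(path: str) -> str:
    raise RuntimeError("SDK not installed")


def rest_path(path: str) -> str:
    return "hello world"
```

One design difference from `transcribe_audio`: here an error result also moves on to the next strategy, whereas the original returns an SDK `error` status immediately. Collecting every error message also makes the final failure easier to debug.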

🔧 3. Direct API Implementation with Enhanced Features

# utils/voice_manager.py - direct REST API fallback
import time
import requests

def _transcribe_with_api(self, audio_file_path: str) -> str:
    try:
        headers = {'authorization': self.api_keys['ASSEMBLYAI_API_KEY']}

        # 1. Upload the local file; the upload endpoint expects the raw bytes as the body
        print("📤 Uploading audio file...")
        with open(audio_file_path, 'rb') as f:
            response = requests.post(
                'https://api.assemblyai.com/v2/upload',
                headers=headers,
                data=f,
                timeout=60,
            )
        if response.status_code != 200:
            return f"❌ Upload failed: {response.status_code} - {response.text}"
        upload_url = response.json()['upload_url']
        print(f"✅ File uploaded: {upload_url}")

        # 2. Request a transcript of the uploaded audio
        print("🔄 Requesting transcription...")
        data = {
            'audio_url': upload_url,
            'language_detection': True,
            'punctuate': True,
            'format_text': True,
            'speaker_labels': False,
            'auto_highlights': False,
        }
        response = requests.post(
            'https://api.assemblyai.com/v2/transcript',
            headers=headers,
            json=data,
            timeout=30,
        )
        if response.status_code != 200:
            return f"❌ Transcription request failed: {response.status_code}"
        transcript_id = response.json()['id']
        print(f"🔄 Transcription ID: {transcript_id}")

        # 3. Poll every 2 seconds, up to 60 attempts (about 2 minutes)
        print("⏳ Waiting for transcription to complete...")
        max_attempts = 60
        for attempt in range(max_attempts):
            response = requests.get(
                f'https://api.assemblyai.com/v2/transcript/{transcript_id}',
                headers=headers,
                timeout=30,
            )
            if response.status_code != 200:
                return f"❌ Status check failed: {response.status_code}"
            result = response.json()
            status = result['status']
            if status == 'completed':
                print("✅ Transcription completed")
                return result['text'] or "❌ No text in transcription result"
            elif status == 'error':
                error_msg = result.get('error', 'Unknown error')
                return f"❌ Transcription error: {error_msg}"
            elif status in ('queued', 'processing'):
                print(f"⏳ Status: {status} (attempt {attempt + 1}/{max_attempts})")
                time.sleep(2)
            else:
                return f"❌ Unknown status: {status}"
        return "❌ Transcription timeout - took too long to process"
    except requests.exceptions.Timeout:
        return "❌ Request timeout - please try again"
    except Exception as e:
        return f"❌ Unexpected error: {str(e)}"

This function calls the AssemblyAI REST API directly and serves as the fallback path. It uploads the audio to `/v2/upload`, submits the returned `upload_url` to `/v2/transcript` with the same options as the SDK path, and then polls `/v2/transcript/{id}` every 2 seconds for up to 60 attempts (about 2 minutes) until the status is `completed` or `error`. Request timeouts, non-200 responses, unknown statuses, and the overall polling timeout are all returned as "❌"-prefixed error strings.
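The polling loop can be separated from the HTTP calls, which makes the 2-second interval and 60-attempt budget testable without the network. This is a sketch: `poll_until_done` is a hypothetical helper, and `fetch` stands in for the GET on `/v2/transcript/{id}`.

```python
import time
from typing import Callable, Dict, Tuple


def poll_until_done(fetch: Callable[[], Dict], interval: float = 2.0,
                    max_attempts: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[str, Dict]:
    """Poll `fetch` until its status is 'completed' or 'error', or attempts run out.

    Returns (outcome, last_payload), where outcome is 'completed', 'error',
    'unknown', or 'timeout'. The worst-case wait is about interval * max_attempts.
    """
    payload: Dict = {}
    for _ in range(max_attempts):
        payload = fetch()
        status = payload.get("status")
        if status in ("completed", "error"):
            return status, payload
        if status not in ("queued", "processing"):
            return "unknown", payload
        sleep(interval)  # still queued or processing: wait and retry
    return "timeout", payload
```

In `_transcribe_with_api`, `fetch` could be something like `lambda: requests.get(url, headers=headers, timeout=30).json()`. Injecting `sleep` lets a test run the loop instantly.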

🧹 4. Advanced Text Processing & Validation

import os
import re
from typing import Any, Dict

def _clean_transcription(self, text: str) -> str:
    if not text:
        return "❌ Empty transcription result"
    text = text.strip()
    # Collapse runs of whitespace into single spaces
    text = re.sub(r'\s+', ' ', text)
    # Capitalize the first letter after ., ! or ?
    text = re.sub(r'([.!?])\s*([a-z])',
                  lambda m: m.group(1) + ' ' + m.group(2).upper(), text)
    # Capitalize the first character and ensure terminal punctuation
    if text and not text[0].isupper():
        text = text[0].upper() + text[1:]
    if text and text[-1] not in '.!?':
        text += '.'
    return text

def validate_audio_file(self, file_path: str) -> Dict[str, Any]:
    if not os.path.exists(file_path):
        return {'valid': False, 'error': 'File does not exist', 'file_size': 0}

    file_size = os.path.getsize(file_path)
    max_size = 100 * 1024 * 1024  # 100MB limit enforced by this app
    if file_size > max_size:
        return {
            'valid': False,
            'error': f'File too large: {file_size / (1024*1024):.1f}MB (max 100MB)',
            'file_size': file_size,
        }
    if file_size < 1000:  # minimum viable audio size, in bytes
        return {
            'valid': False,
            'error': 'File too small - may be empty or corrupted',
            'file_size': file_size,
        }
    return {
        'valid': True,
        'error': None,
        'file_size': file_size,
        'file_size_mb': file_size / (1024 * 1024),
    }

This code post-processes the transcript and validates the audio file. `_clean_transcription` collapses whitespace, capitalizes the first letter of the text and of each sentence, and appends a period if the text lacks terminal punctuation. `validate_audio_file` rejects files that do not exist, exceed the app's 100MB limit, or are smaller than 1,000 bytes (likely empty or corrupted), so the fluency analysis only runs on usable input.
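For comparison, here is a standalone sketch of the cleaning step (the name `clean_transcription` is illustrative). It makes one deliberate change: it only capitalizes a letter that follows sentence-ending punctuation and whitespace. The snippet above uses `\s*`, which also matches inside abbreviations, so "see e.g." becomes "See e. G."; requiring whitespace avoids that, at the cost of not fixing missing spaces such as "there.how". It also returns an empty string, not an error message, for blank input.

```python
import re


def clean_transcription(text: str) -> str:
    """Normalize whitespace and sentence capitalization in a transcript."""
    # Trim and collapse runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text.strip())
    if not text:
        return text
    # Capitalize a lowercase letter that follows ., ! or ? plus whitespace
    text = re.sub(r"([.!?])\s+([a-z])",
                  lambda m: m.group(1) + " " + m.group(2).upper(), text)
    # Capitalize the first character and ensure terminal punctuation
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text
```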

🌐 5. Flask Integration & Speech Analysis Endpoint

import os
from datetime import datetime
from flask import jsonify, request, session

# app, voice_manager and english_analyzer are created elsewhere in the Flask app
@app.route('/analyze_speech', methods=['POST'])
def analyze_speech():
    temp_path = None
    try:
        audio_file = request.files.get('audio')
        assessment_type = request.form.get('type', 'general')
        if not audio_file:
            return jsonify({'error': 'No audio file provided'}), 400

        # Reuse the session ID, or create one from the current timestamp
        session_id = session.get('session_id', datetime.now().strftime("%Y%m%d_%H%M%S"))
        session['session_id'] = session_id

        temp_path = f"temp/audio_{session_id}_{assessment_type}.wav"
        os.makedirs('temp', exist_ok=True)
        audio_file.save(temp_path)

        print(f"🔄 Starting speech analysis of {temp_path}")
        transcription = voice_manager.transcribe_audio(temp_path)
        if transcription.startswith("❌"):
            return jsonify({'error': transcription}), 500

        analysis = english_analyzer.analyze_speech_proficiency(
            transcription=transcription,
            audio_file_path=temp_path,
            assessment_type=assessment_type,
        )

        session['last_analysis'] = analysis
        session['last_transcription'] = transcription
        print(f"✅ Speech analysis completed: Grade {analysis.get('overall_grade', 'N/A')}")
        return jsonify({'success': True, 'transcription': transcription, 'analysis': analysis})
    except Exception as e:
        print(f"❌ Speech analysis error: {e}")
        return jsonify({'error': f'Speech analysis failed: {str(e)}'}), 500
    finally:
        # Delete the temporary audio on every path, including error returns
        if temp_path and os.path.exists(temp_path):
            os.remove(temp_path)

This Flask endpoint orchestrates the full speech analysis workflow. It saves the uploaded audio to a session-named temporary file, transcribes it with `VoiceManager`, runs the English proficiency analysis, deletes the temporary file, and stores the latest transcription and analysis in the session. It returns the transcription and analysis as JSON, or an error response with status 400 (no audio) or 500 (transcription or analysis failure).
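An alternative sketch for the temporary file, under the assumption that uploads may overlap: `tempfile.mkstemp` gives each request a unique name (two requests in the same session would otherwise share the `audio_{session_id}_{type}.wav` path), and a context manager removes the file on every exit path. `temporary_audio_path` is a hypothetical helper, not part of the project.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_audio_path(suffix: str = ".wav", directory: str = "temp") -> Iterator[str]:
    """Yield a unique file path and delete the file afterwards, even on errors."""
    os.makedirs(directory, exist_ok=True)
    fd, path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)  # the caller reopens the file by name, e.g. audio_file.save(path)
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)

# Usage inside the endpoint (sketch):
# with temporary_audio_path() as temp_path:
#     audio_file.save(temp_path)
#     transcription = voice_manager.transcribe_audio(temp_path)
```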

🎓 6. AI-Powered English Proficiency Analysis

from typing import Dict, Optional

def analyze_speech_proficiency(self, transcription: str,
                               audio_file_path: Optional[str] = None,
                               assessment_type: str = 'general') -> Dict:
    try:
        # Audio-level features are optional; the text analysis always runs
        audio_analysis = self._analyze_audio_characteristics(audio_file_path) if audio_file_path else {}
        text_analysis = self._analyze_text_proficiency(transcription, assessment_type)
        combined_analysis = self._combine_analyses(text_analysis, audio_analysis, transcription)
        return combined_analysis
    except Exception as e:
        print(f"❌ Speech proficiency analysis error: {e}")
        raise

def _create_proficiency_prompt(self, transcription: str, assessment_type: str) -> str:
    return f"""
You are a world-renowned English language proficiency expert and certified TESOL instructor with 20+ years of experience. You are known for providing detailed, professional analysis like a premium English tutor.

ASSESSMENT TYPE: {assessment_type}

TRANSCRIPTION TO ANALYZE:
"{transcription}"

Provide a comprehensive professional English assessment following this EXACT format:

## PRONUNCIATION ANALYSIS
[Detailed analysis of pronunciation quality, clarity, accent, stress patterns, intonation, and specific sounds. Be specific about what sounds good and what needs improvement.]

## VOCABULARY ASSESSMENT
[Analyze vocabulary range, sophistication, word choice appropriateness, and lexical diversity.]
- Advanced words used: [list specific advanced words they used]
- Vocabulary level: [beginner/intermediate/advanced with explanation]
- Suggested word upgrades: [specific examples like "good → excellent"]

## GRAMMAR EVALUATION
[Assess grammar accuracy, sentence structure complexity, tense usage, and error patterns.]
- Grammar strengths: [specific examples of correct usage]
- Areas for improvement: [specific grammar points to work on]
- Sentence complexity: [analysis of their sentence structures]

## FLUENCY ANALYSIS
[Evaluate speech flow, hesitations, filler words, pace, natural rhythm, and speaking rate]
- Filler words detected: [count and list them]
- Speaking rate assessment: [words per minute if calculable]
- Flow quality: [detailed assessment]

## COHERENCE & ORGANIZATION
[Assess logical flow, idea connection, topic development, clarity of expression]

## CEFR LEVEL ASSESSMENT
CEFR Level: [A1, A2, B1, B2, C1, or C2]
Level Description: [Beginner/Elementary/Intermediate/Upper-Intermediate/Advanced/Proficient]

## DETAILED PROFESSIONAL FEEDBACK
[Provide encouraging, detailed feedback like a professional English tutor. Be specific about their current abilities and growth potential.]

## PRIORITY FOCUS AREAS
[List the top 2-3 areas they should focus on immediately for maximum improvement]

Be detailed, professional, encouraging, and specific. Provide the kind of analysis a student would get from a premium English tutor.
"""

This code runs the English proficiency analysis. `analyze_speech_proficiency` combines optional audio-level features (`_analyze_audio_characteristics`) with an LLM-based text analysis (`_analyze_text_proficiency`) and merges them with `_combine_analyses`; these helpers are not shown here. The prompt casts the model as an experienced English examiner and requires a fixed, sectioned reply covering pronunciation, vocabulary, grammar, fluency, coherence, a CEFR level, detailed feedback, and priority focus areas, which gives the response a predictable structure to turn into actionable feedback for learners.

🎯 7. Personalized Fluency Coaching System

from typing import Dict

def provide_coaching(self, user_input: str, practice_type: str, topic: str, user_level: str) -> Dict:
    try:
        coaching_prompt = self._create_coaching_prompt(user_input, practice_type, topic, user_level)
        coaching_response = self._get_ai_response(coaching_prompt)
        parsed_coaching = self._parse_coaching_response(coaching_response)
        return parsed_coaching
    except Exception as e:
        print(f"❌ Coaching error: {e}")
        # Fall back to canned coaching if the LLM call or parsing fails
        return self._generate_fallback_coaching(user_input, practice_type)

def _create_coaching_prompt(self, user_input: str, practice_type: str, topic: str, user_level: str) -> str:
    return f"""
You are an expert English fluency coach with 15+ years of experience helping students improve their speaking skills. You specialize in providing constructive, encouraging, and actionable feedback.

PRACTICE TYPE: {practice_type}
TOPIC: {topic}
USER LEVEL: {user_level}
USER INPUT: "{user_input}"

Provide comprehensive coaching feedback following this EXACT format:

## IMMEDIATE FEEDBACK
[Provide immediate positive reinforcement and acknowledgment of their effort]

## CORRECTIONS
[List specific corrections needed with explanations]
- Original: [what they said]
- Corrected: [how it should be said]
- Explanation: [why this correction is needed]

## PRONUNCIATION NOTES
[Specific pronunciation feedback and tips]

## VOCABULARY ENHANCEMENT
[Suggest better word choices or more advanced vocabulary]
- Instead of: [basic word/phrase]
- Try using: [advanced alternative]
- Example: [sentence using the advanced word]

## GRAMMAR IMPROVEMENTS
[Point out grammar issues and provide corrections]

## FLUENCY TIPS
[Specific tips to improve natural flow and reduce hesitations]

## CULTURAL CONTEXT
[Explain any cultural nuances or more natural expressions]

## PRACTICE SUGGESTION
[Specific practice exercise based on their performance]

## ENCOURAGEMENT
[Motivational message highlighting their progress and strengths]

Provide specific, actionable feedback that helps them improve their English fluency naturally and confidently.
"""

def generate_practice_prompt(self, practice_type: str, topic: str, level: str) -> Dict:
    """Generate practice prompts for different types of exercises."""
    prompts = {
        'conversation': self._get_conversation_prompts(topic, level),
        'pronunciation': self._get_pronunciation_prompts(level),
        'vocabulary': self._get_vocabulary_prompts(topic, level),
        'storytelling': self._get_storytelling_prompts(topic, level),
        'debate': self._get_debate_prompts(topic, level),
        'presentation': self._get_presentation_prompts(topic, level),
        'song_analysis': self._get_song_prompts(level),
    }
    # Unknown practice types fall back to conversation prompts
    return prompts.get(practice_type, prompts['conversation'])

This code implements personalized, AI-powered fluency coaching. `provide_coaching` builds a prompt from the user's input, practice type, topic, and level, sends it to the LLM, and parses the reply into feedback, corrections, pronunciation notes, vocabulary enhancements, grammar improvements, fluency tips, cultural context, a practice suggestion, and encouragement; if any step fails, it returns fallback coaching instead. `generate_practice_prompt` supports seven exercise types (conversation, pronunciation, vocabulary, storytelling, debate, presentation, and song analysis) and defaults to conversation prompts for unknown types, so the coaching adapts to each user's level while staying encouraging and constructive.
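The coaching prompt asks for corrections as repeated `- Original:` / `- Corrected:` / `- Explanation:` lines. `_parse_coaching_response` is not shown in the article, so the following is only a hypothetical sketch of how that block could be turned into structured records, assuming the model sticks to the requested format.

```python
import re
from typing import Dict, List

# Matches lines such as "- Corrected: I went there"
FIELD = re.compile(r"^\s*-\s*(Original|Corrected|Explanation):\s*(.*)$")


def parse_corrections(section: str) -> List[Dict[str, str]]:
    """Group '- Original/Corrected/Explanation:' lines into correction records."""
    corrections: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in section.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # ignore prose and blank lines
        key, value = match.group(1).lower(), match.group(2).strip()
        # Each 'Original' line starts a new record
        if key == "original" and current:
            corrections.append(current)
            current = {}
        current[key] = value
    if current:
        corrections.append(current)
    return corrections
```

Records missing a field (for example, no explanation) are kept as-is, so the UI can decide how to display partial corrections.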

🧪 Usage Guide

Go to homepage → Start Assessment
Choose one of these assessments:
- Quick (2 min)
- Deep Dive (5–7 min)
- Topic-Based
Speak naturally. AI listens, transcribes, and analyzes.
Instantly see:
- Fluency score (1–10)
- CEFR level (A1–C2)
- Feedback
- Personalized tips

📊 Interpreting Results

Score
- 1–3: Basic, needs improvement
- 4–6: Intermediate
- 7–8: Advanced
- 9–10: Pro-level
CEFR
- A1–A2: Basic user
- B1–B2: Independent user
- C1–C2: Proficient user
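The score bands above can be expressed as a small lookup. This is a sketch, not the app's code: `score_band` is a hypothetical name, and treating fractional scores such as 3.5 as belonging to the lower band is an assumption, since the bands are listed for whole numbers only.

```python
def score_band(score: float) -> str:
    """Map a 1-10 fluency score to the interpretation bands listed above."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score < 4:
        return "Basic, needs improvement"  # 1-3
    if score < 7:
        return "Intermediate"              # 4-6
    if score < 9:
        return "Advanced"                  # 7-8
    return "Pro-level"                     # 9-10
```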