This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
Speak Reflect Improve is your always-available, intelligent English fluency mentor. It's a real-time voice agent that analyzes your spoken English and gives detailed feedback on pronunciation, grammar, fluency, vocabulary, and more.
Whether you're preparing for job interviews, public speaking, or simply building confidence, Speak Reflect Improve helps you self-reflect and speak better every day.
It fits under both of these prompt categories in the challenge:
Business Automation - because it offers scalable, 24×7 personalized fluency coaching, helpful for professional development, onboarding, or upskilling.
Domain Expert - since it understands the nuances of English communication and acts as a master tutor trained in fluency, idioms, and CEFR grading standards.
Why I Built It
I'm an introvert. In real life, I don't talk to people much. But life doesn't pause for shyness. Interviews, group discussions, pitches - they all demand communication excellence.
And I had no one to practice with. No teacher on demand. No fluent partner to correct me at 2 AM.
So I built Speak Reflect Improve - my intelligent, judgment-free, 24×7 speaking companion that listens, scores, corrects, and uplifts. It has become my silent teacher, and I hope it becomes yours too.
Features
- Real-Time Speaking Assessments
  - Record your voice on any topic or select random prompts.
  - Choose an assessment length, from short (2 min) up to long (7 min).
- AI-Powered Fluency Feedback
  - Pronunciation, vocabulary, grammar, fluency, pauses, fillers, coherence, idiom usage - it covers them all.
  - Scores out of 10 with CEFR level mapping (A1 to C2).
- Custom Coaching from a Fluency Expert Persona
  - Personalized suggestions to reach the next level.
  - Actionable tips, goals, and motivational feedback.
- Built for Learning
  - Designed to improve real communication for students, developers, job-seekers, and global speakers.
Demo
- Live Site:
Check out my live site here (please give it a minute to load, or watch the demo in the meantime)
- Video Demo:
Check out this video where I'm showcasing (or audiocasing) my project:
GitHub Repository
Check out my GitHub repository below. You can browse the code, clone it, fork it, or contribute!
Speak Reflect Improve
Speak naturally and let our AI analyze your English proficiency across all areas
Check out my project's snapshots here:

Check it out here (live version): Speak Reflect Improve
Features
Speech Analysis
- Pronunciation Assessment: Detailed analysis of pronunciation clarity, accent, and intonation
- Vocabulary Evaluation: Assessment of vocabulary range, sophistication, and appropriateness
- Grammar Analysis: Grammar accuracy, sentence structure, and complexity evaluation
- Fluency Assessment: Speech flow, hesitation patterns, and filler word detection
- Coherence & Organization: Logical flow and idea connection analysis
- Idioms & Phrases: Natural expression and idiomatic usage evaluation
Proficiency Grading
- CEFR Level Assessment: A1, A2, B1, B2, C1, C2 level determination
- Overall Grade: 1-10 scoring system with detailed explanations
- Actionable Feedback: Specific improvement recommendations
AI-Powered Coaching
- Real-time Feedback: Immediate corrections and suggestions
- Personalized Tips: Customized improvement strategies
- Cultural Context: Natural expressions and…
Tech Stack
Technology | Purpose |
---|---|
Flask | Python backend for routing and logic |
AssemblyAI | Real-time streaming speech-to-text |
Groq API | Fast LLM-based English analysis |
JavaScript | UI interactions and audio logic |
HTML/CSS | Responsive frontend with dark theme |
Web APIs | Microphone access via MediaRecorder |
HTTPS | Required for audio recording in browser |
Technical Implementation & AssemblyAI Integration
The heart of the app is real-time transcription using AssemblyAI's Universal-Streaming API, followed by analysis using Groq's Llama 3 model.
The code snippets below show how AssemblyAI is integrated into the English Fluency Coach project. The functions shown here transcribe a recorded audio file: the SDK path first, then a direct REST fallback.
1. AssemblyAI Initialization & Configuration
```python
# utils/voice_manager.py - AssemblyAI Setup
from typing import Dict

import assemblyai as aai

# ASSEMBLYAI_AVAILABLE is a module-level flag set elsewhere in this file (not shown here).


class VoiceManager:
    def __init__(self, api_keys: Dict[str, str]):
        self.api_keys = api_keys
        self.assemblyai_available = False
        self._init_assemblyai()

    def _init_assemblyai(self):
        if ASSEMBLYAI_AVAILABLE and self.api_keys.get('ASSEMBLYAI_API_KEY'):
            try:
                aai.settings.api_key = self.api_keys['ASSEMBLYAI_API_KEY']
                # Build a config once to confirm the SDK accepts these options
                test_config = aai.TranscriptionConfig(
                    language_detection=True,
                    punctuate=True,
                    format_text=True,
                    speaker_labels=False,
                    auto_highlights=False
                )
                self.assemblyai_available = True
                print("✅ AssemblyAI initialized successfully")
            except Exception as e:
                print(f"❌ AssemblyAI initialization failed: {e}")
                self.assemblyai_available = False
```
This code initializes the AssemblyAI SDK for English fluency assessment. It sets the API key, then builds a `TranscriptionConfig` that enables language detection, punctuation, and text formatting, and turns off speaker labels and auto highlights, since each recording has a single speaker. A `try`/`except` around the setup records whether AssemblyAI is usable in `self.assemblyai_available` and prints the outcome, so later calls know whether the SDK path can be used.
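The snippet checks an `ASSEMBLYAI_AVAILABLE` flag without showing where it is set. One common way to produce such a flag is to test whether the package can be imported at all. The sketch below is an assumption about how that might look, not the project's actual code; `sdk_available` is a hypothetical helper name.

```python
import importlib.util


def sdk_available(module_name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical module-level flag, computed once at import time.
ASSEMBLYAI_AVAILABLE = sdk_available("assemblyai")

if __name__ == "__main__":
    print(f"assemblyai installed: {ASSEMBLYAI_AVAILABLE}")
```

With a flag like this, the app can start without the SDK installed and rely on the direct API path instead.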
2. Core Audio Transcription with Dual-Mode Approach
```python
# utils/voice_manager.py - Main Transcription Function
import os

import assemblyai as aai


def transcribe_audio(self, audio_file_path: str) -> str:
    if not os.path.exists(audio_file_path):
        return "❌ Audio file not found"

    # First choice: the AssemblyAI SDK
    if self.assemblyai_available:
        try:
            print("Trying AssemblyAI SDK...")
            config = aai.TranscriptionConfig(
                language_detection=True,
                punctuate=True,
                format_text=True,
                speaker_labels=False,
                auto_highlights=False
            )
            transcriber = aai.Transcriber(config=config)
            transcript = transcriber.transcribe(audio_file_path)

            if transcript.status == "completed":
                print("✅ AssemblyAI SDK transcription successful")
                return self._clean_transcription(transcript.text)
            elif transcript.status == "error":
                print(f"❌ AssemblyAI SDK error: {transcript.error}")
                return f"❌ Transcription error: {transcript.error}"
        except Exception as e:
            print(f"❌ AssemblyAI SDK error: {e}")

    # Fallback: direct REST calls with the same API key
    if self.api_keys.get('ASSEMBLYAI_API_KEY'):
        try:
            print("Trying AssemblyAI Direct API...")
            result = self._transcribe_with_api(audio_file_path)
            if result and not result.startswith("❌"):
                print("✅ AssemblyAI API transcription successful")
                return self._clean_transcription(result)
        except Exception as e:
            print(f"❌ AssemblyAI API error: {e}")

    return "❌ Transcription failed. Please check API configuration."
```
This function implements a dual-mode transcription approach, using the AssemblyAI SDK first and direct API calls second. The primary path uses the SDK with the same configuration as above: language detection, punctuation, and text formatting. If the SDK is unavailable or raises an exception, the function falls back to direct API calls. If the SDK itself reports a transcription error, that error is returned right away without trying the fallback.
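The try-one-then-the-next idea can be written as a small generic helper. This is a minimal sketch under my own assumptions, not code from the project; `first_successful` and the attempt names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Attempt = Tuple[str, Callable[[], Optional[str]]]


def first_successful(attempts: List[Attempt]) -> Optional[str]:
    """Run each attempt in order and return the first non-empty result."""
    for name, attempt in attempts:
        try:
            result = attempt()
        except Exception as exc:
            # A failing attempt is logged and skipped, not fatal.
            print(f"{name} failed: {exc}")
            continue
        if result:
            return result
    return None


if __name__ == "__main__":
    def broken_sdk() -> Optional[str]:
        raise RuntimeError("SDK not reachable")

    text = first_successful([
        ("sdk", broken_sdk),
        ("direct api", lambda: "hello world"),
    ])
    print(text)  # hello world
```

Keeping the attempts in a list makes the order explicit and lets a third provider be added without another nested `try` block.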
3. Direct API Implementation with Enhanced Features
```python
# utils/voice_manager.py - Direct API Implementation
import time

import requests


def _transcribe_with_api(self, audio_file_path: str) -> str:
    try:
        headers = {'authorization': self.api_keys['ASSEMBLYAI_API_KEY']}

        # Step 1: upload the audio (the upload endpoint takes the raw file bytes as the body)
        print("Uploading audio file...")
        with open(audio_file_path, 'rb') as f:
            response = requests.post(
                'https://api.assemblyai.com/v2/upload',
                headers=headers,
                data=f,
                timeout=60
            )
        if response.status_code != 200:
            return f"❌ Upload failed: {response.status_code} - {response.text}"
        upload_url = response.json()['upload_url']
        print(f"✅ File uploaded: {upload_url}")

        # Step 2: request a transcription of the uploaded file
        print("Requesting transcription...")
        data = {
            'audio_url': upload_url,
            'language_detection': True,
            'punctuate': True,
            'format_text': True,
            'speaker_labels': False,
            'auto_highlights': False
        }
        response = requests.post(
            'https://api.assemblyai.com/v2/transcript',
            headers=headers,
            json=data,
            timeout=30
        )
        if response.status_code != 200:
            return f"❌ Transcription request failed: {response.status_code}"
        transcript_id = response.json()['id']
        print(f"Transcription ID: {transcript_id}")

        # Step 3: poll until the transcript is ready
        print("⏳ Waiting for transcription to complete...")
        max_attempts = 60  # about 2 minutes at 2 seconds per attempt
        attempt = 0
        while attempt < max_attempts:
            response = requests.get(
                f'https://api.assemblyai.com/v2/transcript/{transcript_id}',
                headers=headers,
                timeout=30
            )
            if response.status_code != 200:
                return f"❌ Status check failed: {response.status_code}"

            result = response.json()
            status = result['status']
            if status == 'completed':
                print("✅ Transcription completed")
                return result['text'] or "❌ No text in transcription result"
            elif status == 'error':
                error_msg = result.get('error', 'Unknown error')
                return f"❌ Transcription error: {error_msg}"
            elif status in ['queued', 'processing']:
                print(f"⏳ Status: {status} (attempt {attempt + 1}/{max_attempts})")
                time.sleep(2)  # 2-second polling interval
                attempt += 1
            else:
                return f"❌ Unknown status: {status}"

        return "❌ Transcription timeout - took too long to process"
    except requests.exceptions.Timeout:
        return "❌ Request timeout - please try again"
    except Exception as e:
        return f"❌ Unexpected error: {str(e)}"
```
This function implements direct AssemblyAI API calls as a fallback mechanism. It handles the complete transcription workflow: file upload, a transcription request with the same options as the SDK path, and polling for the result. Each HTTP step checks the status code, and request timeouts and unexpected exceptions are turned into error strings instead of crashing the request. Polling runs every 2 seconds for at most 60 attempts, so a transcript that is not ready after roughly 2 minutes is reported as a timeout.
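The polling loop can be separated from the HTTP details, which also makes it testable without a network. The sketch below is illustrative only; `poll_until_done` and its parameters are hypothetical names, and the `sleep` argument exists so a test can skip the real waiting.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_status: Callable[[], str],
    max_attempts: int = 60,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch_status until it reports a final state or attempts run out.

    Returns "completed" or "error" when reached, or None on timeout.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status in ("completed", "error"):
            return status
        sleep(interval_s)  # wait before asking again
    return None


if __name__ == "__main__":
    fake_statuses = iter(["queued", "processing", "completed"])
    outcome = poll_until_done(lambda: next(fake_statuses), sleep=lambda s: None)
    print(outcome)  # completed
```

With the defaults, the worst case is 60 waits of 2 seconds each, which matches the 2-minute budget described above (plus the time spent in the requests themselves).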
4. Advanced Text Processing & Validation
```python
import os
import re
from typing import Any, Dict


def _clean_transcription(self, text: str) -> str:
    if not text:
        return "❌ Empty transcription result"

    text = text.strip()
    # Collapse runs of whitespace into single spaces
    text = re.sub(r'\s+', ' ', text)
    # Capitalize the first letter after sentence-ending punctuation
    text = re.sub(r'([.!?])\s*([a-z])',
                  lambda m: m.group(1) + ' ' + m.group(2).upper(), text)
    # Capitalize the very first character
    if text and not text[0].isupper():
        text = text[0].upper() + text[1:]
    # Make sure the text ends with punctuation
    if text and text[-1] not in '.!?':
        text += '.'
    return text


def validate_audio_file(self, file_path: str) -> Dict[str, Any]:
    if not os.path.exists(file_path):
        return {
            'valid': False,
            'error': 'File does not exist',
            'file_size': 0
        }

    file_size = os.path.getsize(file_path)
    max_size = 100 * 1024 * 1024  # 100 MB upper limit enforced by the app

    if file_size > max_size:
        return {
            'valid': False,
            'error': f'File too large: {file_size / (1024*1024):.1f}MB (max 100MB)',
            'file_size': file_size
        }

    if file_size < 1000:  # Minimum viable audio size (bytes)
        return {
            'valid': False,
            'error': 'File too small - may be empty or corrupted',
            'file_size': file_size
        }

    return {
        'valid': True,
        'error': None,
        'file_size': file_size,
        'file_size_mb': file_size / (1024 * 1024)
    }
```
This code handles post-transcription text processing and audio file validation. The cleaning function collapses repeated whitespace, capitalizes the first letter of each sentence, and adds a closing period when the text has no end punctuation, so the analysis step receives tidy sentences. The validation function checks that the file exists, that it is no larger than 100 MB, and that it is at least 1,000 bytes, which filters out empty or corrupted recordings before they reach the transcription step.
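To see what the cleaning steps do to real input, here is a standalone copy of the same logic run on two samples. It is a sketch for experimentation, written as a plain function (`clean_transcription`, without `self`), and the sample strings are my own. The second sample shows a side effect worth knowing about: because the pattern allows zero spaces after the punctuation mark, a dot inside a token is also treated as a sentence end.

```python
import re


def clean_transcription(text: str) -> str:
    """Standalone copy of the cleaning steps described above."""
    if not text:
        return "❌ Empty transcription result"
    text = text.strip()
    text = re.sub(r'\s+', ' ', text)
    text = re.sub(r'([.!?])\s*([a-z])',
                  lambda m: m.group(1) + ' ' + m.group(2).upper(), text)
    if text and not text[0].isupper():
        text = text[0].upper() + text[1:]
    if text and text[-1] not in '.!?':
        text += '.'
    return text


if __name__ == "__main__":
    print(clean_transcription("  hello   world. this is a test  "))
    # Hello world. This is a test.
    print(clean_transcription("visit example.com today"))
    # Visit example. Com today.
```

If dotted tokens such as domain names matter for your transcripts, requiring at least one space after the punctuation (`\s+` instead of `\s*`) would be one way to avoid the second result.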
5. Flask Integration & Speech Analysis Endpoint
```python
import os
from datetime import datetime

from flask import jsonify, request, session

# app, voice_manager and english_analyzer are created elsewhere in the application.


@app.route('/analyze_speech', methods=['POST'])
def analyze_speech():
    try:
        audio_file = request.files.get('audio')
        assessment_type = request.form.get('type', 'general')

        if not audio_file:
            return jsonify({'error': 'No audio file provided'}), 400

        session_id = session.get('session_id', datetime.now().strftime("%Y%m%d_%H%M%S"))
        session['session_id'] = session_id

        temp_path = f"temp/audio_{session_id}_{assessment_type}.wav"
        os.makedirs('temp', exist_ok=True)
        audio_file.save(temp_path)

        try:
            print(f"Starting speech analysis of {temp_path}")
            transcription = voice_manager.transcribe_audio(temp_path)
            if transcription.startswith("❌"):
                return jsonify({'error': transcription}), 500

            analysis = english_analyzer.analyze_speech_proficiency(
                transcription=transcription,
                audio_file_path=temp_path,
                assessment_type=assessment_type
            )
        finally:
            # Remove the temp file even when transcription or analysis fails
            if os.path.exists(temp_path):
                os.remove(temp_path)

        session['last_analysis'] = analysis
        session['last_transcription'] = transcription

        print(f"✅ Speech analysis completed: Grade {analysis.get('overall_grade', 'N/A')}")
        return jsonify({
            'success': True,
            'transcription': transcription,
            'analysis': analysis
        })
    except Exception as e:
        print(f"❌ Speech analysis error: {e}")
        return jsonify({'error': f'Speech analysis failed: {str(e)}'}), 500
```
This Flask endpoint orchestrates the complete speech analysis workflow. It accepts the uploaded audio, saves it to a temporary file named after the session, transcribes it through AssemblyAI, runs the English proficiency analysis, and deletes the temporary file. A missing file returns a 400 error and any other failure returns a 500 error as JSON. The latest transcription and analysis are stored in the session, and the response carries both.
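Because the temporary file name above is built from the session id and the `type` form field, two requests in the same session can target the same path, and the field value ends up inside a file name. One alternative is to let the standard library pick a unique name and guarantee cleanup with a context manager. This is a sketch of that idea, not the project's code; `temp_audio_path` is a hypothetical helper.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temp_audio_path(suffix: str = ".wav") -> Iterator[str]:
    """Yield a unique temporary file path and delete the file afterwards."""
    fd, path = tempfile.mkstemp(prefix="audio_", suffix=suffix)
    os.close(fd)  # only the path is needed; the caller writes the content
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    with temp_audio_path() as path:
        with open(path, "wb") as f:
            f.write(b"fake audio bytes")
        print(os.path.exists(path))  # True
    print(os.path.exists(path))      # False
```

Inside the endpoint, `audio_file.save(path)` and the transcription call would sit in the `with` block, so the file is removed on every exit path.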
6. AI-Powered English Proficiency Analysis
```python
from typing import Dict, Optional


def analyze_speech_proficiency(self, transcription: str,
                               audio_file_path: Optional[str] = None,
                               assessment_type: str = 'general') -> Dict:
    try:
        # Audio features are optional; text analysis always runs
        audio_analysis = self._analyze_audio_characteristics(audio_file_path) if audio_file_path else {}
        text_analysis = self._analyze_text_proficiency(transcription, assessment_type)
        combined_analysis = self._combine_analyses(text_analysis, audio_analysis, transcription)
        return combined_analysis
    except Exception as e:
        print(f"❌ Speech proficiency analysis error: {e}")
        raise


def _create_proficiency_prompt(self, transcription: str, assessment_type: str) -> str:
    return f"""
    You are a world-renowned English language proficiency expert and certified TESOL instructor with 20+ years of experience. You are known for providing detailed, professional analysis like a premium English tutor.

    ASSESSMENT TYPE: {assessment_type}
    TRANSCRIPTION TO ANALYZE: "{transcription}"

    Provide a comprehensive professional English assessment following this EXACT format:

    ## PRONUNCIATION ANALYSIS
    [Detailed analysis of pronunciation quality, clarity, accent, stress patterns, intonation, and specific sounds. Be specific about what sounds good and what needs improvement.]

    ## VOCABULARY ASSESSMENT
    [Analyze vocabulary range, sophistication, word choice appropriateness, and lexical diversity.]
    - Advanced words used: [list specific advanced words they used]
    - Vocabulary level: [beginner/intermediate/advanced with explanation]
    - Suggested word upgrades: [specific examples like "good → excellent"]

    ## GRAMMAR EVALUATION
    [Assess grammar accuracy, sentence structure complexity, tense usage, and error patterns.]
    - Grammar strengths: [specific examples of correct usage]
    - Areas for improvement: [specific grammar points to work on]
    - Sentence complexity: [analysis of their sentence structures]

    ## FLUENCY ANALYSIS
    [Evaluate speech flow, hesitations, filler words, pace, natural rhythm, and speaking rate]
    - Filler words detected: [count and list them]
    - Speaking rate assessment: [words per minute if calculable]
    - Flow quality: [detailed assessment]

    ## COHERENCE & ORGANIZATION
    [Assess logical flow, idea connection, topic development, clarity of expression]

    ## CEFR LEVEL ASSESSMENT
    CEFR Level: [A1, A2, B1, B2, C1, or C2]
    Level Description: [Beginner/Elementary/Intermediate/Upper-Intermediate/Advanced/Proficient]

    ## DETAILED PROFESSIONAL FEEDBACK
    [Provide encouraging, detailed feedback like a professional English tutor. Be specific about their current abilities and growth potential.]

    ## PRIORITY FOCUS AREAS
    [List the top 2-3 areas they should focus on immediately for maximum improvement]

    Be detailed, professional, encouraging, and specific. Provide the kind of analysis a student would get from a premium English tutor.
    """
```
This code implements the English proficiency analysis. `analyze_speech_proficiency` runs a text analysis of the transcription, adds audio characteristics when an audio file is available, and merges the two into one result. The prompt asks the model for a fixed set of `##` sections covering pronunciation, vocabulary, grammar, fluency, and coherence, followed by a CEFR level, detailed feedback, and the top 2-3 priority focus areas. Fixing the section names in the prompt makes the response predictable enough to parse.
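Since the prompt asks for fixed `##` headings, the reply can be split back into sections with plain string handling. The project's own parser is not shown in this post, so the following is only a sketch of one way to do it; `split_sections` and `extract_cefr` are hypothetical names and the sample reply is invented.

```python
import re
from typing import Dict, List, Optional


def split_sections(response: str) -> Dict[str, str]:
    """Split a reply with '## HEADING' lines into {heading: body}."""
    collected: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in response.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            current = stripped[3:].strip()
            collected[current] = []
        elif current is not None:
            collected[current].append(stripped)
    return {name: "\n".join(lines).strip() for name, lines in collected.items()}


def extract_cefr(sections: Dict[str, str]) -> Optional[str]:
    """Pull the CEFR code (A1..C2) out of the CEFR section, if present."""
    body = sections.get("CEFR LEVEL ASSESSMENT", "")
    match = re.search(r"CEFR Level:\s*\[?(A1|A2|B1|B2|C1|C2)\b", body)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample_reply = """## FLUENCY ANALYSIS
Steady pace.
- Filler words detected: 2 (um, like)

## CEFR LEVEL ASSESSMENT
CEFR Level: B2
Level Description: Upper-Intermediate
"""
    sections = split_sections(sample_reply)
    print(list(sections))          # ['FLUENCY ANALYSIS', 'CEFR LEVEL ASSESSMENT']
    print(extract_cefr(sections))  # B2
```

Returning `None` when the level is missing lets the caller decide on a fallback instead of guessing a grade.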
7. Personalized Fluency Coaching System
```python
from typing import Dict


def provide_coaching(self, user_input: str, practice_type: str,
                     topic: str, user_level: str) -> Dict:
    try:
        coaching_prompt = self._create_coaching_prompt(user_input, practice_type, topic, user_level)
        coaching_response = self._get_ai_response(coaching_prompt)
        parsed_coaching = self._parse_coaching_response(coaching_response)
        return parsed_coaching
    except Exception as e:
        print(f"❌ Coaching error: {e}")
        return self._generate_fallback_coaching(user_input, practice_type)


def _create_coaching_prompt(self, user_input: str, practice_type: str,
                            topic: str, user_level: str) -> str:
    return f"""
    You are an expert English fluency coach with 15+ years of experience helping students improve their speaking skills. You specialize in providing constructive, encouraging, and actionable feedback.

    PRACTICE TYPE: {practice_type}
    TOPIC: {topic}
    USER LEVEL: {user_level}
    USER INPUT: "{user_input}"

    Provide comprehensive coaching feedback following this EXACT format:

    ## IMMEDIATE FEEDBACK
    [Provide immediate positive reinforcement and acknowledgment of their effort]

    ## CORRECTIONS
    [List specific corrections needed with explanations]
    - Original: [what they said]
    - Corrected: [how it should be said]
    - Explanation: [why this correction is needed]

    ## PRONUNCIATION NOTES
    [Specific pronunciation feedback and tips]

    ## VOCABULARY ENHANCEMENT
    [Suggest better word choices or more advanced vocabulary]
    - Instead of: [basic word/phrase]
    - Try using: [advanced alternative]
    - Example: [sentence using the advanced word]

    ## GRAMMAR IMPROVEMENTS
    [Point out grammar issues and provide corrections]

    ## FLUENCY TIPS
    [Specific tips to improve natural flow and reduce hesitations]

    ## CULTURAL CONTEXT
    [Explain any cultural nuances or more natural expressions]

    ## PRACTICE SUGGESTION
    [Specific practice exercise based on their performance]

    ## ENCOURAGEMENT
    [Motivational message highlighting their progress and strengths]

    Provide specific, actionable feedback that helps them improve their English fluency naturally and confidently.
    """


def generate_practice_prompt(self, practice_type: str, topic: str, level: str) -> Dict:
    """Generate practice prompts for different types of exercises."""
    prompts = {
        'conversation': self._get_conversation_prompts(topic, level),
        'pronunciation': self._get_pronunciation_prompts(level),
        'vocabulary': self._get_vocabulary_prompts(topic, level),
        'storytelling': self._get_storytelling_prompts(topic, level),
        'debate': self._get_debate_prompts(topic, level),
        'presentation': self._get_presentation_prompts(topic, level),
        'song_analysis': self._get_song_prompts(level)
    }
    # Unknown practice types fall back to conversation prompts
    return prompts.get(practice_type, prompts['conversation'])
```
This code implements the personalized coaching flow. `provide_coaching` builds a prompt from the user's input, practice type, topic, and level, sends it to the model, and parses the reply; if anything fails, it returns fallback coaching instead of an error. The prompt asks for corrections, pronunciation notes, vocabulary enhancements, grammar improvements, fluency tips, cultural context, a practice suggestion, and encouragement. `generate_practice_prompt` returns prompts for seven exercise types (conversation, pronunciation, vocabulary, storytelling, debate, presentation, song analysis) and falls back to conversation prompts for an unknown type.
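`generate_practice_prompt` builds all seven prompt sets and then picks one. A variation on the same lookup stores the builder functions themselves and calls only the selected one. The sketch below illustrates that variation with two made-up builders; the function names and prompt texts are placeholders, not the project's content.

```python
from typing import Callable, Dict, List

PromptBuilder = Callable[[str, str], List[str]]


def conversation_prompts(topic: str, level: str) -> List[str]:
    # Placeholder content for illustration only
    return [f"Talk for one minute about {topic} ({level} level)."]


def debate_prompts(topic: str, level: str) -> List[str]:
    # Placeholder content for illustration only
    return [f"Argue for or against: {topic} ({level} level)."]


PROMPT_BUILDERS: Dict[str, PromptBuilder] = {
    "conversation": conversation_prompts,
    "debate": debate_prompts,
}


def generate_practice_prompt(practice_type: str, topic: str, level: str) -> List[str]:
    """Look up the builder first, then call only the one that was selected."""
    builder = PROMPT_BUILDERS.get(practice_type, conversation_prompts)
    return builder(topic, level)


if __name__ == "__main__":
    print(generate_practice_prompt("debate", "remote work", "B1"))
    print(generate_practice_prompt("unknown", "travel", "A2"))  # falls back to conversation
```

The fallback behaviour is the same as in the original; the difference is only that unused prompt sets are never computed.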
Usage Guide
- Go to homepage → Start Assessment
- Choose one of these assessments:
  - Quick (2 min)
  - Deep Dive (5–7 min)
  - Topic-Based
- Speak naturally. AI listens, transcribes, and analyzes.
- Instantly see:
  - Fluency score (1–10)
  - CEFR Level (A1–C2)
  - Feedback
  - Personalized tips
Interpreting Results
- Score
  - 1–3: Basic, needs improvement
  - 4–6: Intermediate
  - 7–8: Advanced
  - 9–10: Pro-level
- CEFR
  - A1–A2: Elementary
  - B1–B2: Competent user
  - C1–C2: Expert mastery
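The two scales above translate directly into a small lookup. This is a sketch of how the bands could be expressed in code, using the labels from the lists above; `score_band` and `cefr_band` are hypothetical helper names, not functions from the project.

```python
def score_band(score: int) -> str:
    """Map a 1-10 fluency score to its band label."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score <= 3:
        return "Basic, needs improvement"
    if score <= 6:
        return "Intermediate"
    if score <= 8:
        return "Advanced"
    return "Pro-level"


CEFR_BANDS = {
    "A1": "Elementary", "A2": "Elementary",
    "B1": "Competent user", "B2": "Competent user",
    "C1": "Expert mastery", "C2": "Expert mastery",
}


def cefr_band(level: str) -> str:
    """Map a CEFR code such as 'b2' or 'B2' to its band label."""
    return CEFR_BANDS[level.strip().upper()]


if __name__ == "__main__":
    print(score_band(7))    # Advanced
    print(cefr_band("b2"))  # Competent user
```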
Final Words
This isn't just a hackathon project.
This is me, learning to speak.
This is you, breaking silence.
This is us, making the world more articulate, one spoken word at a time.
As a shy, loner coder who barely talks in real life, I built this because I had to. Not for winning. But for surviving. For growing.
And if this little voice agent, built with code, hope, and self-healing, helps even one more person speak better, it's already a success.
Vote for Speak Reflect Improve if you believe in the quiet ones too - because sometimes, they build the loudest revolutions.
With love,
Divya