This is part of my journey building the Kai ecosystem—a fully local, offline-first voice assistant that keeps your data yours.
Well, I started building an app for myself first.
I collaborated with Claude to build layered time-parsing logic entirely through natural language, and my goal was simple: a functional app that actually does what it was designed to do.
Kai Lite: 5-Point Summary
- Privacy-first voice assistant - Complete offline functionality, zero cloud data sharing, all data stays on your device
- Natural voice commands - Add reminders, create memos, check calendar using speech-to-text with pattern-based parsing
- Local-first architecture - Flutter mobile app with SQLite storage, works in airplane mode, no internet required
- User data control - Export/delete everything anytime, transparent permissions, visual indicators when mic is active
- Future ecosystem foundation - Designed to sync with Kai Laptop/Desktop while maintaining privacy and user control
This week, I'm sharing what actually happened when I tried to build a voice agent that works completely offline. Turns out, it's harder than expected, even for AI-native builders.
App Demo
My AI Collaborator This Week
Claude: My main implementation partner throughout this build. From initial architecture decisions to debugging regex patterns, Claude helped me think through each technical challenge and iterate quickly on solutions.
What I Actually Built (The Messy Reality)
Attempt 1: "Let's Build Alexa-Level Voice Commands"
The goal was ambitious: voice commands that work as smoothly as Alexa, but completely local.
Started with the standard Flutter voice setup:
```yaml
dependencies:
  speech_to_text: ^6.3.0
  flutter_tts: ^3.8.3
  permission_handler: ^11.0.1
```
Basic voice service structure:
```dart
class VoiceService {
  final SpeechToText _speech = SpeechToText();
  final FlutterTts _tts = FlutterTts();

  Future<void> initialize() async {
    await _speech.initialize();

    // Kai's calm voice settings
    await _tts.setSpeechRate(0.9);
    await _tts.setPitch(1.0);
  }
}
```
The reality check:
Spent a day testing and realized that even with onDevice: true, the accuracy wasn't consistent enough for the "Alexa-level" experience I wanted.
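For context, the test harness was nothing fancy—just the stock listen call with on-device recognition forced on (a rough sketch; the handler here is illustrative):

```dart
// Rough sketch of the Attempt 1 test loop (handler name is illustrative).
// Even with onDevice: true, longer phrases came back inconsistently.
Future<void> listenOnce() async {
  await _speech.listen(
    onDevice: true,       // keep recognition on the phone
    partialResults: true, // watch the transcript evolve while testing
    listenFor: Duration(seconds: 10),
    onResult: (result) {
      debugPrint('heard: "${result.recognizedWords}" '
          '(confidence ${result.confidence.toStringAsFixed(2)})');
    },
  );
}
```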
Result: Needed a completely different approach.
Attempt 2: Comprehensive Pattern-Based Parser (What Actually Works)
Claude suggested focusing on pattern-based parsing instead of trying to build a mini-Alexa.
Smart advice—I used AI to help design the VoiceCommandParser architecture and generate comprehensive regex patterns for different ways people naturally speak.
```dart
class VoiceCommandParser {
  static final Map<String, List<RegExp>> patterns = {
    'calendar_add': [
      RegExp(r'remind me to (.*?) at (.*)'),
      RegExp(r'add (.*?) to calendar at (.*)'),
      RegExp(r'schedule (.*?) for (.*)'),
      RegExp(r'set reminder (.*?) at (.*)'),
      RegExp(r'(.*?) at (.*?) today'),
      RegExp(r'(.*?) at (.*?) tomorrow'),
    ],
    'calendar_check': [
      RegExp(r"what'?s on my calendar\??"),
      RegExp(r"what do i have today\??"),
      RegExp(r"show my schedule"),
      RegExp(r"any events today\??"),
    ],
    'memo_add': [
      RegExp(r'note to self[,:]? (.*)'),
      RegExp(r'remember that (.*)'),
      RegExp(r'make a note[,:]? (.*)'),
      RegExp(r'write down (.*)'),
    ],
  };

  static VoiceCommand parse(String input) {
    input = input.toLowerCase().trim();

    // Check each pattern category
    for (final entry in patterns.entries) {
      final intent = entry.key;
      final patternList = entry.value;

      for (final pattern in patternList) {
        final match = pattern.firstMatch(input);
        if (match != null) {
          return _extractCommand(intent, input, match);
        }
      }
    }

    // Fuzzy matching fallback
    return _fuzzyMatch(input);
  }
}
```
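The `_fuzzyMatch` fallback isn't shown above. Conceptually it just scores keyword overlap per intent and only accepts a guess above a threshold—a simplified sketch, not the exact implementation (the keyword lists are illustrative):

```dart
// Simplified sketch of the fuzzy fallback: score keyword overlap per intent
// and only accept a guess above a threshold (keyword lists are illustrative).
static VoiceCommand _fuzzyMatch(String input) {
  const keywords = {
    'calendar_add': ['remind', 'schedule', 'calendar', 'appointment'],
    'calendar_check': ['today', 'events', 'schedule'],
    'memo_add': ['note', 'remember', 'memo', 'write'],
  };

  var bestIntent = 'unknown';
  var bestScore = 0.0;

  keywords.forEach((intent, words) {
    final hits = words.where(input.contains).length;
    final score = hits / words.length;
    if (score > bestScore) {
      bestScore = score;
      bestIntent = intent;
    }
  });

  // Anything below the cutoff goes to the clarification prompt instead.
  return bestScore >= 0.5
      ? VoiceCommand(intent: bestIntent, confidence: bestScore)
      : VoiceCommand(intent: 'unknown', confidence: bestScore);
}
```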
Added smart time parsing that handles natural language:
```dart
static String? _parseTime(String timeStr) {
  // Natural language conversions
  final conversions = {
    'morning': '9:00 AM',
    'afternoon': '2:00 PM',
    'evening': '6:00 PM',
    'night': '9:00 PM',
    'noon': '12:00 PM',
    'midnight': '12:00 AM',
  };

  // Check natural language first
  for (final entry in conversions.entries) {
    if (timeStr.contains(entry.key)) {
      return entry.value;
    }
  }

  // Parse actual times (3pm, 3:30pm, 15:00)
  final timeMatch =
      RegExp(r'(\d{1,2})(?::(\d{2}))?\s*(am|pm)?', caseSensitive: false)
          .firstMatch(timeStr);
  if (timeMatch != null) {
    var hour = int.parse(timeMatch.group(1) ?? '0');
    final minute = timeMatch.group(2) ?? '00';
    var ampm = timeMatch.group(3)?.toUpperCase();

    // Smart guessing for ambiguous times
    if (ampm == null) {
      if (hour >= 7 && hour <= 11) {
        ampm = 'AM';
      } else if (hour >= 1 && hour <= 6) {
        ampm = 'PM';
      } else if (hour >= 13 && hour <= 23) {
        hour = hour - 12;
        ampm = 'PM';
      }
    }

    return '${hour}:${minute} ${ampm}';
  }

  return null;
}
```
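Dates get a similar treatment. That helper isn't in the snippets above, but the flow below relies on something like this (the name and behavior are illustrative):

```dart
// Illustrative date helper: maps common date words to a concrete DateTime.
// (The real parser also strips the matched word from the task title.)
static DateTime? _parseDate(String input) {
  final now = DateTime.now();
  final today = DateTime(now.year, now.month, now.day);

  if (input.contains('tomorrow')) {
    return today.add(Duration(days: 1));
  }
  if (input.contains('today')) {
    return today;
  }
  // No date word found: the caller defaults to today.
  return null;
}
```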
Multi-turn conversation handler for missing information:
```dart
class ConversationHandler {
  ConversationContext _context = ConversationContext();

  Future<void> handleCommand(String input) async {
    final command = VoiceCommandParser.parse(input);

    if (command.confidence < 0.7) {
      await _voice.speak(
          "I'm not sure. Did you want to add a calendar event or create a memo?");
      return;
    }

    // Handle missing information
    if (command.intent == 'calendar_add') {
      if (command.title == null) {
        _context.state = ConversationState.waitingForTitle;
        await _voice.speak("What would you like me to remind you about?");
        return;
      }
      if (command.time == null) {
        _context.state = ConversationState.waitingForTime;
        await _voice.speak("What time should I set the reminder for?");
        return;
      }
      await _createCalendarEvent(command);
    }
  }
}
```
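What's not shown is the follow-up turn itself: the next utterance gets routed by the stored conversation state instead of going back through the full parser. A sketch of that piece, assuming the context object keeps the partially-built command (field names here are assumptions):

```dart
// Sketch: resolving a pending slot on the next turn. Assumes the context
// object keeps the partially-built command and that its fields are mutable.
Future<void> handleFollowUp(String input) async {
  switch (_context.state) {
    case ConversationState.waitingForTitle:
      _context.pendingCommand?.title = input.trim();
      break;
    case ConversationState.waitingForTime:
      // The real flow re-runs the time parser on this answer.
      _context.pendingCommand?.time = input.trim();
      break;
    default:
      await handleCommand(input); // nothing pending, parse normally
      return;
  }

  final command = _context.pendingCommand;
  if (command != null && command.title != null && command.time != null) {
    await _createCalendarEvent(command);
    _context = ConversationContext(); // reset for the next request
  }
}
```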
Performance after this approach:
- Recognition accuracy: 90% for supported patterns
- Response time: <300ms end-to-end
- Memory usage: 45MB while active
- Battery impact: <2% over full day of testing
Real example that works:
User: "Remind me to call mom tomorrow at three"
↓
STT: "remind me to call mom tomorrow at three"
↓
Pattern match: RegExp(r'remind me to (.*?) at (.*)')
↓
Extract: title="call mom tomorrow", time="three"
↓
Time parsing: "three" → "3:00 PM" (afternoon guess)
↓
Date parsing: "tomorrow" → DateTime.now().add(Duration(days: 1))
↓
Create task in SQLite
↓
TTS: "Perfect! I've added 'call mom' for 3 PM tomorrow"
Attempt 3: The Complete Alexa-Level System
Realized I was thinking about this wrong. Instead of trying to match Alexa, I built something simpler that works reliably.
My actual architecture:
```dart
// 1. Local STT with better settings
await _speech.listen(
  onDevice: true,
  listenFor: Duration(seconds: 3), // Shorter timeout
  cancelOnError: true,
  partialResults: false, // Wait for complete result
);

// 2. Pattern-based parsing with multiple variations
static VoiceCommand parse(String input) {
  input = input.toLowerCase().trim();

  // Check each pattern category
  for (final entry in patterns.entries) {
    final intent = entry.key;
    final patternList = entry.value;

    for (final pattern in patternList) {
      final match = pattern.firstMatch(input);
      if (match != null) {
        return _extractCommand(intent, input, match);
      }
    }
  }

  return VoiceCommand(intent: 'unknown');
}

// 3. Smart time parsing
static String? _parseTime(String timeStr) {
  final conversions = {
    'morning': '9:00 AM',
    'afternoon': '2:00 PM',
    'evening': '6:00 PM',
    'noon': '12:00 PM',
  };

  // Handle natural language first
  for (final entry in conversions.entries) {
    if (timeStr.contains(entry.key)) {
      return entry.value;
    }
  }

  // Then handle actual times like "3pm" or "3:30"
  final timeMatch = RegExp(r'(\d{1,2})(?::(\d{2}))?\s*(am|pm)?')
      .firstMatch(timeStr);
  // ... parsing logic
}
```
Real example of what works:
User says: "Remind me to call mom at three"
↓
Local STT: "remind me to call mom at three"
↓
Pattern match: RegExp(r'remind me to (.*?) at (.*)')
↓
Extract: title="call mom", time="three"
↓
Parse time: "three" → "3:00 PM" (smart guess for afternoon)
↓
Create task in SQLite
↓
Response: "Added 'call mom' for 3:00 PM today"
Performance after optimization:
- Recognition time: 200-400ms
- Memory usage: 40MB while active
- Accuracy: 85% for supported commands
- Battery impact: <2% over full day
The Privacy Architecture I Actually Built
Problem: How do you prove to users that nothing leaves their phone?
My solution - complete transparency:
1. Visual indicators everywhere:
```dart
// Kai bubble pulses when listening
AnimatedContainer(
  duration: Duration(milliseconds: 300),
  decoration: BoxDecoration(
    color: _isListening
        ? Color(0xFF9C7BD9).withOpacity(0.8) // Active purple
        : Color(0xFF9C7BD9).withOpacity(0.2), // Calm purple
    shape: BoxShape.circle,
  ),
)
```
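The pulse is only honest if _isListening tracks the plugin's real state, so it's tied to the speech plugin's status callback rather than toggled manually. A minimal sketch, assuming the bubble lives in a StatefulWidget:

```dart
// Sketch: keep _isListening in sync with the recognizer's actual status so
// the indicator never claims the mic is off while it's still recording.
Future<void> _initSpeech() async {
  await _speech.initialize(
    onStatus: (status) {
      setState(() => _isListening = status == 'listening');
    },
    onError: (error) {
      setState(() => _isListening = false);
    },
  );
}
```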
2. Data export built in from day 1:
```dart
class DataExportService {
  Future<String> exportAllUserData() async {
    final tasks = await CalendarService().getAllTasks();
    final memos = await MemoService().getAllMemos();

    return jsonEncode({
      'export_date': DateTime.now().toIso8601String(),
      'tasks': tasks.map((t) => t.toMap()).toList(),
      'memos': memos.map((m) => m.toMap()).toList(),
    });
  }
}
```
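A small wrapper turns that JSON into an actual file the user can hold onto (a sketch; path_provider is assumed for the documents directory, and the filename is arbitrary):

```dart
import 'dart:io';
import 'package:path_provider/path_provider.dart';

// Sketch: write the export JSON somewhere the user can actually grab it.
Future<File> saveExportToFile() async {
  final json = await DataExportService().exportAllUserData();
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/kai_export.json');
  return file.writeAsString(json);
}
```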
3. One-tap delete everything:
```dart
Future<void> deleteAllUserData() async {
  await CalendarService().clearAllTasks();
  await MemoService().clearAllMemos();
  await SharedPreferences.getInstance().then((prefs) => prefs.clear());
  // Show confirmation: "All data deleted"
}
```
What surprised me: in testing, I cared more (as my own first user) about seeing the "Export my data" and "Delete everything" buttons than about perfect voice accuracy. Just knowing I had control felt satisfying.
Database Design That Actually Works Offline
Used SQLite with sync-ready fields from the start:
```dart
class Task {
  final String id;
  final String title;
  final DateTime? date;
  final String? time;
  final bool isCompleted;

  // Sync-ready fields for future
  final DateTime lastModified;
  final String sourceDevice;
  final String status; // 'active' | 'deleted'

  Task({
    required this.id,
    required this.title,
    this.date,
    this.time,
    this.isCompleted = false,
    required this.lastModified,
    this.sourceDevice = 'kai-lite-android',
    this.status = 'active',
  });
}
```
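The backing table mirrors those fields one-to-one; a sketch of the schema (column names are my mapping of the Dart fields, with dates stored as YYYY-MM-DD strings):

```dart
// Sketch of the SQLite table behind the Task model. Column names are a
// direct mapping of the Dart fields; dates are stored as YYYY-MM-DD strings.
await db.execute('''
  CREATE TABLE IF NOT EXISTS tasks (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    date TEXT,
    time TEXT,
    is_completed INTEGER NOT NULL DEFAULT 0,
    last_modified TEXT NOT NULL,
    source_device TEXT NOT NULL DEFAULT 'kai-lite-android',
    status TEXT NOT NULL DEFAULT 'active'
  )
''');
```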
Why this works:
- Everything works offline immediately
- Sync fields ready for when I build cross-device features
- Soft deletes mean data recovery is possible
- Device tracking for multi-device scenarios
Performance Debugging (The Fun Stuff)
Issue 1: Memory leaks during voice processing
```dart
// Problem: Not disposing speech service
@override
void dispose() {
  _speech.stop();   // Added this
  _speech.cancel(); // And this
  super.dispose();
}
```
Issue 2: Battery drain from overlay
```dart
// Problem: Overlay always active
// Solution: Smart hiding
void _hideOverlayDuringCalls() {
  if (_phoneStateService.isInCall()) {
    _overlay.hide();
  }
}
```
Issue 3: SQLite performance with 1000+ tasks
```dart
// Added indexing for date queries
await db.execute('''
  CREATE INDEX IF NOT EXISTS idx_task_date_status
  ON tasks(date, status)
''');
```
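With that index in place, the day-view query stays cheap even with thousands of rows; a sketch of the hot path it serves (assumes dates stored as YYYY-MM-DD strings and a `Task.fromMap` constructor):

```dart
// Sketch: the query the composite index exists for -- today's active tasks.
final now = DateTime.now();
final todayKey =
    '${now.year}-${now.month.toString().padLeft(2, '0')}-${now.day.toString().padLeft(2, '0')}';

final rows = await db.query(
  'tasks',
  where: 'date = ? AND status = ?',
  whereArgs: [todayKey, 'active'],
  orderBy: 'time ASC',
);
final todaysTasks = rows.map((row) => Task.fromMap(row)).toList();
```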
What I Learned (Technical & Otherwise)
Technical insights:
- SQLite performs way better than expected on mobile
- Local speech processing is viable if you optimize for specific use cases
- Pattern matching beats AI models for simple command parsing
- Flutter overlays are battery killers if not managed properly
UX insights:
- Privacy needs to feel empowering, not defensive
- Visual feedback builds more trust than explanations
- Reliable simple commands can feel smoother overall than unreliable complex ones
Architecture insights:
- Build offline-first from day 1, add sync later
- Start with the simplest solution that could work
- Real user testing catches issues you never thought of
The Current State
What actually ships:
- 15+ voice command patterns that work reliably
- Complete offline functionality (no internet required)
- Export/delete controls for full data ownership
- <300ms voice response time
