Skip to content

Conversation

@woziii
Copy link

@woziii woziii commented Nov 6, 2024

  • Replace STT with LightningWhisperMLX Medium for Apple Silicon
  • Switch LLM to Llama-3.2-3B-Instruct-8bit MLX format
  • Update TTS to Melo TTS
  • Keep Silero VAD for voice detection

Performance:

  • Optimize latency (~4s end-to-end)
  • Focus on French-to-English translation
  • Add video call platforms support (Teams, Zoom, FaceTime)
  • Test & validate on M2 chip with 22GB RAM

Changes:

  • Modify system prompt for translation tasks
  • Remove CUDA components
  • Streamline pipeline for Apple Silicon
  • Add real-time processing optimizations

Tested on MacBook Air M2, compatible with major video call platforms except Google Meet.

Based on original speech-to-speech project, inspired by Andrés Marafioti's work.

- Replace STT with LightningWhisperMLX Medium for Apple Silicon - Switch LLM to Llama-3.2-3B-Instruct-8bit MLX format - Update TTS to Melo TTS - Keep Silero VAD for voice detection Performance: - Optimize latency (~4s end-to-end) - Focus on French-to-English translation - Add video call platforms support (Teams, Zoom, FaceTime) - Test & validate on M2 chip with 22GB RAM Changes: - Modify system prompt for translation tasks - Remove CUDA components - Streamline pipeline for Apple Silicon - Add real-time processing optimizations Tested on MacBook Air M2, compatible with major video call platforms except Google Meet. Based on original speech-to-speech project, inspired by Andrés Marafioti's work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants