Python Speech

Open-source Python projects categorized as Speech

Top 23 Python Speech Projects

  1. TTS

    🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

    Project mention: AI Twin — Voice Cloning with Text-to-Speech | dev.to | 2025-12-16

    Coqui TTS - The amazing text-to-speech library that powers this project

  3. MockingBird

    🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time

  4. datasets

    🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

    Project mention: Training with Big Data on Any Cloud | dev.to | 2025-06-20

    Hugging Face Datasets -- the library that lets you download and manage datasets from the Hugging Face Hub, as well as being a convenient vendor-neutral interface for your own datasets.

  5. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: Making AI Models Faster, Cheaper, and Greener — Here’s How | dev.to | 2025-11-03

    2.3X speed improvement over WhisperX and a 3X speed boost compared to HuggingFace Pipeline with FlashAttention 2 (Insanely Fast Whisper)

  6. AudioGPT

    AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

  7. modelscope

    ModelScope: bring the notion of Model-as-a-Service to life.

  8. EmotiVoice

    EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

  10. silero-vad

    Silero VAD: pre-trained enterprise-grade Voice Activity Detector

    Project mention: 2025 Voice AI Guide: How to Make Your Own Real-Time Voice Agent (Part-1) | dev.to | 2025-09-20

    Silero VAD is the gold standard and pipecat has built-in support, so I have chosen that.

  11. ultravox

    A fast multimodal LLM for real-time voice

    Project mention: I Open-Sourced My AI Toy Company That Runs on ESP32 and OpenAI Realtime API | news.ycombinator.com | 2025-04-22

    This looks like so much fun! I have recently gotten into working with electronics, so it seems like a nice little project to undertake.

    I noticed that it is dependent on OpenAI's realtime API, so it got me wondering what open alternatives there are.

    I could only find ultravox (https://github.com/fixie-ai/ultravox) that would seem to really work in real time. It seems to be a model that wires up llama and whisper somehow, rather than treating them as separate steps, which is common with other projects.

    What other options are available for this kind of real-time behaviour?

  12. speech-to-speech

    Speech To Speech: an effort for an open-sourced and modular GPT4-o

  13. metavoice-src

    Foundational model for human-like, expressive TTS

  14. DeepFilterNet

    Noise suppression using deep filtering

    Project mention: Show HN: Background noise removal in multimedia with a single command | news.ycombinator.com | 2025-10-06
  15. whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  16. VoxCPM

    VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    Project mention: VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and Voice Cloning | news.ycombinator.com | 2025-12-05
  17. lingvo

    Lingvo

  18. aeneas

    aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

  19. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

  20. gTTS

    Python library and CLI tool to interface with Google Translate's text-to-speech API

  21. IMS-Toucan

    Controllable and fast Text-to-Speech for over 7000 languages!

  22. openai-edge-tts

    Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs

    Project mention: Open source TTS by Resemble (claiming they are sota) | news.ycombinator.com | 2025-06-11

    It can definitely run on CPU — but I'm not sure if it can run on a machine without a GPU _entirely_.

    To be honest, it uses a decently large amount of resources. If you had a GPU, you could expect about 4-5 GB memory usage. And given the optimizations for tensors on GPUs, I'm not sure how well things would work "CPU only".

    If you try it, let me know. There are some "CPU" Docker builds in the repo you could look at for guidance.

    If you want free TTS without using local resources, you could try edge-tts https://github.com/travisvn/openai-edge-tts

  23. SALMONN

    SALMONN family: A suite of advanced multi-modal LLMs

  24. voicefixer

    General Speech Restoration

  25. StreamSpeech

    StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Speech related posts

  • AI Twin — Voice Cloning with Text-to-Speech

    2 projects | dev.to | 16 Dec 2025
  • Making AI Models Faster, Cheaper, and Greener — Here’s How

    5 projects | dev.to | 3 Nov 2025
  • 2025 Voice AI Guide: How to Make Your Own Real-Time Voice Agent (Part-1)

    7 projects | dev.to | 20 Sep 2025
  • Ask HN: What Speaker Diarization tools should I look into?

    1 project | news.ycombinator.com | 23 Jul 2025
  • Training with Big Data on Any Cloud

    4 projects | dev.to | 20 Jun 2025
  • Show HN: Mikey – No bot meeting notetaker for Windows

    6 projects | news.ycombinator.com | 12 Feb 2025
  • Ask HN: Is Whisper Still Relevant?

    2 projects | news.ycombinator.com | 12 Feb 2025

Index

What are some of the best open-source Speech projects in Python? This list will help you:

# Project Stars
1 TTS 43,441
2 MockingBird 36,745
3 datasets 21,001
4 whisperX 19,239
5 AudioGPT 10,200
6 modelscope 8,579
7 EmotiVoice 8,367
8 silero-vad 7,669
9 ultravox 4,294
10 speech-to-speech 4,251
11 metavoice-src 4,191
12 DeepFilterNet 3,407
13 whisper-asr-webservice 3,070
14 VoxCPM 2,988
15 lingvo 2,856
16 aeneas 2,781
17 whisper-timestamped 2,700
18 gTTS 2,568
19 IMS-Toucan 2,065
20 openai-edge-tts 1,478
21 SALMONN 1,369
22 voicefixer 1,252
23 StreamSpeech 1,213
