Show HN: Python Audio Transcription: Convert Speech to Text Locally

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

    I like this version of Whisper, which has diarization built in: https://github.com/Purfview/whisper-standalone-win
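Tools that bundle diarization with Whisper have to reconcile two outputs: transcript segments with timestamps, and speaker turns with timestamps. A common way to do this, sketched here as an assumption rather than whisper-standalone-win's actual code, is to give each transcript segment the speaker whose turn overlaps it the most. The `(start, end, ...)` tuple shapes and the `SPEAKER_00` labels are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical shapes: (start_sec, end_sec, text) and (start_sec, end_sec, speaker_label)
Segment = Tuple[float, float, str]
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[Optional[str], str]]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn_start, turn_end, speaker in turns:
            ov = overlap(seg_start, seg_end, turn_start, turn_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        labeled.append((best_speaker, text))  # None when no turn overlaps at all
    return labeled


if __name__ == "__main__":
    segments = [(0.0, 2.5, "Hi, thanks for joining."), (2.6, 5.0, "Happy to be here.")]
    turns = [(0.0, 2.4, "SPEAKER_00"), (2.4, 5.2, "SPEAKER_01")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Maximum overlap is a simple, robust rule when a segment straddles a speaker change; finer-grained tools apply the same idea per word instead of per segment.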

  3. senko

    Very fast, accurate speaker diarization

    - https://github.com/narcotic-sh/senko

    I personally love senko since it can run in seconds, whereas pyannote took hours, but there is a 10% WER (word error rate) that is tough to get around.
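WER, the figure quoted above, is the word-level edit distance (substitutions + deletions + insertions) between a hypothesis transcript and a reference, divided by the number of reference words. A minimal stdlib sketch; real evaluations usually normalize punctuation and casing more carefully than the simple lowercase-and-split assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as Levenshtein distance over words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row 0: turning an empty reference into hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # hyp[:0] needs i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` is one substitution over six reference words, about 0.167; one error per ten reference words gives the 0.10 (10%) mentioned above.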

  4. Kit-Whisperx

    This project allows local installation and use of WhisperX WebUI, an advanced audio transcription system based on OpenAI's Whisper but optimized to run on local hardware with or without a GPU.

    I always thought this was a great implementation if you have a CUDA-capable GPU: https://github.com/rgcodeai/Kit-Whisperx

    I had an old Acer laptop hanging around, so I implemented this: https://github.com/Sanborn-Young/MP3ToTXT

    I forget all the details of my tweaks, but I remember that I had better throughput on my version.

    I know the OP talked about wanting it local, but thomasmol/whisper-diarization on Replicate is fast and cheap. Here's a hacked front end to parse the JSON: https://github.com/Sanborn-Young/MP3_2transcript
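The CUDA-versus-CPU distinction in the comment above mostly comes down to how the model is loaded. A minimal sketch using faster-whisper (the backend WhisperX builds on) rather than Kit-Whisperx's or MP3ToTXT's own code; the `small` model size, the `float16`/`int8` choices, and the helper names are illustrative assumptions.

```python
from typing import Tuple


def pick_device(cuda_devices: int) -> Tuple[str, str]:
    """Return (device, compute_type) for faster-whisper's WhisperModel.

    float16 is a common choice on NVIDIA GPUs; int8 quantization keeps
    CPU-only runs (e.g. an old laptop) faster and lighter on memory.
    """
    if cuda_devices > 0:
        return "cuda", "float16"
    return "cpu", "int8"


def transcribe(path: str, model_size: str = "small") -> str:
    # Imported lazily so pick_device() works without the packages installed.
    import ctranslate2
    from faster_whisper import WhisperModel

    device, compute_type = pick_device(ctranslate2.get_cuda_device_count())
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # transcribe() returns a lazy generator of segments plus language/duration info
    segments, _info = model.transcribe(path, beam_size=5, vad_filter=True)
    return " ".join(segment.text.strip() for segment in segments)
```

Usage would be something like `print(transcribe("meeting.mp3"))`, where the file name is hypothetical. On CPU, trying a smaller model size is usually the next lever for throughput.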

  5. MP3ToTXT

    A hacky version of WhisperX that runs fast on CPU and adds timestamps at 5-minute intervals
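Reading "5 minute time stamps" as one timestamp heading per five-minute block of audio, one way to produce that from Whisper-style segments is to bucket each segment by its start time. The `(start, text)` tuple format and the heading style are assumptions for illustration, not MP3ToTXT's actual code.

```python
from typing import Dict, List, Tuple

BLOCK_SECONDS = 5 * 60  # one heading per five minutes of audio


def group_by_block(segments: List[Tuple[float, str]], block: int = BLOCK_SECONDS) -> str:
    """Group (start_sec, text) segments under [HH:MM:SS] headings every `block` seconds."""
    blocks: Dict[int, List[str]] = {}
    for start, text in segments:
        blocks.setdefault(int(start // block), []).append(text.strip())
    lines = []
    for index in sorted(blocks):  # blocks with no speech are simply skipped
        offset = index * block
        lines.append(f"[{offset // 3600:02d}:{offset % 3600 // 60:02d}:{offset % 60:02d}]")
        lines.append(" ".join(blocks[index]))
    return "\n".join(lines)
```

Coarse headings like these keep a long transcript readable while still letting you seek to roughly the right spot in the audio.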

  6. MP3_2transcript

    GUI front-end for thomasmol/whisper-diarization
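A front end like this mostly turns diarized JSON into a readable transcript. The sketch below assumes a `segments` list whose items carry `start`, `speaker`, and `text` keys; that schema is an assumption to check against real output from thomasmol/whisper-diarization, and the helper names are hypothetical.

```python
import json
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def json_to_transcript(raw: str) -> str:
    """Turn diarized JSON into a speaker-by-speaker transcript.

    ASSUMED schema:
    {"segments": [{"start": 0.0, "end": 2.1, "speaker": "SPEAKER_00", "text": "..."}]}
    Consecutive segments from the same speaker are merged into one turn.
    """
    data = json.loads(raw)
    turns: List[Dict] = []
    for seg in data.get("segments", []):
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg.get("text", "").strip()
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] += " " + text  # same speaker keeps talking
        else:
            turns.append({"speaker": speaker, "start": seg.get("start", 0.0), "text": text})
    return "\n".join(
        f"[{format_timestamp(t['start'])}] {t['speaker']}: {t['text']}" for t in turns
    )
```

Merging consecutive segments is a presentation choice: it trades fine-grained timestamps for paragraphs that read like a conversation.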

  7. hns

    hns is a speech-to-text CLI tool that transcribes your voice from your microphone directly to the clipboard. Integrate hns with Claude Code, Ollama, LLM, and more CLI tools for powerful workflows.

    BTW, if you want local dictation (speak and get a transcript, rather than transcribe files), I built a Python tool called hns [1]. It's open source, uses faster-whisper, and you can run it with `uvx hns`, or just `hns` after `uv tool install hns`.

    [1]: https://github.com/primaprashant/hns

  8. speechshift

    A fully local, offline-first speech-to-text application made for Linux!

    For the past two days I've been working on SpeechShift [1]. It's a fully local, offline-first speech-to-text utility: you trigger it with a command, it transcribes with Whisper, and it pastes the text into whichever window you have focused (Chrome, Typora, or anything else). Basically SuperWhisper [2], but for Linux. (If this interests you, check it out! Feel free to ping me if something does not work as expected.)

    I've been trying to squeeze more performance out of Whisper, but I feel that (at least for non-native speakers) the base model does a good job. For preprocessing I do VAD and some normalization, but on my rusty ThinkPad the processing time is still way too long. I'll try some of the aforementioned tips and see whether accuracy and performance improve. I'm documenting my learnings in my notes [3].

    [1] https://github.com/BharatKalluri/speechshift
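SpeechShift's exact preprocessing isn't described here, so the following is only a generic sketch of the two steps named above: peak normalization, and a simple energy-threshold VAD that keeps frames whose RMS exceeds a fixed threshold. Production tools typically use a trained VAD instead (faster-whisper's `vad_filter` option, for example, uses Silero VAD); the 30 ms frame size and 0.01 threshold are illustrative guesses.

```python
import math
from typing import List, Optional, Tuple


def peak_normalize(samples: List[float], target: float = 0.95) -> List[float]:
    """Scale float samples (nominally -1.0..1.0) so the loudest reaches `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


def energy_vad(samples: List[float], sample_rate: int = 16000, frame_ms: int = 30,
               threshold: float = 0.01) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans where frame RMS energy exceeds `threshold`."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    spans: List[Tuple[float, float]] = []
    span_start: Optional[float] = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        t = i / sample_rate
        if rms > threshold and span_start is None:
            span_start = t  # speech begins in this frame
        elif rms <= threshold and span_start is not None:
            spans.append((span_start, t))  # speech ended before this frame
            span_start = None
    if span_start is not None:  # audio ended mid-speech
        spans.append((span_start, len(samples) / sample_rate))
    return spans
```

Feeding Whisper only the detected spans shortens the audio it has to process, which is one lever on a slow laptop; the trade-off is that a fixed threshold can clip quiet speech.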

  10. noScribe

    Cutting-edge AI technology for automated audio transcription. A nice GUI for OpenAI's Whisper and pyannote (speaker identification).

    There's a GUI on top of Whisper that is very handy for editing, since you can listen to each sentence: https://github.com/kaixxx/noScribe


