Show HN: Python Audio Transcription: Convert Speech to Text Locally

whisper-standalone-win

1 7 2,740 5.0 Python

Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

I like this version of Whisper which has diarization built in: https://github.com/Purfview/whisper-standalone-win
senko

2 3 191 8.6 Python

Very fast, accurate speaker diarization

- https://github.com/narcotic-sh/senko
I personally love senko since it can run in seconds, whereas pyannote took hours, but there is a 10% WER (word error rate) that is tough to get around.
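
For readers unfamiliar with the metric mentioned above, here is a minimal sketch of how a word error rate is typically computed: the word-level edit distance between a reference transcript and the system's output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; this is not how senko or the commenter measured the figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog today"
    hyp = "the quick brown fox jumps over the lazy cat today"
    print(f"WER: {word_error_rate(ref, hyp):.0%}")  # one wrong word out of ten
```

Real evaluations usually normalize case and punctuation before comparing, which this sketch leaves out.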
Kit-Whisperx

3 1 21 6.6 Python

This project allows local installation and use of WhisperX WebUI, an advanced audio transcription system based on OpenAI's Whisper but optimized to run on local hardware with or without GPU.

I always thought this was a great implementation if you have a CUDA layer: https://github.com/rgcodeai/Kit-Whisperx
I had an old Acer laptop hanging around, so I implemented this: https://github.com/Sanborn-Young/MP3ToTXT
I forget all the details of my tweaks, but I remember that I had better throughput on my version.
I know the OP talked about wanting it local, but thomasmol/whisper-diarization on Replicate is fast and cheap. Here's a hacked front end to parse the JSON: https://github.com/Sanborn-Young/MP3_2transcript
MP3ToTXT

4 1 0 4.8 Python

A Hacky version of WhisperX running on CPU that is fast and gives 5 minute time stamps
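
As a rough illustration of what "5 minute time stamps" could look like, the sketch below inserts a marker each time the transcript crosses into a new five-minute block. The segment format (a list of `(start_seconds, text)` pairs) and the names `stamp` and `transcript_with_markers` are assumptions made for this example; MP3ToTXT's actual implementation may differ.

```python
def stamp(seconds: float) -> str:
    """Format a time offset in seconds as [HH:MM:SS]."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def transcript_with_markers(segments, interval_s=300):
    """Join segment texts, adding a marker when a new interval starts.

    segments: iterable of (start_seconds, text), assumed sorted by start time.
    """
    lines = []
    last_bucket = None
    for start, text in segments:
        bucket = int(start // interval_s)
        if bucket != last_bucket:
            # First segment of a new five-minute block: emit the block's start time.
            lines.append(stamp(bucket * interval_s))
            last_bucket = bucket
        lines.append(text.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        (0.0, "Hello."),
        (120.5, "Still the first block."),
        (301.2, "Second block."),
        (915.0, "Fourth block."),
    ]
    print(transcript_with_markers(demo))
```

Blocks with no speech get no marker, so the demo prints markers for 00:00:00, 00:05:00, and 00:15:00 only.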

MP3_2transcript

5 1 0 7.4 Python

GUI front-end for thomasmol/whisper-diarization

hns

6 6 61 7.6 Python

hns is a speech-to-text CLI tool to transcribe your voice from your microphone directly to clipboard. Integrate hns with Claude Code, Ollama, LLM, and more CLI tools for powerful workflows.

btw, if you want local dictation (speak and get a transcript) rather than file transcription, I built a Python tool called hns [1]. It's open source, uses faster-whisper, and you can run it with `uvx hns`, or with just `hns` after `uv tool install hns`.
[1]: https://github.com/primaprashant/hns
speechshift

7 1 8 6.6 Python

A fully local, offline first speech-to-text application made for Linux!

For the past two days I've been working on SpeechShift [1]. It's a fully local, offline-first speech-to-text utility that you trigger with a command; it transcribes with Whisper and pastes the result into the window you currently have focused (Chrome, Typora, or any other window). Basically SuperWhisper [2] but for Linux. (If this interests you, check it out! Feel free to ping me if something does not work as expected.)
I've been trying to squeeze more performance out of Whisper, but felt (at least for non-native speakers) that the base model does a good job. For preprocessing I do VAD and some normalization. But on my rusty ThinkPad the processing time is way too long. I'll try some of the aforementioned tips and see if the accuracy and perf can get any better. I'm documenting my learnings over at my notes [3].
[1] https://github.com/BharatKalluri/speechshift
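
To make the preprocessing steps named in the comment concrete, here is a minimal sketch of peak normalization followed by a simple energy-based voice activity detector (VAD). Everything in it is an assumption for illustration: the function names, the 30 ms frame size, and the RMS threshold are hypothetical, and SpeechShift may well use a different (for example, model-based) VAD.

```python
import math


def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the loudest one has absolute value target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def energy_vad(samples, sample_rate=16000, frame_ms=30, threshold=0.02):
    """Return one boolean per full frame: True if the frame's RMS >= threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    flags = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        flags.append(rms >= threshold)
    return flags


if __name__ == "__main__":
    rate = 16000
    silence = [0.0] * 480  # 30 ms of silence
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(480)]
    audio = peak_normalize(silence + tone)
    print(energy_vad(audio, sample_rate=rate))  # one flag per 30 ms frame
```

Dropping the frames flagged as silence before transcription reduces the amount of audio the model has to process, which is one reason VAD is often tried on slow hardware.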
noScribe

8 3 1,667 9.4 Python

Cutting edge AI technology for automated audio transcription. A nice GUI for OpenAI's Whisper and pyannote (speaker identification)

There's a GUI on top of whisper that is very handy for editing, as you can listen to the sentences: https://github.com/kaixxx/noScribe

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Self-hosted offline transcription and diarization service with LLM summary
5 projects | news.ycombinator.com | 26 May 2024

Best Speech-to-text API with speaker diarization?
1 project | news.ycombinator.com | 6 May 2024

Summarization of long transcriptions
1 project | /r/LocalLLaMA | 18 Jul 2023

Our New Sam Audio Model Transforms Audio Editing
3 projects | news.ycombinator.com | 23 Dec 2025

Making AI Models Faster, Cheaper, and Greener — Here’s How
5 projects | dev.to | 3 Nov 2025

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Post date: 22 Sep 2025
