-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
I like this version of Whisper which has diarization built in: https://github.com/Purfview/whisper-standalone-win
-
InfluxDB
InfluxDB – Built for High-Performance Time Series Workloads. InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now.
-
- https://github.com/narcotic-sh/senko
I personally love senko since it can run in seconds, whereas py-annote took hours, but there is a 10% WER (word error rate) that is tough to get around.
-
Kit-Whisperx
This project allows local installation and use of WhisperX WebUI, an advanced audio transcription system based on OpenAI's Whisper but optimized to run on local hardware with or without GPU.
I always thought this was a great implementation if you have a Cuda layer: https://github.com/rgcodeai/Kit-Whisperx
I had an old Acer laptop hanging around, so I implemented this: https://github.com/Sanborn-Young/MP3ToTXT
I forget all the details of my tweaks, but I remember that I had better throughput on my version.
I know the OP talked about wanting it local, but thomasmol/whisper-diarization on replicate is fast and cheap. Here's a hacked front end to parse teh JSON: https://github.com/Sanborn-Young/MP3_2transcript
-
I always thought this was a great implementation if you have a Cuda layer: https://github.com/rgcodeai/Kit-Whisperx
I had an old Acer laptop hanging around, so I implemented this: https://github.com/Sanborn-Young/MP3ToTXT
I forget all the details of my tweaks, but I remember that I had better throughput on my version.
I know the OP talked about wanting it local, but thomasmol/whisper-diarization on replicate is fast and cheap. Here's a hacked front end to parse teh JSON: https://github.com/Sanborn-Young/MP3_2transcript
-
I always thought this was a great implementation if you have a Cuda layer: https://github.com/rgcodeai/Kit-Whisperx
I had an old Acer laptop hanging around, so I implemented this: https://github.com/Sanborn-Young/MP3ToTXT
I forget all the details of my tweaks, but I remember that I had better throughput on my version.
I know the OP talked about wanting it local, but thomasmol/whisper-diarization on replicate is fast and cheap. Here's a hacked front end to parse teh JSON: https://github.com/Sanborn-Young/MP3_2transcript
-
hns
hns is a speech-to-text CLI tool to transcribe your voice from your microphone directly to clipboard. Integrate hns with Claude Code, Ollama, LLM, and more CLI tools for powerful workflows.
btw, if you want local dictation, speak and get a transcript, not transcribe files, I built a Python tool called hns [1]. It's open source, uses faster-whisper, and you can run it with `uvx hns` or just `hns` after `uv tool install hns`.
[1]: https://github.com/primaprashant/hns
-
Since the past two days I've been working on SpeechShift [1], its a fully local, offline first, speech to text utility that allows you to trigger it with a command, transcribes with whisper and puts pastes it in the window you are currently focused on (like chrome, typora or some other window). Basically SuperWhisper [2] but for linux. (If this is something which interests you & check it out! Feel free to ping me if something does not work as expected.)
I've been trying to squeeze out performance out of whisper, but felt (at least for non native speakers) the base model does a good job. In terms of pre processing I do VAD & some normalization. But on my rusty thinkpad the processing time is way too long. I'll try some of the forementioned tips and see if the accuracy & perf can get any better. I'm documenting my learnings over at my notes [3].
[1] https://github.com/BharatKalluri/speechshift
-
Stream
Stream - Scalable APIs for Chat, Feeds, Moderation, & Video. Stream helps developers build engaging apps that scale to millions with performant and flexible Chat, Feeds, Moderation, and Video APIs and SDKs powered by a global edge network and enterprise-grade infrastructure.
-
noScribe
Cutting edge AI technology for automated audio transcription. A nice GUI for OpenAIs Whisper and pyannote (speaker identification)
There's a GUI on top of whisper that is very handy for editing, as you can listen to the sentences: https://github.com/kaixxx/noScribe