Show HN: Whispering – Open-source, local-first dictation you can trust
79 comments · August 18, 2025
wkcheng
Does this support using the Parakeet model locally? I'm a MacWhisper user and I find that Parakeet is way better and faster than Whisper for on-device transcription. I've been using push-to-transcribe with MacWhisper through Parakeet for a while now and it's quite magical.
braden-w
Not yet, but I want it too! Parakeet looks incredible (saw that leaderboard result). My current roadmap is: finish stabilizing whisper.cpp integration, then add Parakeet support. If anyone has bandwidth to PR the connector, I’d be thrilled to merge it.
Bolwin
Unfortunately, because it's an Nvidia model, Parakeet doesn't work with whisper.cpp as far as I'm aware. You need ONNX.
daemonologist
Parakeet is amazing - 3000x real-time on an A100 and 5x real-time even on a laptop CPU, while being more accurate than whisper-large-v3 (https://huggingface.co/spaces/hf-audio/open_asr_leaderboard). NeMo is a little awkward though; I'm amazed it runs locally on Mac (for MacWhisper).
wkcheng
Yeah, Parakeet runs great locally on my M1 laptop (through MacWhisper). Transcription of recordings feels at least 10x faster than with Whisper, and the accuracy is better as well. Push-to-talk dictation is pretty seamless since the model is so fast. I've observed no downside to Parakeet if you're speaking English.
polo
+1 for MacWhisper. Very full featured, nice that it's a one time purchase, and the developer is constantly improving it.
mark212
Seems like "not yet" is the answer, judging from the other comments.
braden-w
For those checking out the repo this morning, I'm in the middle of a release that adds whisper.cpp support!
https://github.com/epicenter-so/epicenter/pull/655
After this pushes, we'll have far more extensive local transcription support. Just fixing a few more small things :)
chrisweekly
> "I think there should be an open-source, local-first version of every app, and I would like them all to work together. The idea of Epicenter is to store your data in a folder of plaintext and SQLite, and build a suite of interoperable, local-first tools on top of this shared memory. Everything is totally transparent, so you can trust it."
Yes! This. I have almost no experience w/ tts, but if/when I explore the space, I'll start w/ Whispering -- because of Epicenter. Starred the repo, and will give some thought to other apps that might make sense to contribute there. Bravo, thanks for publishing these and sharing, and congrats on getting into YC! :)
sebastiennight
I think we're talking about STT (speech-to-text) here, not TTS.
spullara
If you then also want a cloud version, you can use the AgentDB API: upload the SQLite files there and just change where the SQL runs.
Tmpod
I've been interested in dictation for a while, but I don't want to be sending any audio to a remote API; it all has to be local. Having tried just a couple of models (namely the one used by the FUTO Keyboard), I'm kinda feeling like we're not quite there yet.
My biggest gripe is perhaps not being able to get decent content out of a stream of thought: the models can't properly filter out the pauses and "uuuuhmms", and they handle on-the-fly corrections even less well, like when I go back and repeat something with a slight variation.
This is a challenging problem I'd love to see tackled well by open models I can run on my computer or phone. Are there newer models more capable of this? Or is it not just a model thing, and am I missing a good app too?
In the meantime, I'll keep typing, even though it can be quite a bit less convenient, especially for note-taking on the go.
hephaes7us
Have you tried Whisper itself? It's open-weights.
One of the features of the project posted above is "transformations" that you can run on transcripts. They feed the text into an LLM to clean it up. If you're willing to pay for the tokens, I think you could not only remove filler-words, but could probably even get the semantically-aware editing (corrections) you're talking about.
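For illustration, here is a minimal sketch of what such a transformation could look like: send the raw transcript to a chat model and ask it to strip filler words and apply spoken corrections. The model name, prompt, and function here are placeholders, not Whispering's actual implementation.

```ts
// Sketch only: clean a dictated transcript with an LLM.
// Model name and prompt are placeholders, not Whispering's code.
async function cleanTranscript(raw: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // any capable chat model works
      messages: [
        {
          role: "system",
          content:
            "Clean up this dictated transcript: remove filler words " +
            "(um, uh, like) and, when the speaker restates a phrase with " +
            "a correction, keep only the corrected version.",
        },
        { role: "user", content: raw },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```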
hn1986
Excellent tool and easy to get started.
On Win11, I installed ffmpeg using winget, but the app isn't detecting it: running ffmpeg -version works, yet the app still doesn't pick it up.
One thing: how can we reduce the number of notifications?
I like the system prompt option too.
glial
This is wonderful, thank you for sharing!
Do you have any sense of whether this type of model would work with children's speech? There are plenty of educational applications that would value a privacy-first locally deployed model. But, my understanding is that Whisper performs pretty poorly with younger speakers.
braden-w
Thank you! And you’re right, I think Whisper struggles with younger voices. I haven’t tested Parakeet or other models for this yet, but that’s a great use case (especially since privacy matters in education). I would also shout out Hyprnote (https://hyprnote.com/)! They might be expanding their model options, as they have shown with OWhisper (https://docs.hyprnote.com/owhisper/what-is-this).
marcodiego
> I’m basically obsessed with local-first open-source software.
We all should be.
0xbadcafebee
Not a fan of high resource use or reliance on proprietary vendors/services. DeepSpeech/Vosk were pre-AI and still worked well on local devices, but they were a huge pain to set up and use. Anyone have better versions of those? Looks like one successor was Coqui STT, which then evolved into Coqui TTS which seems still maintained. Kaldi seems older but also still maintained.
edit: nvm, this overview explains the different options: https://www.gladia.io/blog/best-open-source-speech-to-text-m... and https://www.gladia.io/blog/thinking-of-using-open-source-whi...
Aachen
Wait, I'm confused. The text here says all data remains on device and emphasises how much you can trust that, that you're obsessed with local-first software, etc. But clicking on the demo video, step one is... configuring access tokens for external services? Are the services shown at 0:21 (Groq, OpenAI, Anthropic, Google, ElevenLabs) doing the actual transcription, listening to everything I say, and is only the resulting text they give back subject to "it all stays on your device"? Because that's not at all what I expected after reading this description.
braden-w
Great catch Aachen, I should have clarified this better. The app supports both external APIs (Groq, OpenAI, etc.) and, more recently, local transcription (via whisper.cpp, OWhisper, Speaches, etc.), where your audio never leaves your device.
As Leftium said, the local whisper.cpp implementation was just released a few hours ago.
Leftium
The local transcription feature via whisper.cpp was just released 2 hours ago: https://github.com/epicenter-so/epicenter/releases/tag/v7.3....
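For anyone curious what "local via whisper.cpp" means mechanically, one common pattern is to shell out to the whisper.cpp example CLI. A sketch under assumptions (binary and model paths are placeholders; recent builds name the binary whisper-cli, older ones main), not necessarily how Whispering wires it up:

```ts
// Sketch: transcribe a WAV file by invoking the whisper.cpp CLI locally.
// Paths are placeholders; this is not Whispering's actual integration.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function transcribeLocally(wavPath: string): Promise<string> {
  // -m selects the model, -f the input file, -nt suppresses timestamps
  const { stdout } = await run("./whisper-cli", [
    "-m", "models/ggml-base.en.bin",
    "-f", wavPath,
    "-nt",
  ]);
  return stdout.trim(); // the audio never leaves the machine
}
```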
IanCal
> All your data is stored locally on your device, and your audio goes directly from your machine to your chosen cloud provider (Groq, OpenAI, ElevenLabs, etc.) or local provider (Speaches, owhisper, etc.)
Their point is they aren’t a middleman with this, and you can use your preferred supplier or run something locally.
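In code terms, "not a middleman" just means the client calls the chosen provider's transcription endpoint itself. A sketch using OpenAI's audio endpoint (illustrative only; other providers differ in details):

```ts
// Sketch: the client posts audio straight to the chosen provider;
// no intermediary server ever sees it. OpenAI's endpoint shown.
async function transcribeViaProvider(
  audio: Blob,
  apiKey: string,
): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "recording.wav");
  form.append("model", "whisper-1");
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  const data = await res.json();
  return data.text; // the endpoint returns { text: "..." } by default
}
```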
bangaladore
The issue is
> All your data is stored locally on your device,
is fundamentally incompatible with half of the following sentence.
I'd write it as
> All your data is stored locally on your device, unless you explicitly decide to use a cloud provider for dictation.
braden-w
Great correction, wish I could edit the post! Updated the README to reflect this.
hephaes7us
Thanks for sharing! Transcription suddenly became useful to me when LLMs started being able to generate somewhat useful code from natural language. (I don't think anybody wants to dictate code.) Now my workflow is similar to yours.
I have mixed feelings about OS-integration. I'm currently working on a project to use a foot-pedal for push-to-transcribe - it speaks USB-HID so it works anywhere without software, and it doesn't clobber my clipboard. That said, an app like yours really opens up some cool possibilities! For example, in a keyboard-emulation strategy like mine, I can't easily adjust the text prompt/hint for the transcription model.
With an application running on the host though, you can inject relevant context/prompts/hints (either for transcription, or during your post-transformations). These might be provided intentionally by the user, or, if they really trust your app, this context could even be scraped from what's currently on-screen (or which files are currently being worked on).
Another thing I've thought about doing is using a separate keybind (or button/pedal) that appends the transcription directly to a running notes file. I often want to make a note to reference later, but which I don't need immediately. It's a little extra friction to have to actually have my notes file open in a window somewhere.
Will keep an eye on epicenter, appreciate the ethos.
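The append-to-notes idea above is simple to prototype; here is a sketch, assuming a Node environment and a hypothetical second hotkey/pedal handler that routes the transcript to a file instead of the cursor:

```ts
// Sketch of the "append to a running notes file" idea: a handler bound
// to a second hotkey or pedal. The notes path is a hypothetical default.
import { appendFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

function appendToNotes(transcript: string): void {
  const notesPath = join(homedir(), "notes", "inbox.md"); // assumed location
  const entry = `\n- ${new Date().toISOString()} ${transcript}`;
  appendFileSync(notesPath, entry, "utf8");
}
```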
NDxTreme
If you want a rabbit hole to go down, look into Cursorless, Talon Voice, and that whole sphere.
They're actually dictating code, but they do it in a rather smart way.
jnmandal
Looks like a really cool project. Do you have any opinions on which transcription models are the best from a quality perspective? I've heard a lot of mixed opinions on this. Curious what you've found in your development process.
Hey HN! Braden here, creator of Whispering, an open-source speech-to-text app.
I really like dictation. For years, I relied on transcription tools that were almost good, but they were all closed-source. Even a lot of them that claimed to be “local” or “on-device” were still black boxes that left me wondering where my audio really went.
So I built Whispering. It’s open-source, local-first, and most importantly, transparent with your data. All your data is stored locally on your device. For me, the features were good enough that I left my paid tools behind (I used Superwhisper and Wispr Flow before).
Productivity apps should be open-source and transparent with your data, but they also need to match the UX of paid, closed-software alternatives. I hope Whispering is near that point. I use it for several hours a day, from coding to thinking out loud while carrying pizza boxes back from the office.
Here’s an overview: https://www.youtube.com/watch?v=1jYgBMrfVZs, and here’s how I personally am using it with Claude Code these days: https://www.youtube.com/watch?v=tpix588SeiQ.
There are plenty of transcription apps out there, but I hope Whispering adds some extra competition from the OSS ecosystem (one of my other OSS favorites is Handy https://github.com/cjpais/Handy). Whispering has a few tricks up its sleeve, like a voice-activated mode for hands-free operation (no button holding), and customizable AI transformations with any prompt/model.
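To make the voice-activated mode concrete: the simplest form of voice activation is an energy threshold over the microphone signal. A browser-flavored sketch of the concept follows (the threshold and callback are assumptions, not Whispering's implementation):

```ts
// Sketch: naive energy-threshold voice activation with the Web Audio API.
// Real VADs are smarter; this only illustrates the hands-free idea.
async function watchForSpeech(onSpeech: () => void, threshold = 0.02) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  ctx.createMediaStreamSource(stream).connect(analyser);
  const buf = new Float32Array(analyser.fftSize);

  const tick = () => {
    analyser.getFloatTimeDomainData(buf);
    // RMS energy of the current audio frame
    const rms = Math.sqrt(buf.reduce((s, x) => s + x * x, 0) / buf.length);
    if (rms > threshold) onSpeech(); // start or keep recording
    requestAnimationFrame(tick);
  };
  tick();
}
```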
Whispering used to be in my personal GH repo, but I recently moved it as part of a larger project called Epicenter (https://github.com/epicenter-so/epicenter), which I should explain a bit...
I’m basically obsessed with local-first open-source software. I think there should be an open-source, local-first version of every app, and I would like them all to work together. The idea of Epicenter is to store your data in a folder of plaintext and SQLite, and build a suite of interoperable, local-first tools on top of this shared memory. Everything is totally transparent, so you can trust it.
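As a sketch of what that shared memory could look like, any app reads and writes the same local SQLite file; the schema and path below are hypothetical, not Epicenter's actual layout.

```ts
// Sketch: two local-first apps sharing one SQLite file as common memory.
// Schema and path are hypothetical, not Epicenter's actual layout.
import Database from "better-sqlite3";
import { homedir } from "node:os";
import { join } from "node:path";

const db = new Database(join(homedir(), "epicenter", "data.db"));

db.exec(`CREATE TABLE IF NOT EXISTS transcripts (
  id INTEGER PRIMARY KEY,
  created_at TEXT NOT NULL,
  text TEXT NOT NULL
)`);

// One app (say, Whispering) writes a transcript...
db.prepare("INSERT INTO transcripts (created_at, text) VALUES (?, ?)")
  .run(new Date().toISOString(), "note to self: ship the release");

// ...and any other tool built on the same folder can read it back.
const recent = db
  .prepare("SELECT * FROM transcripts ORDER BY id DESC LIMIT 5")
  .all();
console.log(recent);
```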
Whispering is the first app in this effort. It’s not there yet regarding memory, but it’s getting there. I’ll probably write more about the bigger picture soon, but mainly I just want to make software and let it speak for itself (no pun intended in this case!), so this is my Show HN for now.
I just finished college and was about to move back with my parents and work on this instead of getting a job…and then I somehow got into YC. So my current plan is to cover my living expenses and use the YC funding to support maintainers, our dependencies, and people working on their own open-source local-first projects. More on that soon.
Would love your feedback, ideas, and roasts. If you would like to support the project, star it on GitHub here (https://github.com/epicenter-so/epicenter) and join the Discord here (https://go.epicenter.so/discord). Everything’s MIT licensed, so fork it, break it, ship your own version, copy whatever you want!