Show HN: Mikey – No bot meeting notetaker for Windows

72 comments

·February 12, 2025

dmantis

Looks cool. Is it possible to use a local model (like whisper) to avoid leaking conversations to the cloud-based AI?

hotrod46

That’s what’s planned next :)

hotrod46

Hi, I've added this, lmk what you think.

mijoharas

I was looking into something like this for Linux recently. Didn't find anything obviously simple.

(I considered hooking up whisper.cpp and a bit of audio magic to make it at least transcribe, but firstly it seemed like a fair bit of a pain, and secondly I couldn't think of a nice way to do speaker detection.)

utrack

https://github.com/m-bain/whisperX looks promising - I'm hacking away on an always-on transcriber for my notes for later search&recall. It has support for diarization (the speaker detection you're looking for).

I'm currently hacking away on a mix of https://github.com/speaches-ai/speaches + https://github.com/ufal/whisper_streaming though - mostly because my laptop doesn't have a decent GPU, I stream the audio to a home server instead.

But overall it's pretty simple to do after you wrangle the Python dependencies - all you need is a sink for the text files (for example, create a new file for every Teams meeting, but that's another story...)

mijoharas

Any good solutions for capturing the audio streams and piping them where they're needed? I.e., both microphone and speakers. I was wondering if I needed to mess with PulseAudio and/or JACK (I mean PipeWire under the hood, but I think those APIs sit on top and might be clearer).

mijoharas

Never mind, played around a little, and pulseaudio's cli API makes it easy enough to sling some loopback/virtual devices around that you can then read from easily enough.
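For anyone following along, a rough sketch of what that loopback setup might look like, written as a Python helper that only builds the `pactl` commands (it runs nothing unless asked). The sink name `meeting_mix` and the output file name are made up for illustration, and using `@DEFAULT_SOURCE@` / `@DEFAULT_MONITOR@` for the mic and speaker monitor is an assumption about the machine's setup, not something described in the comment above.

```python
import subprocess


def loopback_commands(sink_name="meeting_mix",
                      mic="@DEFAULT_SOURCE@",
                      speakers="@DEFAULT_MONITOR@"):
    """Build pactl commands that mix mic and speaker audio into one virtual sink.

    sink_name is a hypothetical name; mic/speakers default to PulseAudio's
    special names for the default source and the default sink's monitor.
    """
    return [
        # 1. A virtual (null) sink that acts as the mixing point.
        ["pactl", "load-module", "module-null-sink", f"sink_name={sink_name}"],
        # 2. Route the microphone into the virtual sink.
        ["pactl", "load-module", "module-loopback",
         f"source={mic}", f"sink={sink_name}"],
        # 3. Route whatever is playing on the speakers into the virtual sink.
        ["pactl", "load-module", "module-loopback",
         f"source={speakers}", f"sink={sink_name}"],
    ]


def record_command(sink_name="meeting_mix", out_path="meeting.wav"):
    """Build a parecord command that captures the virtual sink's monitor source."""
    return ["parecord", f"--device={sink_name}.monitor",
            "--file-format=wav", out_path]


def set_up(run=False):
    """Print the setup commands, or execute them when run=True."""
    for cmd in loopback_commands():
        if run:
            subprocess.run(cmd, check=True)
        else:
            print(" ".join(cmd))


if __name__ == "__main__":
    set_up(run=False)  # dry run: only prints the commands
    print(" ".join(record_command()))
```

Each `load-module` call prints a module index when actually run, which can later be passed to `pactl unload-module` to tear the setup down again.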

ewuhic

So which are you "hacking away on" in the end?

m348e912

I don't think this tool can do what native AI transcription integrations can do: track who is speaking. Is there any novel way of addressing that gap?

mpdaugherty

We did a lot of work at https://www.quillmeetings.com to build a diarization & speaker recognition pipeline that works locally on mac and windows. Basically, we can create embeddings of parts of the audio, like you might create embeddings for text for a RAG system, and cluster them (simplifying a lot of details from the "last 80%" that has taken a lot of effort to get working...)
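To make the "embed and cluster" idea concrete, here is a toy sketch in Python. It is not Quill's pipeline; it assumes some speaker-embedding model has already turned each audio segment into a vector, and it uses a simple greedy clustering with cosine similarity. The function names and the 0.8 threshold are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_speakers(embeddings, threshold=0.8):
    """Assign a speaker label to each segment embedding, in order.

    A segment joins the most similar existing speaker if the similarity
    reaches the threshold; otherwise it starts a new speaker.
    """
    centroids = []  # running mean embedding per speaker
    counts = []     # number of segments merged into each centroid
    labels = []
    for emb in embeddings:
        scores = [cosine(emb, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            n = counts[best]
            # Update the running mean of the matched speaker.
            centroids[best] = [(c * n + e) / (n + 1)
                               for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
        else:
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        labels.append(best)
    return labels


if __name__ == "__main__":
    # Made-up 2-D "embeddings": two voices pointing in different directions.
    segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
    print(cluster_speakers(segments))
```

Real embeddings have hundreds of dimensions and far noisier similarities, which is presumably where much of the "last 80%" of effort goes.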

The speaker recognition can't be as perfect as listening to each stream separately like Zoom itself can do, but it also learns your contacts over time and can recognize voices for ad-hoc in-person meetings, etc. which I've found really magical since we launched it.

jtswole

Ah yes, a locally-run, mostly-accurate speaker recognition pipeline that isn't open source. Love to see cool features locked away while the rest of us plebs make do with whatever scraps the OSS world has managed to build. But hey, at least it kind of works, so you can enjoy your slightly-wrong diarization in private.

Truly the future of meetings.

prollyjethi

not open source :/

lukeluc

I'm not sure if you have any interest in porting this to Mac, but in case you do, here's some native Swift code that might help. It was built by me and a friend originally for Electron, but the repo should act as a general template. It's completely open source, and if you (or anyone) need any license modifications for any reason, just reach out: https://github.com/O4FDev/electron-system-audio-recorder/blo...

hotrod46

That's cool, I'll look into it.

bbor

What does “no bot” mean? I don’t see any elaboration, tho maybe I’m just blind!

simplemindedbot

There’s not a “bot” that needs to attend the meeting and show up in the list of attendees thus giving away the recording of the call. Otter.ai, for instance, shows up as “Otter” (or another name) on a Zoom call when it is recording and taking notes.

Cheer2171

Oh, so it is for more "seamlessly" helping people commit the crime of wiretapping in two-party consent jurisdictions, like California?

If you don't like people knowing you are recording them, you probably have a consent issue.

stevenAthompson

You could have said this exact same thing without it sounding like a personal attack, but you chose to be unkind instead. I wonder why?

zamadatix

Whether it is actually a crime for a person in a one-party consent jurisdiction to record a call with a person in a two-party consent jurisdiction is not a consistently settled issue. At least in US courts, dunno about elsewhere.

Sometimes the courts have held that "the stricter jurisdiction's law applies", while other times they have held that "the law where the recording was made applies". The federal law is not any clearer, stating one-party consent is the default and states can override, but offering no further guidance. I suspect this will someday be addressed in the Supreme Court.

zephyreon

I always ask if I can transcribe using an AI tool regardless of jurisdiction. Not sure what the other commenter’s intentions were but just throwing my two cents in.

I prefer non-bot transcription tools solely because they’re not a nuisance during the meeting. Bots take up valuable screen real estate and provide no input during the meeting, so I’d rather they be invisible.

adewinter

Should your concern lie with individuals transcribing their own conversations, or with mass surveillance and wiretapping actively being executed by a broad range of official and corporate entities without your consent?

maccard

Not affiliated, but I'd guess it doesn't have a "bot" account join the Zoom/Meet call.

hotrod46

The other meeting note takers usually have a bot join the meeting to take notes; that seemed a bit strange to me.

someonehere

I’m using Granola for macOS and it’s limited to that platform. Hoping this is a good Windows alternative.

Wondering if anyone out there has an OSS macOS client similar to this one so I can ditch payware.

alkonaut

Something I find annoying with automatic transcriptions and summaries, like the one built into Teams, is that they lack the context necessary to properly interpret what's being said. For example, if I have a meeting discussing products, abbreviations, or systems with "internal" names, it can't discern them or statistically rejects them, replacing them with its best guess for a dictionary word instead. So say we have a long call involving frequent mentions of a measure called pNet, pronounced in the meeting as "Peenet". Then you end up with a transcription of a bunch of guys having a discussion about penises. Hilarious, the first few times. OK, always hilarious, but not so useful.

Being able to set the system prompt for these transcriptions would be very useful. Like "You are a friendly bot transcribing meetings at a software company. Some common terms and abbreviations you'll encounter are...".

_joel

My favourite was Kubernetes in our meeting being referred to as Cuban Eighties. ⎈

thih9

Anecdotally, if you have an accent and want to reference Maltese Falcon[1], your voice recognition software may understand it as “Maltese f* off”.

[1]: https://en.m.wikipedia.org/wiki/The_Maltese_Falcon_(1941_fil...

sys_64738

Perhaps these will be flagged for the CIA or DEA to investigate due to illegal importation of Cubans from the enemy!

jvanderbot

This should be trivially solvable with a glossary as context, as you suggest. I bet the above repo would love a PR, too!

sesm

But the error happens in 'audio to text' part, so text prompt won't solve it. The way to fix it is probably fine-tuning the underlying audio to text model.

alkonaut

Doing audio-to-text requires having a statistical model for what word or phrase a piece of sound is most likely to be. Without context, you can't do better than ranking the most likely candidates where a common word is more likely than an uncommon one. Having a task-specific dictionary at that point would help.

One could also imagine doing it at the summary step, where the AI could simply be asked to do phonetic analysis: "Here is a transcription of a meeting. Here is a list of terms/names/participants etc. Given the transcription, the meeting context/topics, and assuming the transcriber has made errors, replace similar-sounding words and terms with more likely ones from the context."
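A crude, non-LLM version of that post-correction step can be sketched in Python. Note the swap: this uses spelling similarity from the standard library's `difflib`, not real phonetic analysis, and it only looks at single words, so a multi-word mishearing would slip through. The function name and the 0.75 threshold are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def correct_terms(transcript, glossary, threshold=0.75):
    """Replace words that look very similar to a glossary term with that term.

    Similarity is character-based (difflib ratio), used here as a rough
    stand-in for "sounds alike".
    """
    def best_term(word):
        best, best_score = None, threshold
        for term in glossary:
            score = SequenceMatcher(None, word.lower(), term.lower()).ratio()
            if score >= best_score:
                best, best_score = term, score
        return best

    def replace(match):
        word = match.group(0)
        return best_term(word) or word

    # Only alphabetic runs are considered; punctuation and spacing are kept.
    return re.sub(r"[A-Za-z]+", replace, transcript)


if __name__ == "__main__":
    print(correct_terms("We reviewed the peenet numbers today.", ["pNet"]))
```

A low threshold will start rewriting ordinary words, so in practice this would need tuning against real transcripts, or the LLM-based pass described above.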

ukuina

Whisper accepts a system prompt.
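In the open-source `openai-whisper` Python package, this is the `initial_prompt` argument to `transcribe`; it biases decoding toward the vocabulary it contains rather than acting as an instruction. A minimal sketch, assuming that package is installed; the helper names, the default context sentence, and the `base` model choice are illustrative.

```python
def build_initial_prompt(terms, context="Meeting at a software company."):
    """Build a short prompt that mentions the domain terms we expect to hear."""
    return f"{context} Terms that may come up: {', '.join(terms)}."


def transcribe_with_glossary(audio_path, terms, model_name="base"):
    """Transcribe a file with Whisper, priming it with a glossary prompt.

    Requires `pip install openai-whisper`; imported lazily so the prompt
    helper above can be used without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path,
                              initial_prompt=build_initial_prompt(terms))
    return result["text"]


if __name__ == "__main__":
    print(build_initial_prompt(["pNet", "Kubernetes"]))
```

The prompt window is limited, so a short list of the most troublesome terms is likely to work better than a full company glossary.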

collinmcnulty

Gong has such a feature. It’ll even expand out acronyms the first time they show up in the transcript.

sirjaz

Looks awesome, love that it is a local native app

ForHackernews

>transcribing it using the Groq API

It's not really local: it sends all the audio to some cloud AI API.

troyvit

I'm not familiar with Groq, but it looks like:

https://sdk.vercel.ai/providers/ai-sdk-providers/groq

Some open models support it. It seems in theory that you could use your own cloud AI then right?

hotrod46

I've fixed it now; it runs Whisper locally to transcribe.

hotrod46

That's true, the plan is to update it to transcribe locally next.

rs186

Microsoft Teams already provides similar built-in features, along with translation, and I have to say it is one of the rare AI tools from Microsoft that makes sense and actually works -- I had a good experience using it for reviewing meetings in a non-English language. It's not hard to imagine that this will be a standard feature of all mainstream video conferencing software. I wonder what place that leaves for these tools.

darknavi

I've thoroughly enjoyed not having to anoint a "note taker" in my meetings in the last few months.

oersted

There's still a surprising lack of good video call recording services that can be controlled programmatically, unlike the end-to-end SaaS apps like Read.ai or Otter.ai.

The only open-source one I could find is Amurex, which looks promising. But it only supports Google Meet for now, it does it a bit differently with a Chrome extension, and it is generally rather immature, but I do wish them the best.

The only API services available are Recall.ai and MeetingBaaS, they both support the big three (Google Meet, Microsoft Teams and Zoom), but they are rather expensive at $0.5 - $1 per hour. The Calendar Syncing feature is also locked behind enterprise tiers with additional monthly fees in the hundreds, and it is rather important for real-world use.

jtswole

Hey there

The creator of Amurex here. Thank you for the kind words :D More platform support is coming very soon ;) (read next week)

> The only API services available are Recall.ai and MeetingBaaS, they both support the big three (Google Meet, Microsoft Teams and Zoom), but they are rather expensive at $0.5 - $1 per hour.

seems like someone has told you our internal roadmap xD but I am glad to see we are on the right track to solve the problem :D

oersted

You are doing great work, and I do think making it open-source is a smart strategic choice. There's still so much potential for building AI intelligence products on top of video call recordings, and right now you are offering the only practical foundation to build such systems on.

I've been keeping a close eye because $1/h is unsustainable for what we are building, and there's no good reason why it should cost so much. It's manageable for early traction, but soon we'll need to consider either to build all those integrations ourselves or to build on top of Amurex. We might be contributing soon.

I did see in GitHub that Teams support was almost done, exciting! Do you plan to continue with the browser extension model, or are you also looking for solutions to record meetings that happen in the Teams/Zoom native client?

I think this is why most companies do it by creating a bot that joins the meeting, it's also great free advertising for them. Of course it's a bit awkward for the user, but it's becoming a normal thing, and ethically it's better to be explicit about the fact you are recording.

jtswole

> You are doing great work, and I do think making it open-source is a smart strategic choice. There's still so much potential for building AI intelligence products on top of video call recordings, and right now you are offering the only practical foundation to build such systems on.

Thank you :D

> I've been keeping a close eye because $1/h is unsustainable for what we are building, and there's no good reason why it should cost so much. It's manageable for early traction, but soon we'll need to consider either to build all those integrations ourselves or to build on top of Amurex. We might be contributing soon.

Sounds great! We are super happy to support all the integrations. If you can message me on discord, I'd be super keen to hear what you have to say.

> I did see in GitHub that Teams support was almost done, exciting! Do you plan to continue with the browser extension model, or are you also looking for solutions to record meetings that happen in the Teams/Zoom native client?

Coming soon ;)

> I think this is why most companies do it by creating a bot that joins the meeting, it's also great free advertising for them. Of course it's a bit awkward for the user, but it's becoming a normal thing, and ethically it's better to be explicit about the fact you are recording.

The problem with bots is that

- first, they are annoying.
- second, I have a tendency to reject all the bots joining my meeting because they are annoying, which renders the bot products practically useless.

And you raise a good point about ethics, we expect the users to be grown up about their decisions. The users are expected to act according to their state laws.

I grew up reading and being influenced by liberation in FOSS software. I don't really want to impose our own "laws" on a user if their state says otherwise.

Erazal

Creator of MeetingBaaS here (sorry for the double ping). Pricing actually starts lower than $1 / hour, at $0.69 / hour, and that scales down quite fast.

Unfortunately cloud infrastructure has a cost :/

nduncan_hmc

Hey there, I'm building an open source Recall at https://github.com/noah-duncan/attendee, designed for convenient self-hosting. It's fairly immature but other engineers are starting to contribute and things are picking up. Pretty sure it's the only open source example of a google meet bot that can extract audio, video, transcript and speak in the meeting.

Erazal

Hey :)

Creator of MeetingBaaS here

We're actually thinking of open-sourcing our bots too!

ttul

Has anyone done this on the Mac? I hate sending audio to Otter; it creeps me out.

someonehere

Granola. Best meeting app I’ve used. I have a notepad that takes markup I can add myself and it intelligently fills in the notes I wrote.

E.g., I put bullet points with something like “updates from Steve?” and do that for everyone during our check-in. When the meeting ends, it takes all their conversation in the transcript and fills in my markup with the notes.

I’ve attended meetings where I had zero participation and focused on doing something else during the meeting. When it’s over it gives me a detailed summary of the meeting. It felt like I had an assistant taking detailed, ordered notes for me. It’s almost like that scene from the movie Back to School: Rodney Dangerfield sends his secretary to take down the lecture so he doesn’t have to attend, and she gets called out by the professor. Felt just like that kind of transcribing.

simplemindedbot

Spellar.ai does a great job. There are others out there for Mac, but I like Spellar’s calendar integration.

Interestingly, their initial raison d’être was to help with English pronunciation and speaking speed, giving you real time feedback. They’ve downplayed this in recent releases, but the functionality is still there. Though, I’m a native English speaker and it always flagged me as pronouncing words incorrectly even though I’ve got little regional accent (I’ve been told this by others, not just my opinion. I had a speech therapist as a mother, hence little accent)

mpdaugherty

We do this at quillmeetings.com - the audio stays on your device and is transcribed by Whisper. We also do speaker splitting and recognition with a combination of models. If you share or sync notes/meetings, they are e2e encrypted.

FYI, the transcript-only product is free forever (it's local, so why not?), but generating AI notes, interpreting screenshots if you enable that, etc. are in the Pro plan and do require using a cloud API.

doug_life

https://speechpulse.com does fully local audio transcription. The UI and settings are not the most intuitive, but it works fairly well and they are making constant updates.

simplemindedbot

As an additional note, Spellar does let you bring your own OpenAI key but does not allow for purely local processing. You’ve still got to send the audio out for transcription and interpretation.

Also, I have no affiliation with Spellar, just a user.
