Chaplin: Local visual speech recognition (VSR) in real-time
9 comments · February 3, 2025 · emsign
touwer
I know I’ve made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal.
TrueDuality
Very cool! This definitely has the potential to make eavesdropping on strangers significantly more accessible. I'm a tad worried about that kind of proliferation, but it's probably inevitable, and getting something out there at least lets people start discussing it openly with a concrete reference.
I really like interacting with models via voice while typing through other apps, but it's infeasible in loud environments, and feels bad and kind of invasive in places like airports. Looking forward to trying this out more aggressively.
skeledrew
I really wanted to give this a try. A whole lot of GBs for the setup, but I guess that's today's standard when it comes to anything AI-related. And then it also wanted root to run. I draw the line there. Will try again if/when I get around to replacing the package that requires root with something that doesn't, or someone else does.
threecheese
This is great. IMO, given the advancement in voice assistance tech we will desperately want some kind of non-disruptive interface that can be used in public. I thought maybe it would come from subvocalization, but lip reading tech would work in a pinch.
Maybe I am just more full of anxiety than most, but I often wish I could take notes without thumbs, even when I’m at home (but most definitely when in public).
flax
I hope that this is eventually applicable to wearable systems. A small face-observing camera under the brim of a hat or something.
Voice is a terrible interface, but a decent input method for natural language inputs. Removing/reducing the privacy and intrusiveness implications would go a long way towards making it an acceptable thing to use when not alone.
nmstoker
Very impressive, but I don't follow how the author can license it under MIT terms when they've trained it using material licensed only for research purposes.
Wouldn't research purposes be non-commercial and therefore incompatible with MIT terms?
Eisenstein
The script is MIT licensed. The model weights might not be.
reaperman
In terms of copyright law, it's not clear that the model weights are derivative works from the training material. At this time, I think courts would lean towards the weights not being a derivative work. And even if it is a derivative work, it could be considered transformative use and therefore not a violation of copyright. Therefore, IMHO the creator of the model can license the model weights however they want to, regardless of the material that was used to train it.
For now, anyways.
If model weights are ruled derivative works of copyrighted material, it would wipe trillions of dollars off the S&P500 overnight. Currently, copyrighted works are needed to train all the big foundational models - and there's no practical or cost-effective way to get a license for all of it.
I’m sorry, Dave. I’m afraid I can’t do that.