Kokoro TTS – A lightweight (82M params) text-to-speech model
18 comments
January 17, 2025 · noisy_boy
mvdtnz
What's the point of promoting a model as "light weight" or even mentioning the parameter count if I can't run it locally? I don't give a toss how much pressure your remote hardware is under, and promoting a cloud service as small and lightweight only makes me think it's going to be cheap and crappy.
ipsum2
This looks like a fake website. The creator of the website is claiming credit for the model, which does not appear to be created by him. The original model can be found here, along with the source code: https://huggingface.co/hexgrad/Kokoro-82M
Every popular machine learning paper has a fake website associated with it, for some reason. Can anyone figure out why? Another example, someone created this website https://imagen3.org, which is NOT Imagen3 by Google. However, it currently ranks #2 for the model name.
padolsey
This seems to be a general pattern emerging. Cynical opportunists are wrapping hf endpoints/embeds in dodgy SaaS offerings. A similar one is BetterDictation, which tbf I do use. But I still hate that people are profiting off open-spirited ML engineers and HF's goodwill.
Notice in this case that each testimonial avatar links to an image asset with a different name than the purported person's name. Notice additionally the user in the thread who's pushing this 'product'; their post history makes it obvious they're an LLM slopBot...
dudus
You can buy SaaS kits that include a frontend with pricing pages, backend and all code necessary to wrap any API and resell at a profit.
atoav
Why? Some people are so convinced they won't make it if they follow the rules and ethical principles that they try to do without them.
qwertox
> You can find a hosted demo at hf.co/spaces/hexgrad/Kokoro-TTS.
And in the FAQ:
> What's included in the Kokoro TTS free trial?
> New users can try Kokoro TTS's full capabilities with our free trial. This allows you to experience our professional-grade text to speech technology firsthand, including access to all voices and both American and British English options.
So this is the "free trial"? And with it being a cloud-based service, I don't understand the situation.
makeitdouble
The company is based in Singapore, apparently.
On the privacy policy part:
> We collect certain personal data, including but not limited to your name, email address, and payment information (if applicable) to enhance the Service and improve user experience.
It's the first time I've seen collecting payment info to improve user experience.
nenaoki
https://kokorotts.org/ is the proper site.
nicman23
i just used it with https://github.com/santinic/audiblez/pull/14/files (including the pr because it has gpu accel)
it is very fast and very passable.
jallenjia
I'm excited to share Kokoro TTS, an open-source text-to-speech model we've been working on. Despite its relatively small size (82M parameters), it achieves impressive results in natural speech synthesis, ranking first in the TTS Spaces Arena benchmark.
The model is Apache 2.0 licensed and trained on less than 100 hours of audio data. It supports both American and British English, offering multiple voice options with natural emotional expression and 24kHz audio output.
We've deployed a demo at kokorotts.online where you can try it out. I'd really appreciate any feedback from the HN community on both the model's performance and potential applications.
Tech stack: StyleTTS 2 architecture, ONNX runtime, Next.js for the web interface.
kissgyorgy
It's NOT Open Source.
dontdoxxme
Confusing messaging. A previous version is at https://huggingface.co/hexgrad/Kokoro-82M (it matches the demo if you use the "TTS v0.19" tab; it has some artefacts in the voice[1] and definitely doesn't sound as good as the latest version).
"There currently isn't a release date scheduled for the other voices"
[1]: https://huggingface.co/blog/hexgrad/kokoro-short-burst-upgra...
vanous
And it's not offline.
CGamesPlay
In which sense? https://huggingface.co/hexgrad/Kokoro-82M
- Apache 2.0 weights in this repository
- MIT inference code in spaces/hexgrad/Kokoro-TTS adapted from yl4579/StyleTTS2
- GPLv3 dependency in espeak-ng
kissgyorgy
That's not the model repository advertised in the post.
dcreater
The website is not from the authors. Seems fraudulent
matrixhelix
Where is the code?
From the FAQ:
> Can I use Kokoro TTS offline?
> Kokoro TTS is a cloud-based service that requires an internet connection to access our advanced text to speech technology. This ensures you always have access to the latest improvements and don't need to worry about local hardware requirements or model installations.
I would happily take on the worrying for offline instead of them having to worry about my worries.