Kokoro TTS – A lightweight (82M params) text-to-speech model

noisy_boy

From the faq:

> Can I use Kokoro TTS offline?

> Kokoro TTS is a cloud-based service that requires an internet connection to access our advanced text to speech technology. This ensures you always have access to the latest improvements and don't need to worry about local hardware requirements or model installations.

I would happily take on the worrying for offline instead of them having to worry about my worries.

mvdtnz

What's the point of promoting a model as "lightweight" or even mentioning the parameter count if I can't run it locally? I don't give a toss how much pressure your remote hardware is under, and promoting a cloud service as small and lightweight only makes me think it's going to be cheap and crappy.

ipsum2

This looks like a fake website. The creator of the website is claiming credit for the model, which does not appear to be created by him. The original model can be found here, along with the source code: https://huggingface.co/hexgrad/Kokoro-82M

Every popular machine learning paper has a fake website associated with it, for some reason. Can anyone figure out why? Another example, someone created this website https://imagen3.org, which is NOT Imagen3 by Google. However, it currently ranks #2 for the model name.

padolsey

This seems to be a general pattern emerging. Cynical opportunists are wrapping hf endpoints/embeds in dodgy SaaS offerings. A similar one is BetterDictation, which tbf I do use. But I still hate that people are profiting off open-spirited ML engineers and HF's goodwill.

Notice in this case that each testimonial avatar links to an image asset whose name differs from the purported person's name. Notice additionally the user in the thread who's pushing this 'product'; their post history makes it obvious they're an LLM slopBot...

dudus

You can buy SaaS kits that include a frontend with pricing pages, backend and all code necessary to wrap any API and resell at a profit.

atoav

Why? Some people are so convinced they won't make it if they follow the rules and ethical principles that they try to do without them.

qwertox

> You can find a hosted demo at hf.co/spaces/hexgrad/Kokoro-TTS.

And in the FAQ:

> What's included in the Kokoro TTS free trial?

> New users can try Kokoro TTS's full capabilities with our free trial. This allows you to experience our professional-grade text to speech technology firsthand, including access to all voices and both American and British English options.

So this is the "free trial"? Plus it being a cloud-based service makes me not understand the situation.

makeitdouble

Company is based in Singapore apparently

On the privacy policy part

> We collect certain personal data, including but not limited to your name, email address, and payment information (if applicable) to enhance the Service and improve user experience.

It's the first time I've seen collecting payment info to improve user experience.

nenaoki

https://kokorotts.org/ is the proper site.

nicman23

i just used it with https://github.com/santinic/audiblez/pull/14/files (including the pr because it has gpu accel)

it is very fast and very passable.

jallenjia

I'm excited to share Kokoro TTS, an open-source text-to-speech model we've been working on. Despite its relatively small size (82M parameters), it achieves impressive results in natural speech synthesis, ranking first in the TTS Spaces Arena benchmark.

The model is Apache 2.0 licensed and trained on less than 100 hours of audio data. It supports both American and British English, offering multiple voice options with natural emotional expression and 24kHz audio output.

We've deployed a demo at kokorotts.online where you can try it out. I'd really appreciate any feedback from the HN community on both the model's performance and potential applications.

Tech stack: StyleTTS 2 architecture, ONNX runtime, Next.js for the web interface.

kissgyorgy

It's NOT Open Source.

dontdoxxme

Confusing messaging, a previous version is: https://huggingface.co/hexgrad/Kokoro-82M (matching the demo if you use the "TTS v0.19" tab, it has some artefacts in the voice[1] and definitely doesn't sound as good as the latest version).

"There currently isn't a release date scheduled for the other voices"

[1]: https://huggingface.co/blog/hexgrad/kokoro-short-burst-upgra...

vanous

And it's not offline.

CGamesPlay

In which sense? https://huggingface.co/hexgrad/Kokoro-82M

- Apache 2.0 weights in this repository

- MIT inference code in spaces/hexgrad/Kokoro-TTS adapted from yl4579/StyleTTS2

- GPLv3 dependency in espeak-ng

kissgyorgy

That's not the model repository advertised in the post.

dcreater

The website is not from the authors. Seems fraudulent.

HF: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

matrixhelix

Where is the code?