Kitten TTS: 25MB CPU-Only, Open-Source Voice Model

colonCapitalDee

Very cool model, but the post is a caricature of AI writing. "Okay, let's get into the nitty-gritty. What makes this little beast tick? These aren't just bullet points on a GitHub README; these are the specs that will fundamentally redefine what you thought was possible with local AI." Sure.

esseph

Everybody always thinks everything is AI. AI learned from consuming writing.

This is an ouroboros that will continue.

(Not saying this is or isn't, simply that these claims are rampant on a huge number of posts and seem to be growing.)

treyd

This is strictly true but not correct. LLMs were trained on human-written text, but they were post-trained to generate text in a particular style. And that style does have some common patterns.

dismalaf

The writing style we associate with AI is the 2010s blogging style that AI learned from... So it definitely could have been written by a person.

blopker

Web version: https://clowerweb.github.io/kitten-tts-web-demo/

It sounds OK, which is impressive for the size.

kenarsa

Try https://github.com/Picovoice/orca, which is about 7MB all included.

satvikpendem

Does an APK for Android exist for replacing its text-to-speech engine? I tried sherpa-onnx, but it seemed too slow for real-time usage, especially for audiobooks when sped up.

quantummagic

Doesn't work here. The backend module returns 404:

https://clowerweb.github.io/node_modules/onnxruntime-web/dis...

Retr0id

Looks like this commit from 15 minutes ago broke it: https://github.com/clowerweb/kitten-tts-web-demo/commit/6b5c...

nine_k

Does anybody find it funny that sci-fi movies have to heavily distort "robot voices" to make them sound "convincingly robotic"? A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations. I don't expect a smart toaster to talk like a BBC host; it'd be enough if the speech is easy to recognize.

roywiggins

This one is at least an interesting idea: https://genderlessvoice.com/

wkat4242

Hmm, the quality is not so impressive. I'm looking for a really natural-sounding model. I'm not very happy with piper/kokoro, and XTTS was a bit complex to set up.

For STT, Whisper is really amazing, but I miss a good TTS, and I don't mind throwing GPU power at it. Anyway, this isn't it either; it sounds worse than kokoro.

kenarsa

Try https://github.com/Picovoice/orca

andai

Can you run it in reverse for speech recognition?

nine_k

I hope this is the future. Offline, small ML models, running inference on ubiquitous, inexpensive hardware. Models that are easy to integrate into other things, into devices and apps, and even to drive from other models maybe.

RobKohr

What's a good one in reverse: speech to text?

jasonjmcghee

Whisper and the many variants. Here's a good implementation.

https://github.com/ggml-org/whisper.cpp

wewewedxfgdf

`say` is only 193K on macOS:

  ls -lah /usr/bin/say
  -rwxr-xr-x  1 root  wheel   193K 15 Nov  2024 /usr/bin/say

Usage:

  M1-Mac-mini ~ % say "hello world this is the kitten TTS model speaking"

dented42

That’s not a fair comparison. `say` just calls the speech synthesis APIs that have been around since at least Mac OS 8.

That being said, the ‘classical’ (pre-AI) speech synthesisers are much smaller than kitten, so you’re not wrong per se, just for the wrong reason.

satvikpendem

`say` sounds terrible compared to modern neural network based text to speech engines.

wnoise

And what dynamic libraries is it linked to? And what other data are they pulling in?

mlboss

Reddit post with generated audio sample: https://www.reddit.com/r/LocalLLaMA/comments/1mhyzp7/kitten_...

pkaye

Where does the training data for the models come from? Is there an openly available dataset that people use?

toisanji

Wow, amazing and good work, I hope to see more amazing models running on CPUs!

onair4you

Okay, lots of detailed information and example code, great. But skimming through, I didn’t see any audio samples to judge the quality?

TheAceOfHearts

They posted a demo on reddit[0]. It sounds amazing given the tiny size.

[0] https://old.reddit.com/r/LocalLLaMA/comments/1mhyzp7/kitten_...

GaggiX

https://huggingface.co/KittenML/kitten-tts-nano-0.1

https://github.com/KittenML/KittenTTS

These are the model and GitHub pages. The blog post looks very much AI-generated.

HN
