Kitten TTS: 25MB CPU-Only, Open-Source Voice Model
32 comments
· August 6, 2025
colonCapitalDee
esseph
Everybody always thinks everything is AI. AI learned from consuming writing.
This is an ouroboros that will continue.
(Not saying this is or isn't, simply that these claims are rampant on a huge number of posts and seem to be growing.)
treyd
This is strictly true but not correct. LLMs were trained on human-written text, but they were post-trained to generate text in a particular style. And that style does have some common patterns.
dismalaf
The writing style we associate with AI is the 2010s blogging style that AI learned from... So it definitely could have been written by a person.
blopker
Web version: https://clowerweb.github.io/kitten-tts-web-demo/
It sounds only OK, but it's impressive for the size.
kenarsa
Try https://github.com/Picovoice/orca. It's about 7MB, all included.
satvikpendem
Does an APK for Android exist for replacing its text-to-speech engine? I tried sherpa-onnx, but it seemed too slow for real-time use, especially for audiobooks when sped up.
quantummagic
Doesn't work here. The backend module returns a 404:
https://clowerweb.github.io/node_modules/onnxruntime-web/dis...
Retr0id
Looks like this commit 15 minutes ago broke it https://github.com/clowerweb/kitten-tts-web-demo/commit/6b5c...
nine_k
Does anybody find it funny that sci-fi movies have to heavily distort "robot voices" to make them sound "convincingly robotic"? A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations. I don't expect a smart toaster to talk like a BBC host; it'd be enough if the speech is easy to recognize.
roywiggins
This one is at least an interesting idea: https://genderlessvoice.com/
wkat4242
Hmm, the quality is not so impressive. I'm looking for a really natural-sounding model. I'm not very happy with piper/kokoro, and XTTS was a bit complex to set up.
For STT, whisper is really amazing. But I miss a good TTS, and I don't mind throwing GPU power at it. Anyway, this isn't it either; it sounds worse than kokoro.
andai
Can you run it in reverse for speech recognition?
nine_k
I hope this is the future. Offline, small ML models, running inference on ubiquitous, inexpensive hardware. Models that are easy to integrate into other things, into devices and apps, and even to drive from other models maybe.
RobKohr
What's a good one in reverse; speech to text?
jasonjmcghee
Whisper and the many variants. Here's a good implementation.
wewewedxfgdf
`say` is only 193K on macOS
ls -lah /usr/bin/say
-rwxr-xr-x 1 root wheel 193K 15 Nov 2024 /usr/bin/say
Usage: M1-Mac-mini ~ % say "hello world this is the kitten TTS model speaking"
dented42
That’s not a fair comparison. `say` just calls the speech synthesis APIs that have been around since at least Mac OS 8.
That being said, the ‘classical’ (pre-AI) speech synthesisers are much smaller than kitten, so you’re not wrong per se, just for the wrong reason.
satvikpendem
`say` sounds terrible compared to modern neural network based text to speech engines.
wnoise
And what dynamic libraries is it linked to? And what other data are they pulling in?
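One way to check this on a Mac is to ask `otool -L` which dynamic libraries a binary links against. The Python sketch below is only an illustration: the helper names `parse_otool_libraries` and `linked_libraries` are made up for this example, and it assumes the usual `otool -L` layout (the binary's own path on the first line, then one library per line followed by version information in parentheses). It lists direct links only, not the voice data or frameworks those libraries load at run time.

```python
import subprocess


def parse_otool_libraries(otool_output: str) -> list[str]:
    """Extract library paths from text in the `otool -L` output layout."""
    libraries = []
    # The first line names the inspected binary itself, so skip it.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Each entry looks like:
        #   <path> (compatibility version X, current version Y)
        path = line.split(" (compatibility version", 1)[0]
        libraries.append(path)
    return libraries


def linked_libraries(binary: str) -> list[str]:
    """Run `otool -L` (macOS developer tools) and return the linked paths."""
    result = subprocess.run(
        ["otool", "-L", binary],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_otool_libraries(result.stdout)
```

On a Mac with the developer tools installed, `linked_libraries("/usr/bin/say")` should return the direct dependencies, which is a starting point for judging how much the 193K figure leaves out.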
mlboss
Reddit post with generated audio sample: https://www.reddit.com/r/LocalLLaMA/comments/1mhyzp7/kitten_...
pkaye
Where does the training data for these models come from? Is there an openly available dataset that people use?
toisanji
Wow, amazing and good work, I hope to see more amazing models running on CPUs!
onair4you
Okay, lots of detailed information and example code, great. But skimming through, I didn’t see any audio samples to judge the quality?
TheAceOfHearts
They posted a demo on reddit[0]. It sounds amazing given the tiny size.
[0] https://old.reddit.com/r/LocalLLaMA/comments/1mhyzp7/kitten_...
GaggiX
https://huggingface.co/KittenML/kitten-tts-nano-0.1
https://github.com/KittenML/KittenTTS
These are the model and GitHub pages; this blog post looks very much AI-generated.
Very cool model, but the post is a caricature of AI writing. "Okay, let's get into the nitty-gritty. What makes this little beast tick? These aren't just bullet points on a GitHub README; these are the specs that will fundamentally redefine what you thought was possible with local AI." Sure.