ElevenReader by ElevenLabs
75 comments
· February 12, 2025 · csantini
laurentlb
Interesting! This uses the Kokoro-82M model, which has a pretty good quality, but the set of languages is still quite limited.
neom
How does it hold up on long stuff? I use ElevenLabs Studio daily, and once things get chapters long, the voice can really start to go off the rails. I'd say they've solved a lot of this over the past 2-3 months, but it does still happen on long stuff.
csantini
It holds up well, because Audiblez uses sentence splitting (via spaCy models) before audio synthesis.
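Roughly, the idea looks like the sketch below. This is not Audiblez's actual code: the regex splitter is a crude stand-in for a spaCy sentence segmenter, and `fake_tts` is a hypothetical placeholder for the real model call.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    # A real pipeline would use a proper segmenter (e.g. a spaCy model).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize_book(text: str, synthesize: Callable[[str], bytes]) -> bytes:
    # Every sentence is synthesized on its own, so a glitch in one
    # sentence cannot carry over into the next one.
    return b"".join(synthesize(s) for s in split_sentences(text))


def fake_tts(sentence: str) -> bytes:
    # Hypothetical placeholder for a TTS call; returns the text as bytes.
    return sentence.encode("utf-8") + b"\n"
```

Calling `synthesize_book(chapter_text, fake_tts)` with a real synthesis function in place of `fake_tts` would give one audio stream built from short, independent pieces.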
masteruvpuppetz
> the voice can really start to go off the rails.
Do you mean the AI gets tired?
zaptrem
In autoregressive models error accumulates over time. He likely means the voice starts to make odd sounds/gets lower quality. It would be really interesting if OP could share a clip of this phenomenon!
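A toy way to see why length matters, under a deliberately crude assumption that every generation step has the same small, independent chance of going wrong (real models are messier than that, and the numbers below are made up for illustration):

```python
def drift_probability(per_step_error: float, steps: int) -> float:
    # Chance that at least one of `steps` independent steps goes wrong:
    # 1 - (1 - p)^n.
    return 1.0 - (1.0 - per_step_error) ** steps


# Hypothetical figures: a short sentence vs. a whole chapter in one pass.
short_run = drift_probability(0.0001, 50)
long_run = drift_probability(0.0001, 50_000)
```

With these assumed figures `long_run` comes out far larger than `short_run`, which is the intuition behind generating in short, independent chunks.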
simongray
Oh no, it doesn't run on Apple Silicon. That's too bad.
csantini
It works on Apple Silicon, but it doesn't use the GPU, because Kokoro has not been implemented in MLX yet.
_joel
> On my M2 MacBook Pro, on CPU, it takes about 1 hour, at a rate of about 60 characters per second.
Umm, it does.
tonyhart7
Good, now how can I use this on mobile?
rapind
Oh wow. Thanks for posting! Samples sound great (on par with eleven by my untrained ear). Will definitely use this.
ultrasounder
Bravo!
emptysongglass
I would never trust the company that acquired Omnivore only to sunset it with 2 weeks' notice to retrieve data.
Companies won't stop pulling this garbage unless we stop supporting them.
echelon
You can fight back by supporting and advocating for open source foundation text-to-speech models: XTTS, GPT-SoVITS, Tortoise, Zonos, etc.
Open source models drive proprietary foundation models' margin to zero.
The only reason elevenlabs became a unicorn was their margin. If they became a commodity, they'd find themselves in a deep pit.
qnleigh
Sounds good. Do any of these have iOS or Android apps?
james-bcn
OMG I didn't realize that had happened. That sucks. Omnivore was great. But now I'm really glad I didn't make it part of my processes.
darkwater
I know I'm growing old, but this is the kind of tech application that I don't like. Art should be the last thing to be 100% done by a program. Enhancing artists' capabilities? Hell yeah. Completely replacing voice actors? No, thanks.
ramonverse
AI voice is literally the only way I have to "read" an obscure article during 1h non-static commutes.
darkwater
I understand. It can do things that weren't previously possible, but it will also replace things that were previously done by humans, by artists. Overall, in my opinion, it is still a loss.
nathanyukai
"replace things that were done by humans" isn't a loss by itself if it frees up human labour to do other things. If the humans replaced by AI can't find better things to do, and that makes them poorer or antisocial, it's a loss, but not necessarily AI's fault.
woadwarrior01
Hasn't this been around for ~4 months? Interesting to see this here, since their competitor Zyphra just released two Apache 2.0-licensed open-weights TTS models yesterday [1].
Kabukks
Last time I tried ElevenLabs for German text, it got a lot of numbers and dates wrong.
E.g. saying "1963" when the actual year in the text was 1967. Yeah, the voices sound very realistic. But I'm not sure how useful that is if you can't trust the spoken words.
Does anyone know if it got better in the last few weeks?
hiAndrewQuinn
This is excellent. I just tested the Finnish voices on my simple news archive [1], and the pronunciation was quite good and clear.
It's unfortunate that I can't export audio clips locally; otherwise I would immediately look into using this for generating my Finnish flashcard decks from the same material [2]. I've thought about doing the same with the audio and video feeds included with this news broadcast, but getting Whisper to sync up properly with what's written down and cutting up the raw audio in that way still seems like more effort than I'm willing to invest right now.
gwd
> It's unfortunate that I can't export audio clips locally; otherwise I would immediately look into using this for generating my Finnish flashcard decks from the same material [2].
ElevenLabs has an API which seemed quite reasonable when I looked into it. A bit of Python should get you what you want pretty quickly.
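Something along these lines, as a rough sketch from memory rather than a tested integration. The endpoint path, the `xi-api-key` header and the `eleven_multilingual_v2` model id are how I recall the REST API and should be checked against the current docs; `voice_id`, `api_key` and the helper names are placeholders of mine.

```python
import json
import urllib.request

# Assumed base URL; verify against the official API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, api_key, model_id="eleven_multilingual_v2"):
    # Build (but do not send) a POST request for one text-to-speech clip.
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    headers = {
        "xi-api-key": api_key,
        "Content-Type": "application/json",
        "Accept": "audio/mpeg",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_clip(text, voice_id, api_key, path):
    # Send the request and write the returned audio bytes to `path`.
    req = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Looping `save_clip` over the sentences of a news item would give one local audio file per flashcard.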
benrutter
I've been looking for a good and convenient way to read papers that are published in PDF for a while.
Ideally, I'd be able to strip out the text content and send it to my kindle in readable form. Since apparently that's science fiction, this looks like a really good plan B! Will definitely give it a go.
elashri
You can jailbreak your Kindle [1] and install KOReader [2], which will let you do exactly this bit of science fiction.
janpmz
You can try https://www.pdftomp3.com/ as well.
barrell
Been using ElevenLabs for several years now. I was really impressed with their multilingual model a few years ago.
Since then, they’ve released a few cheaper models, but the quality suffers greatly (they still have the old models though so it’s not an issue). They’ve also been releasing a ton of different products around TTS.
I don’t mean this as a criticism — I just am curious why SOTA TTS has not improved from one model by one company several years ago, and why even said company isn’t able to improve on that model.
BoorishBears
The biggest challenge with TTS is high quality voice data. The architectures of closed providers still mostly trace their roots to stuff like Tortoise with a few exceptions.
Which is why it's especially ridiculous ElevenLabs allows professionals to upload their voices, charges users of those voices a minimum of $50 per million characters, likely pays under $1 for the compute... and then passes on a whopping $2 back to the professional.
I think the next disruptive TTS competitor is going to form out of just offering to pay better rates than ElevenLabs to their PVC users.
Finetuning established architectures on cleaner synthetic data is already making open source models increasingly competitive, so getting top PVC samples from the source would likely put you right about where they are today.
limo11
Rev share is up to 20% on default rates (depending on notice period). With custom rates they can make their voice more expensive and earn up to $0.2 for every 1000 characters. So you can do the math.
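For anyone who wants the math spelled out, here is a small sketch using only the figures quoted in this thread. Applying the 20% share to the $50-per-million price mentioned upthread is an assumption made for illustration, not a statement of how payouts are actually calculated.

```python
from fractions import Fraction

CHARS_PER_MILLION = 1_000_000


def per_million(rate_per_1000_chars):
    # Convert a dollar rate per 1,000 characters to a rate per million.
    return rate_per_1000_chars * (CHARS_PER_MILLION // 1000)


# Figures quoted in the thread, in dollars per million characters.
price_per_million = Fraction(50)   # minimum charge cited upthread
claimed_payout = Fraction(2)       # payout cited upthread

# Share of the price that the cited payout represents.
share_claimed = claimed_payout / price_per_million

# Assumption: the "up to 20%" share applied to that same $50 price.
max_default_payout = Fraction(20, 100) * price_per_million

# "Up to $0.2 for every 1000 characters" expressed per million characters.
custom_cap = per_million(Fraction(2, 10))
```

On these numbers the cited $2 payout is 4% of $50, a full 20% share would be $10 per million characters, and the custom-rate cap works out to $200 per million characters.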
BoorishBears
The math is you're paying a pittance considering the insane margins involved and the fact you're using their voices in a flywheel that's actively obsoleting them.
And considering how much your company owes to an open source model, it's impressive how little you've returned to the commons.
-
But no worries, the top comment under this post is an open source model that was finetuned for a couple of thousand dollars by a single dude soliciting the public for random voice samples.
If Google has no moat, you're out to sea.
juliendorra
You should try it with your own voice! (By first creating a custom voice on the web interface. The quick basic clone should be enough).
I found that it’s my preferred way to use their reader, as it makes the reading more neutral and transparent for my brain.
crakhamster01
The generative podcasts feature feels so dystopian. I didn't realize this SNL skit was based off of a real product lol
sky2224
The video shows scenarios of people listening to pdfs of pretty dense material (e.g., computer science, bio mechanics).
Does anyone here actually have positive results doing this? It seems to me listening to anything that's even remotely complex with the intent of learning it just isn't something that's feasible.
qnleigh
It depends a lot on the paper. I've been using a TTS app to read papers for years. Papers that are really equation-dense, convey the key ideas in figures, or get too detailed aren't listenable. But sometimes review articles or papers with one clear message hit that sweet spot and are very listenable. There's one topic where everything I know about it I learned by listening to a review article on a long run. It was actually quite pleasant!
neom
Severe dyslexia here, but ask me about any conversation or audio book or class I've listened to. Gimme anything audio and gimme it at 1.5x plz! I spend so much money gen'ing audio these days but it's soooo nice to be able to learn so quickly now.
woodson
I used to have papers read to me via TTS when I had a long commute. This was before the current crop of neural TTS, mind you, so the quality and naturalness weren’t as good, but it was good enough to tolerate and to get the gist of a paper. It failed terribly on equations, of course, but that’s often not too important on the first reading.
nice__two
That’s my biggest gripe with audiobooks: good for fiction, not so good for learning.
yreg
For me they are actually best for non-fiction, but it has to be books. Papers are too information dense.
I get easily distracted and lose attention while listening to an audiobook. This is usually problematic with fiction, because suddenly I don't know who this new character is or what's happening. And rewinding to the precise position where I stopped paying attention is of course much more difficult than in written text.
I found that non-fiction books work great for me, because even if you ignore a page or two it makes no difference, the author keeps repeating their point and propping it up with many arguments anyway.
leumon
Unfortunately the app is not compatible with Android 15.
You can get pretty close with open source software:
https://claudio.uk/posts/audiblez-v4.html