Creates hyper-realistic voice clones from just 3 seconds of audio
28 comments
·January 10, 2025
croemer
The title is editorialized; it should be something like: "Anyvoice - AI Voice Cloning"
croemer
Getting error: Failed to generate voice
HeatrayEnjoyer
I am hitting this error as well. I was additionally unable to create an account. Seems beta?
blacktechnology
fixed
esperent
We've been advertising to get someone to take over the lease on a commercial building. Surprisingly, we've had several of what seem like very obvious scam attempts - people stringing us along, not trying to bargain (we are in a haggling country, people always try to bargain), asking us to wait unreasonable amounts of time, and finally when pressed breaking down into logical inconsistencies. So, not even good scam attempts.
I was wondering, what's the point? I mean, it's a building. You pay money, you sign the lease (in person), you get the use of the building. No money, no building. Where's the scam opportunity?
The only thing I can think of is that they're trying to get enough data and personal info to clone our voices and use that to try and gain access to bank accounts or to scam our relatives. Even if I'm wrong in this case, this seems like a major new vulnerability in society. I mean, if someone who sounded (and with video AI, perhaps even looked) exactly like me called up my mother and pretended I'd been violently robbed or had an accident, she'd transfer money in a heartbeat.
I'm considering that I should set up some kind of code system with my family for this. As in, if I ever end up in a situation where I need help, I'll use a particular code phrase. If the caller doesn't know it, assume it's an AI clone.
0x20cowboy
> I'm considering that I should set up some kind of code system with my family for this. As in, if I ever end up in a situation where I need help, I'll use a particular code phrase.
You absolutely should. And include something for videos (like FaceTime calls). Especially if these members of your family are boomers+
ge96
3 seconds? That's crazy
"Huuhhhhhhhhhhh"
I wonder what their "fox jump" sentence is
mk_stjames
A "Panphonic Poem" is what may do well here. As in...
The pleasure of Shawn’s company
Is what I most enjoy.
He put a tack on Ms. Yancey’s chair
When she called him a horrible boy.
At the end of the month he was flinging two kittens
Across the width of the room.
I count on his schemes to show me a way now
Of getting away from my gloom.
As discussed here: https://literalminded.wordpress.com/2006/05/05/a-panphonic-p...
And recited very famously, in part and slightly modified, here:
sailfast
Default for me was: “What a beautiful day it is today, with bright sunshine and gentle breeze. Let's talk about the future of artificial intelligence.”
That said, I'm not going to be submitting a sample because [reasons]
bugglebeetle
Sure, just let me submit my voice for cloning to a closed-source online service of unknown provenance. What could ever go wrong?
dvh
That's why you submit a politician's voice instead
dunham
It would be fun to have a clone of Majel Barrett's voice for something like Siri or Alexa.
HanClinto
Yeah, but they have you read a specific text, so not as much of an option if you use the primary demo.
Seems like a heck of a nice way to gather a training set! :)
unsnap_biceps
The "upload audio" feature doesn't require any specific text.
lubujackson
Cue reference to "Sneakers"...
superkuh
I submitted an 8 second clip of speech and the resulting synthesized speech did not sound like the same voice. Too bad.
infogulch
I hope you have a nice voice, I'll be listening to it try to sell me an extended car warranty for the next 3 months.
krainboltgreene
Getting a 500 from the HTTP API, and there's also a `debugger` statement in the JavaScript.
gamblor956
This was a great way for them to collect a lot of free voice data to train their model.
inerte
Every time there's a voice recognition post here, someone comments about acquiring data. Why is this method better than having access to all of the video and podcast sites on the internet?
rahimnathwani
You can get people to utter the same sentence.
clueless
anybody try this and have a good result?
What model is this using? I've had good results with e2-ft-tts running locally via Pinokio. You can also run it online for free https://huggingface.co/spaces/mrfakename/E2-F5-TTS