Spark-TTS: Text-2-Speech Model Single-Stream Decoupled Tokens [pdf]

mike978

smusamashah

The voices of Chinese origin, when used to generate English samples, do sound like a Chinese person speaking English. It is very interesting.

vessenes

This is really quite good at sounding like Donald, especially for the first half of the audio. I’ll probably play around with this for a bit; it’s not clear to me how much variation you can get in voice in latent space. Anyway, it looks to be a very high-quality (at least) short-form TTS engine with open weights, so thanks, team!
