DiffRhythm: Fast End-to-End Full-Length Song Generation with Latent Diffusion

vessenes

This is primarily architecturally interesting in my opinion. Output songs have unusual noticeable artifacts, and I would guess they become more noticeable the more you listen.

That said, wow. An end to end FAST architecture that can infer a 4.5 minute song in 10 seconds is a compelling thing. I didn’t see if we got open weights, but my guess is that this is not crazy challenging to train, and some v2/v3 versions of this are likely to be good-to-very-good.

jimbokun

Why would anyone work on something like this?

This benefits no one. You are automating an activity humans ENJOY doing. And maybe taking away opportunities for humans to do more of it.

Work on projects that are a net benefit to humanity.

ravi-delia

This is why we never should have invented the phonograph. People who want to listen to music can just buy a record, making it literally impossible for them to perform an activity humans ENJOY doing. Without it everyone would surely be making all their own music, and nothing valuable would be lost

vitiral

There are like 6 core activities that bind humans together: shared creation of food, myth and music; cohabitation, protection, child rearing.

We've done these things ourselves for hundreds of thousands of years. As we are increasingly convinced to buy them for convenience we lose the very things that make us know our connectedness.

So ya, there are real problems caused by the convenience of technology

TylerE

The ability to record has led to the greatest expansion in musical artistry in human history.

You don’t think peasants were listening to Bach, do you? Only the extraordinarily wealthy could afford to have music as anything like an everyday thing.

mrob

Lots of common people did listen to Bach, because he wrote many works for church organ. Church attendance was almost universal, and even small churches had (small) pipe organs.

jononor

And with more affordable and easier-to-learn tools, the creation of music will be similarly made much more accessible? DAWs and virtual instruments running on a regular laptop were one step, generative AI models will be another?

glenneroo

Related: why are programmers racing to make the perfect AI coding tool? It's an activity many programmers enjoy, and more importantly, if the pace continues, they will likely be automating themselves (or at least a large portion of programmers globally) out of a job.

Granted, many people are benefiting from these tools (myself included) but at some point a lot of us are going to have to find a new job (assuming the progression continues unabated), and I'm not sure what new jobs are going to exist when LLM coders replace many or most of us.

Kye

Generate track, extract the pieces I like, build a real track using those pieces with all the other samples I use.

nullpilot

Not everyone enjoys composing music, and for a large group of people paying an artist is not an option. There's a lot to criticize about current AI tech, but saying that this, of all things, has no net benefit seems like the wrong thing to call out, and incredibly short-sighted for HN.

Juliate

You're not composing music with an AI generator either: you're pushing a button with a few, limited instructions, and expect something that rewards your perception of what makes good music for your intention.

If you don't enjoy composing music, just don't do it, and give it to someone who does, and has the experience/knowledge/culture/practice/gut to do it.

logicchains

>You're not composing music with an AI generator either: you're pushing a button with a few, limited instructions, and expect something that rewards your perception of what makes good music for your intention.

What an incredibly elitist, smug attitude. You're basically saying people only have the right to hear the music professionals think they should hear.

6stringmerc

So putting paid humans out of business is your position then? Please explain why you believe that, in the long-sighted view, AI reducing already poverty-level wages to zero is beneficial.

hexomancer

Do you not see how your argument could be applied to steam engines putting human laborers out of work? Or computers putting (human) calculators out of work? Do you think inventing the steam machine or computers was a mistake too?

nullpilot

If you're trying to maximize employment, composers aren't the first, second, or tenth place to go looking. If you're trying to say artists will bleed income, they already have for decades, and will continue to. The ones that make a living out of it mostly get their income from live performances and merch, and maybe adtech on social media platforms.

By the same logic synthesizers shouldn't have been invented that allowed people to make advanced sounds without tediously learning an instrument first, consumers should remain priced out of microphones and editing software, etc.

Like I said, I am not trying to feign ignorance of the drawbacks of the tech, which are very real and far from negligible. I am not a tech bro AI maximalist. I just do believe that hyperbole will not put the djinn back into the bottle, and pretending like there isn't a real market between nothing and paying or being a composer isn't adding anything to the conversation.

risyachka

In this particular case it is totally black and white. Prove me wrong.

Give me one example of how music gen in any way benefits anybody to the level that is worth putting out of business the last few artists that make ends meet?

taylorius

I don't think there's any stopping it, unfortunately. The internet is too good at "optimising" content. The future is Mr Beast, Instagram hotties and 6 pack guys, tiktok morons and onlyfans. Be happy, the market has spoken.

chefandy

People that never considered the value of artistic process until it was the topic du jour unilaterally decided that it was inefficient, oppressive, complex, frivolous, and unfairly inaccessible to those that hadn't put any sustained effort into developing theirs. If you didn't understand what they don't, you'd realize that companies spending billions of dollars to create tools that make cheap simulacra of artists' work to sell them at a loss to crush them in their own markets was merely the natural progression of artistic praxis. Despite it being economically unsustainable and clearly only cheap until it craters the value of artistic skill, these tools have democratized creativity. Instead of them only being available to those with the interest and willingness to practice and develop their artistic sense, process, and skill, they're now broadly available to anyone willing to pay money for a subscription service that will obviously soon be a hell of a lot more expensive, or shell out a few thousand dollars for a top-tier video card that you almost certainly already have in your gaming rig, anyway. This is Silicon Valley progress and if you don't like it, you're a communist.

Juliate

Totally with you. But it's the trend we get to re-balance in a good way:

> People that never considered the value of artistic process until it was the topic du jour unilaterally decided that it was inefficient, oppressive, complex, frivolous, and unfairly inaccessible to those that hadn't put any sustained effort into developing theirs.

This is eerily reminiscent of what's happening inside the USA government & administration today...

itishappy

Some humans honestly enjoy automating stuff. We wouldn't want to be taking away something that humans enjoy, would we?

I'm a musician myself, but I sadly suspect that most music made today "benefits humanity" very little... Is music making always a net positive? If nothing else, these tools will allow more music to be made.

SeanAnderson

Dynamic music generation for interactive media seems like a good reason?

logicchains

>You are automating an activity humans ENJOY doing.

There's at least an order of magnitude more people who enjoy making music than there are people with the actual skill/talent to make music. Music generation AI is an absolute blessing to the untalented among us who'd love to make a song in a certain style or with certain lyrics but lack the time, talent or ability to do it ourselves.

jnwatson

The style matching is interesting, but there's no song structure. There's no identifiable chorus in any of the demo songs.

impossiblefork

I find this very surprising, because it's one of the things I'd have expected a diffusion model to have a chance of achieving.

I suppose it might be because it's latent diffusion.

qoez

That can probably be a style in itself (if we kept exploring in these directions)

6stringmerc

No, it’s not a style, it’s by definition an incomplete song.

qoez

"Electronic music aren't real songs there's no real instruments involved". Let's be a bit creative with these tools. Sure, the pure output isn't always pleasant or listenable, but there's probably an interesting genre to carve out here.

01100011

Cool. Obviously needs some work. Lots of artifacts. Something to build on though.

Lots of sour grapes comments from folks. Too bad. Not what I expect out of Hacker News. Glad people are pushing the technological envelope and exploring this space despite the strong negative emotions.

bedane

the "prompt" is the 10 seconds original audio file + the lyrics, right?

absolutely crazy

SubiculumCode

If I am to retain any interest as an amateur music writer without pro audio engineering skills and equipment, but with a day job, I want tools that help me turn MY vision into reality. That means multi-tracking, the ability to hum or score a melody and have it transfer to a musical instrument, and the ability to enter existing tracks, provide a temporal segment for diffusion, and ask it to 'generate a counterpoint to the melody with strings', etc. The most exciting possibility here is enabling talented writers with day jobs, not one-click songwriting.

voxl

The people writing the one-shot tools are living a pipe dream and are riding the hype wave. One-shot AI music will have a short amount of interest based on its novelty, but the very next generation of humans will revolt against it as a cringe decision of the old guard. From there it might finally be applied more realistically as an aid to human expression instead of a replacement.

soperj

I'd be surprised if the people who are writing these understand what this means.

naltroc

The request is valid; you just need the right tools for the job.

Story Jam lets you design chord progressions without needing to know about music theory, instead offering intuitive terms like "lightness", "darkness", "drifting" and "roaming". They mean about what you think they mean.

https://storyjam.tenpens.ink

I'm planning a "Show HN" post for tomorrow morning EST with more details. But you can get the sneak peek here :)

SubiculumCode

Yeah, I'd think that it will take commoditized generation tools that existing or new multi-tracking composition tools could incorporate, e.g. as an FL Studio plugin.

fabiofzero

Business hates creatives. They'll do anything to automate us away.

foxbarrington

Business doesn’t hate creatives, and is not specifically targeting creatives to automate them away. Any job that can be done as good for a lower price or better for the same price is going to be a target.

kelseyfrog

And let's be honest, the reason it can be done for a lower price is because the public doesn't have taste.

onlyrealcuzzo

Put another way, they hate costs, and all of us are costs :)

VincentEvans

What’s the end game here?

Let’s follow the AI and automation craze to its eventual conclusion - automations everywhere, humans are either employed in the automation industry, or are unemployed at a massive scale.

Stable jobs are replaced by an ever-optimized gig economy for some, and chronic poverty for others. For there to even be an economy - the massive underemployed population subsists on government welfare.

The cynic in me thinks that all of the wealth generated by enormous productivity gains resulting from automation will not find its way towards the population displaced by it. Those cashiers, toll booth, and warehouse workers did not find themselves in much more lucrative careers - I don’t see why it will be any different for truck and cab drivers who will be joining them in the near future.

If you see a future where these people who suddenly found all this extra leisure time on their hands and no income - are somehow blossoming in creative directions and realizing their own potential - I’d like to have it painted for me, as it all looks pretty bleak to me. Just not quite sure of the timeline.

Best I can come up with is an emergence of some kind of counter-cultural protest market where people buy and sell “made by humans” products, and are continuously attacked by various regulations originating from mega corporations who captured the government.

whyowhy3484939

That's right, they don't just hate creatives. They'll go after anyone.

I wonder what the hyper-capitalist's end game looks like. One giant company that covers everything with one man sitting at a dashboard, tweaking parameters? Is that one man even necessary?

I wonder what our plans are for when "the economy" prefers to do its thing without us. Writing poems all day? What capitalist instrument will provide "money" for us to spend in this giant machine?

jimbokun

> One giant company that covers everything with one man sitting at a dashboard, tweaking parameters? Is that one man even necessary?

Old joke about airplane automation:

In the future there will be just one pilot and a dog in the cockpit. The dog is there to bite the pilot if he touches anything.

oortoo

I don't think it's at all extremist to look at that picture, realize it won't really have made any sense for the majority of the people on the planet well before it gets to that point, and that consequently some type of major global revolution will prevent that from happening.

52-6F-62

There’s only one way to win in a game theory world and that’s to be on top at the end.

So where is it going? Why: the end.

But this is also where Gandalf says, “end?”

jimbokun

> Any job that can be done as good for a lower price or better for the same price is going to be a target.

So they just hate humanity in general then.

treyd

Yes, this has always been the case. This is why capital holders are actively hostile to labor organizing and tend to back fascism when liberalism falls into crisis.

behnamoh

whatever can be automated isn't "true" creativity. these models merely generate average music, but the outputs of creative musicians always stand out.

perching_aix

If I was a business I'd "hate" creatives too, and I'd also want to automate them away. The cost of producing (truly) creative works is utterly bonkers, and so are the risks associated.

ramesh31

None of this is music. It is noise that sounds like music. Pretty analogous to how AI slop is not information, but just words that are arranged to look like information.

52-6F-62

Lol. It won’t stop us from writing.

You know there’s something you really can’t steal from people no matter how hard you try

6stringmerc

It’s just combining sample WAV files without human coordination; talk about a lame-ass achievement. It’s already easy enough to set BPM and load in files in Ableton and warp them into unison. From what I heard, this is basically just that with “HOORAY FOR AI” slathered as a veneer on top.

If you think I’m being harsh, I have my reasons as a professional musician to critique these things in an unflattering light because they are my competition. Thankfully actually “generated” AI music is trash. Copyright is problematic in the US, I admit, but tech bros using copyrighted material to train programs to put us out of business - without paying a penny which even Spotify doesn’t per stream - yeah, I’ll have some disdain about this scenario and I feel it’s justified.

Just because you can doesn’t mean you should.