Backing Up Spotify

65 comments

·December 20, 2025

Etheryte

To put this into perspective, What.CD [0] was widely considered to be the music Library of Alexandria, unparalleled in both its high quality standard and its depth. What had in the ballpark of a few million torrents when it got raided and shut down. Anna's rip of Spotify includes roughly 186 million unique records. Granted, the tail end is a mixed bag of bot music and whatnot, but the scale is staggering.

[0] https://en.wikipedia.org/wiki/What.CD

VanTheBrand

True, but What.cd had a tremendous amount of notable music not available on Spotify, because it was also sourced from CDs, bootlegs, vinyl, tape, etc., whereas Spotify only includes music explicitly licensed for streaming.

Etheryte

This is true and a category of music that got hit notably hard was live recordings. What had a wide array of live recordings made by sound engineers straight from the mixer. This is something that you simply cannot find now unless you maybe know a guy.

crazygringo

This is insane.

I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.

The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or if the major record labels already license their entire catalogs for training purposes cheaply enough, so this really is just solely intended as a preservation effort?

Aurornis

> The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.

I wouldn’t be so sure. There are already tools to automatically locate and stream pirated TV and movie content on demand. They’re so common that I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.

> Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.

The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.

crazygringo

> The Anna’s archive group is ideologically motivated.

Very interesting, thank you. So using this for AI will just be a side effect.

And good point -- yup, can now definitely imagine apps building an interface to search and download. I guess I just wonder how seeding and bandwidth would work for the long tail of tracks rarely accessed, if people are only ever downloading tiny chunks.

nutjob2

I think the people seeding these are also ideologues and so would be interested in supporting the obscure stuff too, maybe more than the popular. There is no way any casual listeners would go to the quite substantial trouble of using these archives.

Anyone who wants to listen to unlimited free music from a vast catalog with a nice interface can use YouTube/Google Music. If they don't like the ads they can get an ad blocker. Downloading to your own machine works well too.

5-

> The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.

https://en.wikipedia.org/wiki/Useful_idiot

ronsor

They know about AI companies and don't mind AI companies, but they're not doing it because AI companies.

VanTheBrand

The metadata is arguably more useful than the music files themselves.

basisword

>> But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?

Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.

VanTheBrand

They are too big to fail but they aren’t too big to have to pay out a huge settlement. Facebook's annual revenue is about twice that of the entire global recording industry. The strategy these companies took was probably correct, but that calculation included the high risk of ultimately having to pay out down the line. Don’t mistake their current resistance to paying for an internal belief that they never will have to.

WD-42

Incredible.

> A while ago, we discovered a way to scrape Spotify at scale.

They won’t and shouldn’t divulge the details, but I imagine that would be a fun read!

bmikaili

they're probably just using something like https://github.com/nor-dee/spotizerr-spotify

WD-42

No way, that would take far too long.

bigyabai

Probably not, those tools don't actually download Spotify tracks at source quality.

sunaookami

There are tools that actually download directly from Spotify (needs premium then) but yeah most of them just use the search and download from other sources like YouTube without mentioning it. I won't say which tools download directly out of fear that they get killed but they exist.

syntaxing

Moral and legal discussion aside, this is technically very impressive. I also wouldn’t be surprised if this somehow kickstarts open source music generative AI from China.

yegle

Not that we should, but it's technically feasible to have a music streaming server with the torrent as the backend, selectively downloading parts of the torrent in response to on-demand streaming requests from the client.
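As a rough sketch of the "selectively download" part: in a BitTorrent v1 torrent the files are laid out back to back and cut into fixed-size pieces, so serving a byte range of one track comes down to working out which piece indices cover it. The `FileEntry` type and `pieces_for_range` helper below are hypothetical names for illustration, not part of any real client, and the actual piece size and layout of these torrents are not stated in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """One file inside a multi-file torrent (hypothetical helper type)."""
    path: str
    offset: int  # byte offset of the file within the torrent's concatenated payload
    length: int  # file size in bytes


def pieces_for_range(entry: FileEntry, start: int, end: int, piece_length: int) -> range:
    """Return the piece indices covering bytes [start, end) of the given file.

    Assumes v1-style layout: files concatenated, then split into
    pieces of piece_length bytes, so a piece may span two files.
    """
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    if not (0 <= start < end <= entry.length):
        raise ValueError("byte range must lie inside the file")
    first = (entry.offset + start) // piece_length
    last = (entry.offset + end - 1) // piece_length
    return range(first, last + 1)


# Usage: a 5000-byte track that starts 1000 bytes into the payload.
track = FileEntry(path="example/track.ogg", offset=1000, length=5000)
needed = pieces_for_range(track, 0, track.length, piece_length=1024)
```

A streaming server would ask its torrent client to prioritize only those pieces. The first and last ones usually also contain bytes from neighbouring files, which is part of why tiny on-demand reads are less efficient than they look.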

yellow_lead

Is the music torrent not up yet? Only see the metadata one here: https://annas-archive.li/torrents/spotify

artninja1988

Yeah, in the article they write:

The data will be released in different stages on our Torrents page:

[X] Metadata (Dec 2025)

[ ] Music files (releasing in order of popularity)

[ ] Additional file metadata (torrent paths and checksums)

[ ] Album art

[ ] .zstdpatch files (to reconstruct original files before we added embedded metadata)

yellow_lead

Oh I see, thanks! I missed that

frereubu

Site is down for me. Archive link: https://archive.is/jf3HW

mawax

Probably not down, but blocked by your ISP. Try a VPN. Same thing happens here.

ipsum2

Ironic. But it's working for me.

Fizzadar

I have Spotify Premium, but the constant shuffle of content availability has meant I’ve started routinely archiving my liked songs to avoid any rug pull. Zspotify and co still work a charm.

xnx

Merry Christmas!

ipsum2

Can someone explain why C#/Db (major/minor) is the third most popular key? Very unexpected for me, since it's relatively more difficult to play.

ghostie_plz

Both C#m and Db can be played on piano using only the black keys (skipping the 3rd note of the scale). This makes them easy keys for beginners. I'm not sure if that's the reason, but it could be related.

Anecdotally, I know a few vocalists that sound great in these keys and use them as a starting point
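For anyone who wants to check the black-key claim, here is a small sketch that lists which scale degrees land on white keys. It assumes 12-tone equal temperament pitch classes with C = 0 and the natural minor scale for C#m; the function name is made up for illustration.

```python
# Pitch classes of the black keys: C#, D#, F#, G#, A# (with C = 0).
BLACK_KEYS = {1, 3, 6, 8, 10}

# Semitone offsets from the tonic for each scale degree.
MAJOR = [0, 2, 4, 5, 7, 9, 11]
NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]


def white_key_degrees(tonic: int, steps: list[int]) -> list[int]:
    """Return the 1-based scale degrees that fall on white keys."""
    return [
        degree
        for degree, step in enumerate(steps, start=1)
        if (tonic + step) % 12 not in BLACK_KEYS
    ]


C_SHARP = 1  # C# and Db are the same pitch class

db_major_white = white_key_degrees(C_SHARP, MAJOR)            # [3, 7]
cs_minor_white = white_key_degrees(C_SHARP, NATURAL_MINOR)    # [3, 6, 7]
```

So Db major is all black keys apart from the 3rd and 7th (F and C), while C# natural minor also needs a white key for the 6th (E, A and B are white).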

kzrdude

Electronic dance music is the biggest genre in the data. So then easy to play shouldn't matter. It's still an interesting question. I think playing Db is pretty nice on the piano even if it's not the easiest.

klysm

Difficult to play on what instrument?

lelouch9099

How legal is this with regards to copyright laws?

Aurornis

Not legal. This group does not concern themselves with copyright law.

toomuchtodo

Adherence to the legal framework is a function of your risk appetite.

phainopepla2

Not legal

ronsor

Very, if we delete copyright like we're supposed to.

basisword

It's not. It's awful people justifying awful behaviour. And it's why we can't have nice things. There are always assholes ready to exploit others.

jopicornell

Monopoly is not a nice thing. Maybe it is convenient, but not nice.

The people who give money to artists are the ones going to concerts and buying music directly from artists. Spotify pays artists cents, incentivizing awful behaviour (AI music, aggressive marketing, low-effort art...).

nemomarx

There's some irony here considering Spotify used pirated mp3s at the start of their operations, I suppose.

poly2it

Some people's urges to destroy all traces of human civilisation astonish me. What do you think Spotify is going to do with all its music when it ceases to exist in however many years? No, we must collectively feed Daniel Ek the Hungry.

827a

Holy crap. This is going to trigger a five-alarm fire at Spotify Engineering. This has got to be among the largest proprietary datasets ever unintentionally publicized by a company.

rightbyte

Wasn't all data available to users though?