Anthropic cut up millions of used books, and downloaded 7M pirated ones – judge
135 comments
July 7, 2025
bgwalter
Here is how individuals are treated for massive copyright infringement:
https://investors.autodesk.com/news-releases/news-release-de...
piker
I thought you'd go with this: https://en.wikipedia.org/wiki/United_States_v._Swartz
russell_h
The title is clearly meant to generate outrage, but what is wrong with cutting up a book that you own?
marapuru
Apparently it's a common business practice. Spotify (even though I can't find any proof) seems to have built their software and business on pirated music. There is some more in this article [0].
https://torrentfreak.com/spotifys-beta-used-pirate-mp3-files...
Funky quote:
> Rumors that early versions of Spotify used ‘pirate’ MP3s have been floating around the Internet for years. People who had access to the service in the beginning later reported downloading tracks that contained ‘Scene’ labeling, tags, and formats, which are the tell-tale signs that content hadn’t been obtained officially.
techjamie
Crunchyroll was originally an anime piracy site that went legit and started actually licensing content later. They started in mid-2006, got VC funding in 2008, then made their first licensing deal in 2009.
https://www.forbes.com/2009/08/04/online-anime-video-technol...
https://venturebeat.com/business/crunchyroll-for-pirated-ani...
haiku2077
Good Old Games started out with the founders selling pirated games on disc at local markets.
dathinab
Not just Spotify: pretty much any (most?) current tech giant was built by
- riding a wave of change
- not caring too much about legal constraints (or, as they would say now, "disrupting" the market, which very, very often means doing illegal shit that brings them far more money than any penalties they will ever face for it)
- or caring about ethics too much
- and in recent years (starting with Amazon) a lot of technically illegal financing: undercutting competitors' prices long term with money from elsewhere (e.g. investors) is an unfair competitive advantage that is (theoretically) clearly not allowed by anti-monopoly laws. And before that you often still had other monopoly issues (e.g. see Wintel).
So yes, systematically not complying with the law to get an unfair competitive advantage, knowing that many of the laws are, in the larger picture, toothless when applied to huge companies, is the bread-and-butter work of US tech giants.
Workaccount2
The common meme is that megacorps are shamelessly criminal organizations that get away with doing anything they can to maximize profits. While true in some regard, it totally pales in comparison to the illegal things small businesses and start-ups do.
pjc50
"recording obtained unofficially" and "doesn't have rights to the recording" are separate things. So they could well have got a license to stream a publisher's music but that didn't come with an actual copy of some/all of the music.
KoolKat23
There are plenty of startups that have gone legitimate.
Society underestimates the chasm that exists between an idea and raising sufficient capital to act on those ideas.
Plenty of people have ideas.
We only really see those that successfully cross it.
Small things: EULA breaches and consumer licenses being used commercially, for example.
hinterlands
The problem is that these "small things" are not necessarily small if you're an individual.
If you're an individual pirating software or media, then from the rights owners' perspective, the most rational thing to do is to make an example of you. It doesn't happen every day, but it does happen and it can destroy lives.
If you're a corporation doing the same, the calculation is different. If you're small but growing, future revenues are worth more than the money that can be extracted out of you right now, so you might get a legal nastygram with an offer of a reasonable payment to bring you into compliance. And if you're already big enough to be scary, litigation might be just too expensive to the other side even if you answer the letter with "lol, get lost".
Even in the worst case - if Anthropic loses and the company is fined or even shuttered (unlikely) - the people who participated in it are not going to be personally liable and they've in all likelihood already profited immensely.
dathinab
But it's not some small things.
It's systematic, widespread big things, and often many of them, giving US giants an unfair competitive advantage.
And don't think that if you are an EU company you can do the same in the US, nope nope.
But naturally the US insists that US companies can do that in the EU, and complains every time a US company is fined for not complying with EU law.
pyman
There's no credible evidence Spotify built their company and business on pirated music.
This is a narrative that gets passed around in certain circles to justify stealing content.
YPPH
"Stealing" isn't an apt term here. Stealing a thing permanently deprives the owner of the thing. What you're describing is copyright infringement, not stealing.
In this context, stealing is often used as a pejorative term to make piracy sound worse than it is. Except for mass distribution, piracy is often regarded as a civil wrong, and not a crime.
lmm
> There's no credible evidence Spotify built their company and business on pirated music.
That's a statement carefully crafted to be impossible to disprove. Of course they shipped pirated music (I've seen the files). Of course anyone paying attention knew. Nothing in the music industry was "clean" in those days. But, sure, no credible evidence because any evidence anyone shows you you'll decide is not credible. It's not in anyone's interests to say anything and none of it matters.
motbus3
They had a second company (whose name I don't remember) that allowed users to back up and share their music. When they were exposed they buried that as deep as they could.
pyman
No. There's no credible evidence Spotify had any secret second company that allowed users to back up and share music without authorisation
pyman
It was the opposite. Their mission was to combat music piracy by offering a better, legal alternative.
Daniel Ek said: "my mission is to make music accessible and legal to everyone, while ensuring artists and rights holders got paid"
Also, the Swedish government has zero tolerance for piracy.
pyman
I know this might come as a shock to those living in San Francisco, but things are different in other parts of the world, like Uruguay, Sweden and the rest of Europe. From what I’ve read, the European Commission actually cares about enforcing the law.
pyman
These are the people shaping the future of AI? What happened to all the ethical values they love to preach about?
We've held China accountable for counterfeiting products for decades and regulated their exports. So why should Anthropic be allowed to export their products and services after engaging in the same illegal activity?
ffsm8
> We've held China accountable for counterfeiting products for decades and regulated their exports
We have? Are we from different multi-verses?
The one I've lived in to date has not done anything against Chinese counterfeits beyond occasionally seizing counterfeit goods during import. But that's merely occasionally enforcing local counterfeit law, a far cry from punishing the entity producing it.
As a matter of fact, the companies started outsourcing everything to China, making further IP theft and quasi-copies even easier
Workaccount2
I was gonna say, the enforcement is so weak that it's not even really worth it to pursue consumer hardware here in the US. Make a product that is a hit, patent it, and still one month later IYTUOP will be selling an identical copy for a third of the price on Amazon.
benjiro
One rule for you, one rule for me ...
You never noticed the hypocritical behavior all over society?
* Oh, you drunk drive: big fine, lots of trouble.
* Oh, you drunk drive and are a senator, cop, mayor, ... Well, let's look the other way.
* You have anger management issues and slam somebody to the ground: jail time.
* You as a cop have anger management issues and slam somebody to the ground: well, paid time off while we investigate and maybe a reprimand. Qualified immunity, boy!
* You commit tax fraud for 10k: felony record, maybe jail time.
* You as an exec of a company commit tax fraud for 100 million: after 10 years of lawyering around, maybe you get something, maybe, ... oh, here is a fine of 5 million.
I am sorry, but the idea of everybody being equal under the law has always been an illusion.
We are holding China accountable for counterfeiting products because it hurts OUR companies and their income. But when it's "us vs us", well, then it becomes a bit more messy, and in general those with the biggest backing (as in $$$, economic value, and lawyers) tend to win.
Wait, if somebody steals my book, I can sue that person in court and get a payout (lawyers will cost me more, but that is not the point). If some AI company steals my book, well, the chance you win is close to 1%, simply because lots of well-paid lawyers will make winning hard to impossible.
Our society has always been based upon power, wealth and influence. The more you have of it, the more you get away with (or get reduced penalties for) things that get others fined or jailed.
seydor
break things and move fast
lofaszvanitt
This is the underlying caste system coming to life right before your eyes :D.
stephenitis
I think caste system is the wrong analogy here.
Comment is more about the pseudo ethical high ground
MangoToupe
Companies being above the law does create a stratified system in this country for those who can benefit from said companies and those who cannot. Call it what you like.
ramon156
Pirating and paying the fine is probably a hell of a lot cheaper than individually buying all these books. I'm not saying this is justified, but what would you have done in their situation?
Saying "they have the money" is not an argument. It's about the amount of effort that is needed to individually buy, scan, and process millions of pages. If that's done for you, why redo it all?
pyman
The problem with this thinking is that hundreds of thousands of teachers who spent years writing great, useful books and sharing knowledge and wisdom probably won't sue a billion dollar company for stealing their work. What they'll likely do is stop writing altogether.
I'm against Anthropic stealing teachers' work and discouraging them from ever writing again. Some teachers are already saying this (though probably not in California).
lofaszvanitt
They won't be needed anymore, once singularity is reached. This might be their thought process. This also exemplifies that the loathed caste system found in India is indeed in place in western societies.
There is no equality, and seemingly there are worker bees who can be exploited, and there are privileged ones, and of course there are the queens.
SketchySeaBeast
> They won't be needed anymore, once singularity is reached.
And it just so happens that that belief says they can burn whatever they want down because something in the future might happen that absolves them of those crimes.
pyman
:D
Note: My definition of singularity isn't the one they use in San Francisco. It's the moment founders who stole the life's work of thousands of teachers finally go to prison, and their datacentres get seized.
Der_Einzige
Very few textbooks or anything that teachers write is “great”. Likely the best work a teacher ever wrote was called “pedagogy of the oppressed” which basically says that non communist revolutionary teachers should have their backs put against the wall.
Less slop from one of the most tyrannical class of people who have far too much unwarranted power in this society is good.
“Those who cannot do, teach.”
CuriouslyC
If you care so little about writing that AI puts you off it, TBH you're probably not a great writer anyhow.
Writers that have an authentic human voice and help people think about things in a new way will be fine for a while yet.
4b11b4
Yeah, people will still want to write. They might need new ways to monetize it... that being said, even if people still want to write they may not consider it a viable path. Again, have to consider other monetization.
glimshe
That will be sad, although there will still be plenty of great people who will write books anyway.
When it comes to a lot of these teachers, I'll say, copyright works hand in hand with college and school course book mandates. I've seen plenty of teachers making crazy money off students' backs due to these mandates.
A lot of the content taught in undergrad and school hasn't changed in decades or even centuries. I think we have all the books we'll ever need in certain subjects already, but copyright keeps enriching people who write new versions of these.
TimorousBestie
150K per work is the maximum fine for willful infringement (which this is).
105B+ is more than Anthropic is worth on paper.
Of course they’re not going to be charged to the fullest extent of the law, they’re not a teenager running Napster in the early 2000s.
voxic11
Even if they don't qualify for willful infringement damages (let's say they have a good faith belief their infringement was covered by fair use), the standard statutory damages for copyright infringement are $750-$30,000 per work.
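For a rough sense of scale, here is a minimal sketch of the arithmetic behind the per-work figures quoted in this subthread. It assumes, purely for illustration, that all 7 million downloaded books from the headline count as separately infringed works, which a court would not necessarily accept. The function name `damages_range` and the constants are made up for this example.

```python
# Per-work statutory damages figures quoted in this thread (USD).
STANDARD_MIN = 750       # low end of the standard range
STANDARD_MAX = 30_000    # high end of the standard range
WILLFUL_MAX = 150_000    # ceiling for willful infringement


def damages_range(num_works: int) -> dict[str, int]:
    """Multiply each per-work figure by the number of works."""
    return {
        "standard_min": num_works * STANDARD_MIN,
        "standard_max": num_works * STANDARD_MAX,
        "willful_max": num_works * WILLFUL_MAX,
    }


# Assumption: 7 million works, taken from the headline, not from any ruling.
ASSUMED_WORKS = 7_000_000

totals = damages_range(ASSUMED_WORKS)
for label, total in totals.items():
    print(f"{label}: ${total:,}")
```

Under that assumption the totals run from about $5.25 billion at the standard minimum to about $1.05 trillion at the willful maximum; a smaller count of qualifying works scales every figure down proportionally.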
glimshe
Isn't "pirating" a felony with jail time, though? That's what I remember from the FBI warning I had to see at the beginning of every DVD I bought (but not "pirated" ones).
voxic11
Yes criminal copyright infringement (willful copyright infringement done for commercial gain or at a large scale) is a felony.
pyman
Absolutely.
Pirating 7 million books, remixing their content, and using that to make money on Claude.ai is like counterfeiting 7 million branded products and selling them on your Shopify website. The original creators don't get payment, and someone's profiting off their work. Try doing that yourself and you'd get a knock on the door real quick.
dmix
A court just ruled on Anthropic and said an LLM response wasn't a form of counterfeiting (ie, essentially selling pirate books on the black market). Although tbf that is the most radical interpretation still being put forward by the lawyers of publishers like NYTimes, despite the obvious flaws.
mystified5016
No, it isn't.
tmaly
At minimum they should have to buy the book they are deriving weights from.
maeln
If you wanted to be legit with zero chance of going to court, you would contact publishers and ask to pay for a license to get access to their catalog for training, and negotiate from that point.
This is what every company using media is doing (think Spotify, Netflix, but also journals, ad agencies, ...). I don't know why people on HN are giving a pass to AI companies for this kind of behavior.
ohashi
Because they are mostly software developers who think it's different because it impacts them.
pyman
100%
It's the new narrative in certain circles, especially in San Francisco: it's us vs China. We're doing all this to beat them, no matter the cost. While teachers are left scratching their heads with four kids to feed.
edgineer
The paradigm is that teachers will teach life skills like public speaking and entrepreneurship. Book smarts that can be more effectively taught by AI will be, once schools catch up.
darkoob12
This is not about paying for a single copy. It would still be wrong even if they have bought every single one of those books. It is a form of plagiarism. The model will use someone else's idea without proper attribution.
jeroenhd
Legally speaking, we don't know that yet. Early signs are pointing at judges allowing this kind of crap because it's almost impossible for most authors to point out what part of the generated slop was originally theirs.
suyjuris
Just downloading them is of course cheaper, but it is worth pointing out that, as the article states, they did also buy legitimate copies of millions of books. (This includes all the books involved in the lawsuit.) Based on the judgement itself, Anthropic appears to train only on the books legitimately acquired. Used books are quite cheap, after all, and can be bought in bulk.
kevingadd
Google did it the legal way with Google Books, didn't they?
pyman
No, Google did not sell the books through Google Books. Anthropic is selling the transformed version of the books on claude.ai.
Pirating 7 million books, remixing their content, and using that to make money on Claude.ai is like counterfeiting 7 million branded products and selling them on your Shopify website. The original creators don't get payment, and someone's profiting off their work.
suyjuris
The judge appears to disagree with you on this. They found that training and selling an LLM are fair use, based on the fact that it is exceedingly transformative, and that copyright does not entitle the copyright holders to any of the resulting profits. (They also did get paid: Anthropic acquired millions of books legally, including those of all of the authors in this complaint. This would not retroactively absolve them of legal fault for past infringements, of course.)
pyman
Anthropic's cofounder, Ben Mann, downloaded millions of copies of books from Library Genesis in 2021, fully aware that the material was pirated.
Stealing is stealing. Let's stop with the double standards.
dathinab
Stealing with the intent to gain an unfair market advantage, so that you can effectively kill any company that acts ethically and legally, in a way that is very likely going to hurt many authors through the products you create, is far worse than just stealing for personal use.
that isn't "just" stealing, it's organized crime
originalvichy
At least most pirates just consume for personal use. Profiting from piracy is a whole other level beyond just pirating a book.
mrcwinn
> At least most pirates just consume for personal use.
Easy for the pirate to say. Artists might argue their intent was to trade compensation for one's personal enjoyment of the work.
pyman
Someone on Twitter said: "Oh well, P2P mp3 downloads, although illegal, made contributions to the music industry"
That's not what's happening here. People weren't downloading music illegally and reselling it on Claude.ai. And while P2P networks led to some great tech, there's no solid proof they actually improved the music industry.
Imustaskforhelp
I really feel as if Youtube is the best sort of convenience for music videos where most people watch ads whereas some people can use an ad blocker.
I use an adblocker and tbh I think so many people on HN are okay with ad blocking and not piracy when basically both just block the end user from earning money.
I kind of believe that if you really like a piece of software, or really like something, you should just ask them what their favourite charity is and donate there, or join their Patreon or use some other direct way to support them.
drcursor
Let's not forget Spotify ;)
https://gizmodo.com/early-spotify-was-built-on-pirated-mp3-f...
mnky9800n
I feel like profit was always a central motive of pirates. At least from the historical documents known as, "The Pirates of the Caribbean".
KoolKat23
This isn't really profiting from piracy. They don't make money off the raw input data. It's no different to consuming for personal use.
They make money off the model weights, which is fair use (as confirmed by recent case law).
j_w
This is absurd. Remove all of the content from the training data that was pirated and what is the quality of the end product now?
Der_Einzige
Information wants to be free.
x3n0ph3n3
Copyright infringement is not stealing.
impossiblefork
It's very similar to theft of service.
There are so many texts, and they're so sparse, that if I could copyright a work and never publish it, the restriction would be irrelevant. The probability that you would accidentally come upon something close enough that copyright was relevant is almost infinitesimal.
Because of this copyright is an incredibly weak restriction, and that it is as weak as it is shows clearly that any use of a copyrighted work is due to the convenience that it is available.
That is, it's about making use of the work somebody else has done, not about that restricting you somehow.
Therefore copyright is much more legitimate than ordinary property. Ordinary property, especially ownership of land, can actually limit other people. But since copyright is so sparse, infringing on it is like going to a world with near-infinite space, picking the precise place where somebody has planted a field, and deciding to harvest from that particular field.
Consequently I think copyright infringement might actually be worse than stealing.
pyman
Pirating a book and selling it on claude.ai is stealing, both legally and morally.
BlackFly
Making a copy differs from taking an existing object in all aspects: literally, technically, legally and ethically. Piracy is making a copy you have no legal right to. Stealing is taking a physical object that you have no legal right to. While the "no legal right to" seems the same superficially, in practice the laws differ quite a bit because the literal, technical and ethical aspects differ.
TiredOfLife
They are not selling it on claude.ai. If you can prove that they are you will be rich.
zb3
Who got robbed? Just because I'd pay for AI it doesn't mean I'd buy these books.
seydor
property infringement isn't either?
1oooqooq
Actually, the only time it's an (ethical) crime is when a corporation does it at scale for profit.
damnesian
oh well, the product has a cute name and will make someone a billionaire, let's just give it the green light. who cares about copyright in the age of AI?
dathinab
As far as I understand, while training on books is clearly not fair use (as the result will likely hurt the livelihood of authors, especially those who aren't "best of the best"), as long as you buy the book it should still be legal; that is, if you actually buy the book and not a "read only" eBook.
But the 7_000_000 pirated books are a huge issue, and one which we have a lot of reason to believe isn't specific to Anthropic.
tliltocatl
If the AI movement manages to undermine Imaginary Property, it would redeem its externalities threefold.
57473m3n7Fur7h3
I don’t think that’s gonna happen. I think they will manage to get themselves out of trouble for it, while the rest of us will still face serious problems if we are caught torrenting even one singular little book.
2OEH8eoCRo0
The Ocean Full of Bowling Balls
tliltocatl
Even so, would be hard to prove that this particular little book wasn't generated by Claude (oopsie, it happens to be a verbatim copy of a copyrighted work, that happens sometimes, those pesky LLMs).
pyman
You just need to audit their system. Shouldn't take more than a couple of hours.
ttoinou
It would be great, but I think some are worried that new AI BigTech will find a way to continue enforcing IP on the rest of society while it won't exist for them
Imustaskforhelp
I think that we are worried because I think that's exactly what's going to happen/ is happening.
pxc
It's true that intellectual property is a flawed and harmful mechanism for supporting creative work, and it needs to change, but I don't think ensuring a positive outcome is as simple as this. Whether or not such a thing benefits the public rather than just AI companies will be largely incidental.
I support radical "intellectual property" reform. But my highest hopes for this situation are more modest: if AI companies are told that their data must be in the public domain to train against, we will finally have a powerful faction among capitalists with a strong incentive to push back against the copyright monopolists when it comes to the indefinite extension of copyright.
If the "path of least resistance" for companies like Google, Microsoft, and Meta becomes enlarging the public domain, we might finally begin to address the stagnation of the public domain, and that could be a good thing.
But I think even such a modest hope as that one is unlikely to be realized. :-\
bayindirh
What are your feelings about how the small fish is stripped of their arts, and their years of work becomes just a prompt? Mainly comic artists and small musicians who are doing things they like and putting out for people, but not for much money?
spankibalt
[flagged]
tliltocatl
"But think about the children". The copyright system is doing too much damage to culture and society. Yes, it does provide a pond for some small fish, but the overall damage outweighs this. Likewise, the fact that the first estate provided sustenance for arts and crafts to flourish doesn't make the ancien régime any less screwed up.
bayindirh
I think I have worded my question wrong. I asked not about how AI affects the financials of these smaller artists, but about their wellbeing in general.
There are many small artists who do this not for money, but for fun, and have their renowned styles. Even their styles are ripped off by these generative AI companies and turned into a slot machine to earn money for the companies. These artists didn't consent to that, and this affects their (mental) well-being.
With that context in mind, what do you think about these people, who are not in this for money, being stripped of their years of achievement and having their hard work exploited for money by generative AI companies?
It's not about IP (with whatever expansion you prefer) or laws, but ethics in general.
Substitute comics for any medium. Code, music, painting, illustration, literature, short movies, etc.
karel-3d
That would render GPL and friends redundant too... copyleft depends on copyright.
LtWorf
It will undermine it only for the rich owners of AI companies, not for everyone.
Der_Einzige
Yup.
My response to this whole thread is just “good”
Aaron Swartz is a saint and a martyr.
motbus3
It is shocking how courts have been ruling in favor of AI companies despite the obvious problem of allowing automatic plagiarism.
sidewndr46
So using the standard industry metrics for calculating the financial impact of piracy, this would equate to something like trillions in damages to the book publishing industry?
Kim_Bruning
actual title:
"Anthropic cut up millions of used books to train Claude — and downloaded over 7 million pirated ones too, a judge said."
A not-so-subtle difference.
That said, in a sane world, they shouldn't have needed to cut up all those used books yet again when there's obviously already an existing file that does all the work.
https://archive.md/YLyPg