Anthropic cut up millions of used books, and downloaded 7M pirated ones – judge
263 comments
·July 7, 2025
Uhhrrr
From Vinge's "Rainbows End":
> In fact this business was the ultimate in deconstruction: First one and then the other would pull books off the racks and toss them into the shredder's maw. The maintenance labels made calm phrases of the horror: The raging maw was a "NaviCloud custom debinder." The fabric tunnel that stretched out behind it was a "camera tunnel...." The shredded fragments of books and magazine flew down the tunnel like leaves in tornado, twisting and tumbling. The inside of the fabric was stitched with thousands of tiny cameras. The shreds were being photographed again and again, from every angle and orientation, till finally the torn leaves dropped into a bin just in front of Robert. Rescued data. BRRRRAP! The monster advanced another foot into the stacks, leaving another foot of empty shelves behind it.
dehrmann
The important parts:
> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use
> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"
It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.
6gvONxR4sf7o
You skipped quotes about the other important side:
> But Alsup drew a firm line when it came to piracy.
> "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy."
That is, he ruled that
- buying, physically cutting up, physically digitizing books, and using them for training is fair use
- pirating the books for their digital library is not fair use.
jonas21
As they mentioned, the piracy part is obvious. It's the fair use part that will set an important precedent for being able to train on copyrighted works as long as you have legally acquired a copy.
throwawayffffas
So all they have to do is go and buy a copy of each book they pirated. They will have ceased and desisted.
superfrank
I'm trying to find the quote, but I'm pretty sure the judge specifically said that going and buying the book after the fact won't absolve them of liability. He said that for the books they pirated they broke the law and should stand trial for that, and they cannot go back and un-break it by buying a copy now.
Found it: https://www.nbcnews.com/tech/tech-news/federal-judge-rules-c...
> “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft,” [Judge] Alsup wrote, “but it may affect the extent of statutory damages.”
dragonwriter
> So all they have to do is go and buy a copy of each book they pirated.
No, that doesn't undo the infringement. At most, that would mitigate actual damages, but actual damages aren't likely to be important, given that statutory damages are an alternative and are likely to dwarf actual damages. (It may also figure into how the court assigns statutory damages within the very large range available for those, but that range does not go down to $0.)
> They will have ceased and desisted.
"Cease and desist" is just to stop incurring additional liability. (A potential plaintiff may accept that as sufficient to not sue if a request is made and the potential defendant complies, because litigation is uncertain and expensive. But "cease and desist" doesn't undo wrongs and neutralize liability when they've already been sued over.)
jpalawaga
I don't think that's new. Google set a precedent for that more than a decade ago: you're allowed to transform a book into a digital copy.
bgwalter
Here is how individuals are treated for massive copyright infringement:
https://investors.autodesk.com/news-releases/news-release-de...
piker
I thought you'd go with this: https://en.wikipedia.org/wiki/United_States_v._Swartz
dialup_sounds
Swartz wasn't charged with copyright infringement.
arandomhuman
No but he coincidentally passed away after he was accused of it.
natch
*technically
JimDabell
> illegally copying and selling pirated software
This is very different to what Anthropic did. Nobody was buying copies of books from Anthropic instead of the copyright holder.
rvnx
At the very least, they should have purchased the originals once
arandomhuman
Yeah, people have gone to jail for a few copies of content. Taking that large of a corpus and getting off without penalty would be a farce of the justice system.
farceSpherule
Peterson was copying and selling pirated software.
Come up with a better comparison.
organsnyder
Anthropic is selling a service that incorporates these pirated works.
adolph
That a service incorporating the authors' works exists is not at issue. The plaintiffs' claims are, as summarized by Alsup:
> First, Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16).
> Second, to that last point, Authors further argue that the training was intended to memorize their works’ creative elements — not just their works’ non-protectable ones (Opp. 17).
> Third, Authors next argue that computers nonetheless should not be allowed to do what people do.
https://media.npr.org/assets/artslife/arts/2025/order.pdf
nh23423fefe
What point are you making? 20 years ago, someone sold pirated copies of software (where's the transformation here?), and that's the same as using books in a training set? The judge already said reading isn't infringement.
This is reaching at best.
chourobin
copyright is not the same as piracy
asadotzler
piracy isn't a thing, except on the high seas. what you're thinking about is copyright violation.
downrightmike
Yup, piracy sounds better than copyright violation.
“Piracy” is mostly a rhetorical term in the context of copyright. Legally, it’s still called infringement or unauthorized copying. But industries and lobbying groups (e.g., RIAA, MPAA) have favored “piracy” for its emotional weight.
achierius
Can you explain why? What makes them categorically different or at the very least why is "piracy" quantitatively worse than 'just' copyright violation?
arrosenberg
Piracy is theft - you have taken something and deprived the original owner of it.
Copyright infringement is unauthorized reproduction - you have made a copy of something, but you have not deprived the original owner of it. At most, you denied them revenue although generally less than the offended party claims, since not all instances of copying would have otherwise resulted in a sale.
abeppu
Maybe the most memorable version of the response is the "Copying Is Not Theft" song. https://www.youtube.com/watch?v=IeTybKL1pM4
NoMoreNicksLeft
Asked unironically: "What's worse, hijacking ships at sea and holding their crews hostage for ransom on threat of death, or downloading a song off the internet?" ...
charcircuit
Saying that piracy isn't copyright violation is an RMS talking point. It's not worth trying to ask why because the answer will be RMS said so and will not be backed by the common usage of the word.
codedokode
If AI companies are allowed to use pirated material to create their products, does it mean that everyone can use pirated software to create products? Where is the line?
Also, please don't use the word "learning"; use "creating software using copyrighted materials".
Also, let's think together about how we can prevent AI companies from using our work through technical measures if the law doesn't work.
rvnx
~1B USD in cash is the line where laws apply very differently
redcobra762
It's abusive and wrong to try and prevent AI companies from using your works at all.
The whole point of copyright is to ensure you're paid for your work. AI companies shouldn't pirate, but if they pay for your work, they should be able to use it however they please, including training an LLM on it.
If that LLM reproduces your work, then the AI company is violating copyright, but if the LLM doesn't reproduce your work, then you have not been harmed. Trying to claim harm when you haven't been due to some philosophical difference in opinion with the AI company is an abuse of the courts.
codedokode
It is not wrong at all. The author decides what to do with their work. AI companies are rich and can simply buy the rights or hire people to create works.
I could agree with exceptions for non-commercial activity like scientific research, but AI companies are made for extracting profits and not for doing research.
> AI companies shouldn't pirate, but if they pay for your work, they should be able to use it however they please, including training an LLM on it.
It doesn't work this way. If you buy a movie it doesn't mean you can sell goods with movie characters.
> then you have not been harmed.
I am harmed because fewer people will buy the book if they can simply get an answer from an LLM. Maybe instead of books we should start making applications that protect the content and do not allow copying text or taking screenshots.
guywithahat
If you own a book, it should be legal for your computer to take a picture of it. I honestly feel bad for some of these AI companies because the rules around copyright are changing just to target them. I don't owe copyright to every book I read because I may subconsciously incorporate their ideas into my future work.
Bjorkbat
Something missed in arguments such as these is that in measuring fair use there's a consideration of impact on the potential market for a rightsholder's present and future works. In other words, can it be proven that what you are doing is meaningfully depriving the author of future income.
Now, in theory, you learning from an author's works and competing with them in the same market could meaningfully deprive them of income, but it's a very difficult argument to prove.
On the other hand, with AI companies it's an easier argument to make. If Anthropic trained on all of your books (which is somewhat likely if you're a fairly popular author) and you saw a substantial loss of income after the release of one of their better models (presumably because people are just using the LLM to write their own stories rather than buy your stuff), then it's a little bit easier to connect the dots. A company used your works to build a machine that competes with you, which arguably violates the fair use principle.
Gets to the very principle of copyright, which is that you shouldn't have to compete against "yourself" because someone copied you.
parliament32
> a consideration of impact on the potential market for a rightsholder's present and future works
This is one of those mental gymnastics exercises that makes copyright law so obtuse and effectively unenforceable.
As an alternative, imagine a scriptwriter buys a textbook on orbital mechanics, while writing Gravity (2013). A large number of people watch the finished film, and learn something about orbital mechanics, therefore not needing the textbook anymore, causing a loss of revenue for the textbook author. Should the author be entitled to a percentage of Gravity's profit?
We'd be better off abolishing everything related to copyright and IP law altogether. These laws might've made sense back in the days of the printing press, but they're just nonsensical nowadays.
raincole
Are we reading the same article? The article explicitly states that it's okay to cut up and scan the books you own to train a model from them.
> I honestly feel bad for some of these AI companies because the rules around copyright are changing just to target them
The ruling would be a huge win for AI companies if held. It's really weird that you reached the opposite conclusion.
atomicnumber3
The core problem here is that copyright already doesn't actually follow any consistent logical reasoning. "Information wants to be free" and so on. So our own evaluation of whether anything is fair use or copyrighted or infringement thereof is always going to be exclusively dictated by whatever a judge's personal take on the pile of logical contradictions is. Remember, nominally, the sole purpose of copyright is not rooted in any notions of fairness or profitability or anything. It's specifically to incentivize innovation.
So what is the right interpretation of the law with regards to how AI is using it? What better incentivizes innovation? Do we let AI companies scan everything because AI is innovative? Or do we think letting AI vacuum up creative works to then stochastically regurgitate tiny (or not so tiny) slices of them at a time will hurt innovation elsewhere?
But obviously the real answer here is money. Copyright is powerful because monied interests want it to be. Now that copyright stands in the way of monied interests for perhaps the first time, we will see how dedicated we actually were to whatever justifications we've been seeing for DRM and copyright for the last several decades.
organsnyder
The difference here is that an LLM is a mechanical process. It may not be deterministic (at least, in a way that my brain understands determinism), but it's still a machine.
What you're proposing is considering LLMs to be equal to humans when considering how original works are created. You could make the argument that LLM training data is no different from a human "training" themself over a lifetime of consuming content, but that's a philosophical argument that is at odds with our current legal understanding of copyright law.
kevinpet
That's not a philosophical argument at odds with our current understanding of copyright law. That's exactly what this judge found copyright law currently is and it's quoted in the article being discussed.
rapind
Everything is different at scale. I'm not giving a specific opinion on copyright here, but it just doesn't make sense when we try to apply individual rights and rules to systems of massive scale.
I really think we need to understand this as a society and also realize that moneyed interests will downplay this as much as possible. A lot of the problems we're having today are due to insufficient regulation differentiating between individuals and systems at scale.
zerotolerance
"Judge says training Claude on books was fair use, but piracy wasn't."
marapuru
Apparently it's a common business practice. Spotify (even though I can't find any proof) seems to have built their software and business on pirated music. There is some more in this article [0].
https://torrentfreak.com/spotifys-beta-used-pirate-mp3-files...
Funky quote:
> Rumors that early versions of Spotify used ‘pirate’ MP3s have been floating around the Internet for years. People who had access to the service in the beginning later reported downloading tracks that contained ‘Scene’ labeling, tags, and formats, which are the tell-tale signs that content hadn’t been obtained officially.
techjamie
Crunchyroll was originally an anime piracy site that went legit and started actually licensing content later. They started in mid-2006, got VC funding in 2008, then made their first licensing deal in 2009.
https://www.forbes.com/2009/08/04/online-anime-video-technol...
https://venturebeat.com/business/crunchyroll-for-pirated-ani...
Cyph0n
Yep, they were huge too - virtually anyone who watched free anime back then would have known about them.
My theory is that once they saw how much traffic they were getting, they realized how big of a market (subbed/dubbed) anime was.
haiku2077
Good Old Games started out with the founders selling pirated games on disc at local markets.
Shank
And now Crunchyroll is owned by (through a lot of companies, like Aniplex of America, Aniplex, A1 Pictures) Sony, who produces a large amount of anime!
dathinab
Not just Spotify: pretty much any (most?) current tech giant was built by
- riding a wave of change
- not caring too much about legal constraints (or, as they would say now, "disrupting" the market, which very, very often means doing illegal shit that brings them far more money than any penalties they will ever face for it)
- nor caring about ethics too much
- and, in recent years (starting with Amazon), a lot of technically illegal financing: undercutting competitors' prices long-term with money from elsewhere (e.g. investors) is an unfair competitive advantage that, in theory, is clearly not allowed by anti-monopoly laws. Before that, there were often other monopoly issues (e.g. see Wintel)
So yes, systematically not complying with the law to gain an unfair competitive advantage, knowing that many of the laws are, in the bigger picture, toothless when applied to huge companies, is the bread and butter of US tech giants.
benced
As you point out, they mostly did this before they were large companies (where the public choice questions are less problematic). Seems like the breaking of these laws was good for everybody.
pembrook
It wasn’t just the content being pirated, but the early Spotify UI was actually a 1:1 copy of Limewire.
pjc50
"recording obtained unofficially" and "doesn't have rights to the recording" are separate things. So they could well have got a license to stream a publisher's music but that didn't come with an actual copy of some/all of the music.
KoolKat23
There are plenty of startups that went legitimate.
Society underestimates the chasm that exists between an idea and raising sufficient capital to act on those ideas.
Plenty of people have ideas.
We only really see those that successfully cross it.
Small things, like EULA breaches or consumer licenses being used commercially, for example.
hinterlands
The problem is that these "small things" are not necessarily small if you're an individual.
If you're an individual pirating software or media, then from the rights owners' perspective, the most rational thing to do is to make an example of you. It doesn't happen everyday, but it does happen and it can destroy lives.
If you're a corporation doing the same, the calculation is different. If you're small but growing, future revenues are worth more than the money that can be extracted out of you right now, so you might get a legal nastygram with an offer of a reasonable payment to bring you into compliance. And if you're already big enough to be scary, litigation might be just too expensive to the other side even if you answer the letter with "lol, get lost".
Even in the worst case - if Anthropic loses and the company is fined or even shuttered (unlikely) - the people who participated in it are not going to be personally liable and they've in all likelihood already profited immensely.
dathinab
but it's not small things,
but systematic, widespread big things, often many of them, giving US giants an unfair competitive advantage.
And don't think that if you are an EU company you can do the same in the US, nope, nope.
But naturally the US insists that US companies can do that in the EU, and complains every time a US company is fined for not complying with EU law.
jowea
Uber
Barrin92
>Society underestimates the chasm that exists between an idea and raising sufficient capital to act on those ideas.
The AI sector, famously known for its inability to raise funding. Anthropic has in the last four years raised 17 billion dollars
pyman
There's no credible evidence Spotify built their company and business on pirated music.
This is a narrative that gets passed around in certain circles to justify stealing content.
YPPH
"Stealing" isn't an apt term here. Stealing a thing permanently deprives the owner of the thing. What you're describing is copyright infringement, not stealing.
In this context, stealing is often used as a pejorative term to make piracy sound worse than it is. Except for mass distribution, piracy is often regarded as a civil wrong, and not a crime.
lmm
> There's no credible evidence Spotify built their company and business on pirated music.
That's a statement carefully crafted to be impossible to disprove. Of course they shipped pirated music (I've seen the files). Of course anyone paying attention knew. Nothing in the music industry was "clean" in those days. But, sure, no credible evidence because any evidence anyone shows you you'll decide is not credible. It's not in anyone's interests to say anything and none of it matters.
Workaccount2
The common meme is that megacorps are shamelessly criminal organizations that get away with doing anything they can to maximize profits. While true in some regard, that totally pales in comparison to the illegal things small businesses and start-ups do.
NoMoreNicksLeft
This isn't as meaningful as it sounds. Nintendo was apparently using scene roms for one of the official emulators on Wii (I think?). Spotify might have received legally-obtained mp3s from the record companies that were originally pulled from Napster or whatever, because the people who work for record companies are lazy hypocrites.
reaperducer
> Apparently it's a common business practice.
It's not a common business practice. That's why it's considered newsworthy.
People on the internet have forgotten that the news doesn't report everyday, normal, common things, or it would be nothing but a listing of people mowing their lawns or applying for business loans. The reason something is in the news is because it is unusual or remarkable.
"I saw it online, so it must happen all the time" is a dopy lack of logic that infects society.
codedokode
> "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different," he wrote.
But this analogy seems wrong. First, an LLM is not a human and cannot "learn" or "train"; only a human can. And LLM developers are not aspiring to become writers and do not learn anything; they just want to profit by making software using copyrighted material. Also, people do not read millions of books to become writers.
pyman
These are the people shaping the future of AI? What happened to all the ethical values they love to preach about?
We've held China accountable for counterfeiting products for decades and regulated their exports. So why should Anthropic be allowed to export their products and services after engaging in the same illegal activity?
ffsm8
> We've held China accountable for counterfeiting products for decades and regulated their exports
We have? Are we from different multi-verses?
The one I've lived in to date has not done anything against Chinese counterfeits beyond occasionally seizing counterfeit goods during import. But that's merely occasionally enforcing local counterfeit law, a far cry from punishing the entity producing it.
As a matter of fact, companies started outsourcing everything to China, making further IP theft and quasi-copies even easier.
Workaccount2
I was gonna say, the enforcement is so weak that it's not even really worth it to pursue consumer hardware here in the US. Make product that is a hit, patent it, and still 1 month later IYTUOP will be selling an identical copy for 1/3rd the price on Amazon.
delfinom
Patent enforcement requires the patent holder to go after violators. The sad thing is, there are grounds to sue Amazon for facilitating it; nobody has just had the money to do it. And no big company ever will, because of the threat of being locked out of AWS.
It's quite the mafia operation over at Amazon.
wmf
The unethical ones didn't buy any books.
benjiro
One rule for you, one rule for me ...
Have you never noticed the hypocritical behavior all over society?
* Oh, you drive drunk: big fine, lots of trouble.
* Oh, you drive drunk and are a senator, cop, mayor, ... Well, let's look the other way.
* You have anger management issues and slam somebody to the ground: jail time.
* You, as a cop, have anger management issues and slam somebody to the ground: well, paid time off while we investigate, and maybe a reprimand. Qualified immunity, boy!
* You commit tax fraud for 10k: felony record, maybe jail time.
* You, as an exec of a company, commit tax fraud for 100 million: after 10 years of lawyering around, maybe you get something, maybe... oh, here is a fine of 5 million.
I am sorry, but the idea of everybody being equal under the law has always been an illusion.
We are holding China accountable for counterfeiting products because it hurts OUR companies and their income. But when it's "us vs us", well, then it becomes a bit more messy, and in general those with the biggest backing (as in $$$, economic value, and lawyers) tend to win.
Wait, if somebody steals my book, I can sue that person in court and get a payout (lawyers will cost me more, but that is not the point). If some AI company steals my book, well, the chance you win is close to 1%, simply because lots of well-paid lawyers will make winning hard to impossible.
Our society has always been based upon power, wealth and influence. The more you have of it, the more you get away with (or get reduced penalties for) things that get others fined or jailed.
seydor
break things and move fast
carlosjobim
Why is it unethical of them to use the information in all these books? They are clearly not reselling the books in any way, shape, or form. The information itself in a book can never be copyrighted. You can also publish and sell material where you quote other books within it.
lofaszvanitt
This is the underlying caste system coming to life right before your eyes :D.
stephenitis
I think caste system is the wrong analogy here.
Comment is more about the pseudo ethical high ground
MangoToupe
Companies being above the law does create a stratified system in this country for those who can benefit from said companies and those who cannot. Call it what you like.
bmitc
Silicon Valley has always been the antithesis of ethics. Its foundations are much more right-wing and libertarian, along extremist lines.
DrillShopper
> So why should Anthropic be allowed to export their products and services after engaging in the same illegal activity?
Rules don't apply to corporations making money for VCs.
So it goes.
ramon156
Pirate and pay the fine is probably hell of a lot cheaper than individually buying all these books. I'm not saying this is justified, but what would you have done in their situation?
Saying "they have the money" is not an argument. It's about the amount of effort needed to individually buy, scan, and process millions of pages. If that's done for you, why redo it all?
pyman
The problem with this thinking is that hundreds of thousands of teachers who spent years writing great, useful books and sharing knowledge and wisdom probably won't sue a billion dollar company for stealing their work. What they'll likely do is stop writing altogether.
I'm against Anthropic stealing teachers' work and discouraging them from ever writing again. Some teachers are already saying this (though probably not in California).
NoMoreNicksLeft
Stealing? In what way?
Training a generative model on a book is the mechanical equivalent of having a human read the book and learn from it. Is it stealing if a person reads the book and learns from it?
Der_Einzige
[flagged]
lofaszvanitt
They won't be needed anymore, once singularity is reached. This might be their thought process. This also exemplifies that the loathed caste system found in India is indeed in place in western societies.
There is no equality, and seemingly there are worker bees who can be exploited, and there are privileged ones, and of course there are the queens.
SketchySeaBeast
> They won't be needed anymore, once singularity is reached.
And it just so happens that that belief says they can burn whatever they want down because something in the future might happen that absolves them of those crimes.
pyman
:D
Note: My definition of singularity isn't the one they use in San Francisco. It's the moment founders who stole the life's work of thousands of teachers finally go to prison, and their datacentres get seized.
glimshe
That will be sad, although there will still be plenty of great people who will write books anyway.
When it comes to a lot of these teachers, I'll say, copyright works hand in hand with college and school course book mandates. I've seen plenty of teachers making crazy money off students' backs due to these mandates.
A lot of the content taught in undergrad and school hasn't changed in decades or even centuries. I think we have all the books we'll ever need in certain subjects already, but copyright keeps enriching people who write new versions of these.
CuriouslyC
If you care so little about writing that AI puts you off it, TBH you're probably not a great writer anyhow.
Writers that have an authentic human voice and help people think about things in a new way will be fine for a while yet.
4b11b4
Yeah, people will still want to write. They might need new ways to monetize it... that being said, even if people still want to write they may not consider it a viable path. Again, have to consider other monetization.
TimorousBestie
150K per work is the maximum fine for willful infringement (which this is).
$1.05T+ (7M works × $150K) is more than Anthropic is worth on paper.
Of course they’re not going to be charged to the fullest extent of the law, they’re not a teenager running Napster in the early 2000s.
voxic11
Even if they don't qualify for willful infringement damages (let's say they have a good-faith belief their infringement was covered by fair use), the standard statutory damages for copyright infringement are $750-$30,000 per work.
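To put the per-work figures from these comments on one scale, here is a back-of-the-envelope sketch. It assumes, purely as a simplification and not as a legal conclusion, that each of the ~7M downloaded books counts as one distinct registered work eligible for statutory damages; duplicates, unregistered or public-domain titles, and the court's discretion within the range would all shrink the real figure. The per-work amounts are the ones cited in the thread ($750 to $30,000 standard, up to $150,000 for willful infringement); the function name `exposure` is just an illustrative helper.

```python
# Per-work statutory damages amounts as cited in this thread (US dollars).
STANDARD_MIN = 750
STANDARD_MAX = 30_000
WILLFUL_MAX = 150_000


def exposure(works: int) -> dict[str, int]:
    """Multiply each per-work bound by the number of works (hypothetical helper)."""
    return {
        "standard_min": works * STANDARD_MIN,
        "standard_max": works * STANDARD_MAX,
        "willful_max": works * WILLFUL_MAX,
    }


# Assumption: every one of the ~7M downloaded books is a distinct, eligible work.
for label, dollars in exposure(7_000_000).items():
    print(f"{label}: ${dollars / 1e9:,.2f}B")
```

Under that assumption the standard range runs from about $5.25B to $210B, and the willful ceiling is about $1.05 trillion, which is why even the low end of the range is a large number here.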
glimshe
Isn't "pirating" a felony with jail time, though? That's what I remember from the FBI warning I had to see at the beginning of every DVD I bought (but not "pirated" ones).
voxic11
Yes criminal copyright infringement (willful copyright infringement done for commercial gain or at a large scale) is a felony.
pyman
Absolutely.
Pirating 7 million books, remixing their content, and using that to make money on Claude.ai is like counterfeiting 7 million branded products and selling them on your Shopify website. The original creators don't get payment, and someone's profiting off their work. Try doing that yourself and you'd get a knock on the door real quick.
dmix
A court just ruled on Anthropic and said an LLM response wasn't a form of counterfeiting (ie, essentially selling pirate books on the black market). Although tbf that is the most radical interpretation still being put forward by the lawyers of publishers like NYTimes, despite the obvious flaws.
mystified5016
No, it isn't.
suyjuris
Just downloading them is of course cheaper, but it is worth pointing out that, as the article states, they did also buy legitimate copies of millions of books. (This includes all the books involved in the lawsuit.) Based on the judgement itself, Anthropic appears to train only on the books legitimately acquired. Used books are quite cheap, after all, and can be bought in bulk.
asadotzler
Buying a book is not license to re-sell that content for your own profit. I can't buy a copy of your book, make a million Xeroxes of it and sell those. The license you get when you buy a book is for a single use, not a license to do what ever you want with the contents of that book.
suyjuris
Yes, of course! In this case, the judge identified three separate instances of copying: (1) downloading books without authorisation to add to their internal library, (2) scanning legitimately purchased books to add to their internal library, and (3) taking data from their internal library for the purposes of training LLMs. The purchasing part is only relevant for (2) — there the judge ruled that this is fair use. This makes a lot of sense to me, since no additional copies were created (they destroyed the physical books after scanning), so this is just a single use, as you say. The judge also ruled that (3) is fair use, but for a different reason. (They declined to decide whether (1) is fair use at this point, deferring to a later trial.)
thedevilslawyer
What are you on about - the judge has literally said this was not resell, and is transformative and fair use.
darkoob12
This is not about paying for a single copy. It would still be wrong even if they have bought every single one of those books. It is a form of plagiarism. The model will use someone else's idea without proper attribution.
jeroenhd
Legally speaking, we don't know that yet. Early signs are pointing at judges allowing this kind of crap because it's almost impossible for most authors to point out what part of the generated slop was originally theirs.
blibble
> Pirate and pay the fine is probably hell of a lot cheaper than individually buying all these books.
$500,000 per infringement...
maeln
If you wanted to be legit with 0 chance of going to court, you would contact publisher and ask to pay a license to get access to their catalog for training, and negotiate from that point.
This is what every company using media does (think Spotify and Netflix, but also journals, ad agencies, ...). I don't know why people on HN are giving AI companies a pass for this kind of behavior.
pyman
100%
It's the new narrative in certain circles, especially in San Francisco: it's us vs China. We're doing all this to beat them, no matter the cost. While teachers are left scratching their heads with four kids to feed.
edgineer
The paradigm is that teachers will teach life skills like public speaking and entrepreneurship. Book smarts that can be more effectively taught by AI will be, once schools catch up.
ohashi
Because they are mostly software developers who think it's different because it impacts them.
tmaly
At minimum they should have to buy the book they are deriving weights from.
SirMaster
But should the purchase be like a personal license? Or like a commercial license that costs way more?
Because, for example, if you buy a movie on disc, that's a personal license and you can watch it yourself at home. But you can't play it at a large public venue that sells tickets to watch it. You need a different and more expensive license to make money off the content in a larger capacity like that.
pyman
Anthropic's cofounder, Ben Mann, downloaded millions of copies of books from Library Genesis in 2021, fully aware that the material was pirated.
Stealing is stealing. Let's stop with the double standards.
originalvichy
At least most pirates just consume for personal use. Profiting from piracy is a whole other level beyond just pirating a book.
pyman
Someone on Twitter said: "Oh well, P2P mp3 downloads, although illegal, made contributions to the music industry"
That's not what's happening here. People weren't downloading music illegally and reselling it on Claude.ai. And while P2P networks led to some great tech, there's no solid proof they actually improved the music industry.
Imustaskforhelp
I really feel as if YouTube is the most convenient option for music videos: most people watch the ads, while some people use an ad blocker.
I use an ad blocker, and tbh I think so many people on HN are okay with ad blocking but not with piracy, when basically both just stop the creator from earning money.
I kind of believe that if you really like a piece of software, or really like anything, you should just ask the creators what their favourite charity is and donate there, or join their Patreon or some other direct way to support them.
drcursor
Let's not forget Spotify ;)
https://gizmodo.com/early-spotify-was-built-on-pirated-mp3-f...
mnky9800n
I feel like profit was always a central motive of pirates. At least from the historical documents known as, "The Pirates of the Caribbean".
KoolKat23
This isn't really profiting from piracy. They don't make money off the raw input data. It's no different to consuming for personal use.
They make money off the model weights, which is fair use (as confirmed by recent case law).
j_w
This is absurd. Remove all of the content from the training data that was pirated and what is the quality of the end product now?
mrcwinn
> At least most pirates just consume for personal use.
Easy for the pirate to say. Artists might argue their intent was to trade compensation for one's personal enjoyment of the work.
Workaccount2
The gut punch of being a photographer selling your work on display, someone walks by and lines up their phone to take a perfect picture of your photograph, and then exclaims to you "Your work is beautiful! I can't wait to print this out and put it on my wall!"
jobs_throwaway
All the evidence shows that piracy is good for artists' business. You make a good work, people are exposed to it through piracy, and they end up buying more of your stuff than they would otherwise. But keep crying about the artist's plight
dathinab
stealing with the intent to gain an unfair market advantage, so that you can effectively kill any company acting ethically and legally, in a way that is very likely going to hurt many authors through the products you create, is far worse than just stealing for personal use
that isn't "just" stealing, it's organized crime
1970-01-01
Let's get actual definitions of 'theft' before we leap into double standards.
NoMoreNicksLeft
>Stealing is stealing.
Yes, but copying isn't stealing, because the person you "take" from still has their copy.
If you're allowed to call copying stealing, then I should be allowed to call hysterical copyright rabblerousing rape. Quit being a rapist, pyman.
x3n0ph3n3
Copyright infringement is not stealing.
impossiblefork
It's very similar to theft of service.
There's so many texts, and they're so sparse that if I could copyright a work and never publish it, the restriction would be irrelevant. The probability that you would accidentally come upon something close enough that copyright was relevant is almost infinitesimal.
Because of this copyright is an incredibly weak restriction, and that it is as weak as it is shows clearly that any use of a copyrighted work is due to the convenience that it is available.
That is, it's about making use of the work somebody else has done, not about that restricting you somehow.
Therefore copyright is much more legitimate than ordinary property. Ordinary property, especially ownership of land, can actually limit other people. But since copyright is so sparse, infringing on it is like going to a world with near-infinite space, picking the precise place where somebody has planted a field, and deciding to harvest from that particular field.
Consequently I think copyright infringement might actually be worse than stealing.
jpalawaga
you've created a very obvious category mistake in your final summary by confusing intellectual property--which can be copied at no penalty to an owner (except nebulous 'alternate universe' theories)--with actual property, and a farmer and his land, with a crop that cannot be enjoyed twice.
you're saying copying a book is worse than robbing a farmer of his food and/or livelihood, which cannot be replaced or duplicated. Meanwhile, someone who copies a book does not deprive the author of selling the book again (or of the tasty proceeds of a harvest).
I can't say I agree, for obvious reasons.
pyman
Pirating a book and selling it on claude.ai is stealing, both legally and morally.
BlackFly
Making a copy differs from taking an existing object in all aspects: literally, technically, legally and ethically. Piracy is making a copy you have no legal right to. Stealing is taking a physical object that you have no legal right to. While the "no legal right to" seems the same superficially, in practice the laws differ quite a bit because the literal, technical and ethical aspects differ.
thedevilslawyer
Where can I download Harry Potter on claude.ai pls?
TiredOfLife
They are not selling it on claude.ai. If you can prove that they are you will be rich.
zb3
Who got robbed? Just because I'd pay for AI it doesn't mean I'd buy these books.
seydor
property infringement isn't either?
eviks
If you infringe by destroying property, then yes, it's not stealing
1oooqooq
actually, the Only time it's a (ethical) crime is when a corporation does it at scale for profit.
damnesian
oh well, the product has a cute name and will make someone a billionaire, let's just give it the green light. who cares about copyright in the age of AI?
hellohihello135
It’s easy to point fingers at others. Meanwhile the top comment in this thread links to stolen content from Business Insider.
https://archive.md/YLyPg