OpenAI's Motion to Dismiss Copyright Claims Rejected by Judge
14 comments
· April 5, 2025 · Animats
tpmoney
It's funny you mention Wikipedia; I wonder if (at least since the early days when Wikipedia was the big scary thing on the block) anyone has run the same sorts of searches for plagiarized material and "hallucinations" against Wikipedia. After all, Wikipedia explicitly forbids "original" research, which means all of the output of Wikipedia is by definition a regurgitation of someone else's work. Yes, you're supposed to cite everything, but between the number of things that are [Citation Needed] and the number of cites that don't seem to actually go to anything, there's almost certainly a good amount of "hallucinations" in there too (see also effectively the entire Scots Wikipedia https://www.theregister.com/2020/08/26/scots_wikipedia_fake/). And that doesn't get into whether the sources of the cited facts gave the editors permission to use their material in the first place. Of course, I would argue Wikipedia is sufficiently transformative (and facts aren't subject to copyright anyway) and is overall a net good despite its problems. But I would also argue the same of the various LLMs and their outputs.
the_arun
If OpenAI had paid Wikipedia & others $x before charging its customers $20 or $200 per month, it would have been better. Having said that, how is OpenAI's case different from what Google is doing with its crawling & making money on ads?
paxys
Google has already been through countless lawsuits because of its indexing, and what we see now is the sum total of all the wins/losses/settlements constantly happening in every jurisdiction in the world. The number of words they can show in the snippets, how they must respond to takedown requests, how they must share revenue for content, what they can and cannot cache, how personal info is removed upon request, how someone's house is blurred out, what content fee they must pay news publishers...everything is regulated. So "Google does it" is not at all an excuse that OpenAI or anyone else can use.
ratorx
Presumably the difference is index (listing links) vs reproduction (actually returning content).
Also, it's easier to remove copyrighted material if it's not all crammed into an LLM first. E.g., if someone wanted to remove their website from Google, they could do that incrementally without rebuilding the entire index, whereas it's a lot harder post-LLM (post-processing is probabilistic at best).
madeofpalk
Exchange of value.
Websites tend to be okay with it because they accrue a benefit from Google's crawling: they get traffic back. When websites feel that Google keeps the traffic for itself, they tend to get upset https://www.theregister.com/2020/03/11/yelp_congress_google/
LLM training just takes, and the LLM companies keep all the benefit for themselves. Wikipedia (or news sites) get no traffic or anything else back in return.
jraph
Not quite good enough, I think: the copyright holders are not Wikipedia but the individual contributors.
ramshanker
Good. It may finally lead to legislative action reducing the 95-year absurdity.
qingcharles
Pretty much a nothingburger. Standard Motion to Dismiss territory -- just carving some edges off the claims, and a few Hail Mary attempts to dismiss (e.g. OpenAI saying the lawsuit was time-barred).
smeeger
RIP Suchir Balaji
NoWordsKotoba
As it should be. This nonsense about tech committing a crime big enough that it can't be adjudicated is (along with so much other stuff) creating social upheaval. These are not good disruptions.
flashgordon
You say that as if it is obvious, but given all that is going on, my first reaction after reading this (the post, I mean) was: OK, so what loopholes are they going to use, and what backroom dealings went on (or are going on)?
NoWordsKotoba
[dead]
The comprehensive sources of good content, such as Wikipedia, and major news outlets, do seem to go into LLMs and come out the other side.