
Copyright winter is coming (to Wikipedia?)

areoform

The push to expand repressive copyright laws because machines can learn from human produced text, code and art is going to hurt us all in the long run.

People usually say contemporary media sucks because of commercial pressures, but those commercial pressures and conditions wouldn't exist without the expansion of copyright.

Yes, giant studios are struggling to introduce new ideas like 1993's Jurassic Park. But that doesn't mean Shane Carruth (of Primer fame) can't. And he could have if Jurassic Park had been released any time between 1790 and 1900.

Our stilted media landscape is directly downstream of prior legislation expanding copyright.

Expanding copyright even more so that text / art that looks stylistically similar to another work is counted as infringing will, in the long run, give Disney's lawyers the power to punish folks for making content that even looks anything like Disney's many, many, many IP assets.

Even though Steamboat Willie has entered the public domain, Disney has been going after folks using the IP: https://mickeyblog.com/2025/07/17/disney-is-suing-a-hong-kon...

The "infringement" in this case was a diamond encrusted Steamboat Willie style Mickey pendant.

Questionable taste aside, I think it's good for society if people are able to make diamond encrusted miniature sculptures of characters from a 1928 movie in 2025. But Disney clearly disagrees.

Disney (and other giant corps) will use every tool in their belt to go after anyone who comes close to their money makers. There has been a long history of tension between artists and media corps. But that's water under the bridge now. AI art is apparently so bad that artists are willing to hand them the keys to their castle.

visarga

> Expanding copyright even more so that text / art that looks stylistically similar to another work is counted as infringing will, in the long run, give Disney's lawyers the power to punish folks for making content that even looks anything like Disney's many, many, many IP assets.

Legal doctrines like the "Abstraction-Filtration-Comparison test", "total concept and feel," "comprehensive non-literal similarity," and "sequence, structure and organization" have systematically ascended the abstraction ladder. Copyright no longer protects expression but abstractions and styles.

The ugly part is the asymmetry at play - a copyright holder can pick and choose the level of abstraction on which to claim infringement, while a new author cannot possibly avoid all similarities on all levels of abstraction for all past works. The accuser can pick and choose how to frame infringement, the accused has to defend from all possible directions.

tavavex

> Expanding copyright even more so that text / art that looks stylistically similar to another work is counted as infringing will, in the long run, give Disney's lawyers the power to punish folks for making content that even looks anything like Disney's many, many, many IP assets.

This made me wonder about an alternate future timeline where IP law is eventually so broad and media megacorporations are so large that almost any permutation of ideas, concepts or characters could be claimed by one of these companies as theirs, based on some combination of stylistic similarities and using a concept similar to what they have in their endless stash of IP. I wonder what a world like that would look like. Would all expression be suppressed and reduced to the non-law-abiding fringes and the few remaining exceptions? Would the media companies mercifully carve out a thin slice of non-offensive, corporate-friendly, narrow ideas that could be used by anyone, putting them in control of how we express ourselves? Or would IP violation become so common that paying an "IP tax" would be completely streamlined and normalized?

The worst thing is that none of this seems like the insane ramblings that it would've probably been several decades ago. Considering the incentives of companies like Disney, IP lawyers and pro-copyright lawmakers, this could be a future we get to after a long while.

thisislife2

Corporates can't have it both ways. The Hollywood corporates lobbied intensively to extend copyright for as long as 75+ years (if I recall right) because that's what would benefit them. Many have protested about this. Some tech corporates (namely search and AI companies) now feel encumbered by this, and even indulge in piracy to circumvent copyright (without any meaningful consequences), and we are now supposed to feel sorry for them? Are any of these tech corporates also lobbying for changes to copyright laws? (I don't believe so, as many of them are now also trying to become media moguls themselves!)

satvikpendem

> The push to expand repressive copyright laws because machines can learn from human produced text, code and art is going to hurt us all in the long run.

Exactly. I always thought it was hilarious that, ever since LLMs and image generators like Stable Diffusion came online a few years ago, HN suddenly seemed to shift from the hacker ethos, of moving fast and breaking things, and using whatever you could for your goals, to one of being an intense copyright hawk, all because computers could now "learn."

CamperBob2

It's a moot point, at least as far as AI is concerned, because nobody in China gives a mouse's behind about any of this.

Nor should they.

visarga

While Chinese models train on all Western cultural output, our own models are restricted. And in the corporate world the models of choice for finetuning are DeepSeek and Qwen even in the West, not just in China.

petermcneeley

The implication here, of course, is that if we allow AI to be taken down by copyright then it could also take down Wikipedia. I am not even sure this is close to being true, despite the article trying to suggest otherwise.

Perhaps a section on what the differences are might be helpful. For example, what role does style play in the summary? I don't think that the wiki summary is in the style of George R. R. Martin.

tavavex

I'm confused. There's an entire paragraph in the article where the author compares the two summaries and finds that they differ only in their structuring. I can't find any part of the article saying that the LLM summary was written "in the style of George R.R. Martin". As far as I understand, both summaries are conceptually very similar. That's the main problem. If the scope of substantial similarity to a novel is pushed down from hundreds of pages of writing to a summary that's a couple paragraphs long, then all these summaries are in potential danger. To my knowledge, there are no criteria that let you find only LLM summaries infringing without leaving an opening for the lawyers to expand the reach to target all summaries of copyrighted content.

petermcneeley

Even if true, wiki would escape via fair use and AI would not. It is possible that the laws and judgements are inconsistent nonsense, but assuming they are not, the fact that wiki has been around for decades suggests at least one key difference.

tavavex

Just because Wikipedia has persisted for 20+ years doesn't mean that a key decision later down the line can't make it open season for all IP owners. AI-related lawsuits are a great opportunity for copyright owners to greatly shake up the status quo under the (fairly legitimate) guise of protecting themselves from LLM copying. Even if Wikipedia in particular could skirt it through fair use, the fact that hundred-word-long summaries would be found "similar" to full novels would represent a large encroachment of copyright that would allow many other lawsuits to open up against entities who may not be as lucky as Wikipedia. Changing the answer to "Is something as brief as this notably similar to a full work?" from "what? Of course not" to "well... do you have a fair use reason?" would mean that many people will need to start looking both ways and triple-checking whatever they create/summarize/report on so as to avoid tipping off anyone hungry for some settlement money.

varenc

The ruling never said summaries are infringing. It just said the authors’ claims about some AI outputs were "plausible" enough to get past a motion to dismiss, which is basically the lowest hurdle. The judge isn’t deciding what actually counts as infringement, just that the case can move forward. IMHO the title of the article is reading more into the opinion than what the judge actually decided.

tavavex

The author already fully addressed this in the article. They just think that even the fact that this was allowed to move forward is a worrying sign:

> Judge Stein’s order doesn’t resolve the authors’ claims, not by a long shot. And he was careful to point out that he was only considering the plausibility of the infringement allegation and not any potential fair use defenses. Nonetheless, I think this is a troubling decision that sets the bar on substantial similarity far too low.

chupchap

From what I understood, the case against OpenAI wasn't about the summarisation. It was the fact that the AI was trained on copyrighted work. In case of Wikipedia, the assumption is that someone purchased the book, read it, and then summarised it.

colechristensen

There are separate issues.

One is a large volume of pirated content used to train models.

Another is models reproducing copyrighted materials when given prompts.

In other words there's the input issue and the output issue and those two issues are separate.

cameldrv

They’re sort of separate. In a sense you could say that the ChatGPT model is a lossily compressed version of its training corpus. We acknowledge that a jpeg of a copyrighted image is a violation. If the model can recite Harry Potter word for word, even imperfectly, this is evidence that the model itself is an encoding of the book (among other things).

You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc, but a transformer model is not human, and very philosophically and economically importantly, human brains can’t be copied and scaled.

tavavex

They're very separate in terms of what seems to have happened in this case. This lawsuit isn't about memory or LLMs being archival/compression software (imho, a very far reach) or anything like that. The plaintiffs took a bit of text that was generated by ChatGPT and accused OpenAI of violating their IP rights, using the output as proof. As far as I understand, the method by which ChatGPT arrived at the output, or how Game of Thrones is "stored" within it, is irrelevant. The authors allege that the output text itself is infringing regardless of circumstance and therefore OpenAI should pay up. If it's eventually found that the short summary is indeed infringing on the copyright of the full work, there is absolutely nothing preventing the authors (or someone else who could later refer to this case) from suing someone else who wrote a similar summary, with or without the use of AI.

duskwuff

> You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc

Also worth noting that, if a person performs a copyrighted work from memory - like a poem, a play, or a piece of music - that's a copyright violation. "I didn't copy it, I memorized it" isn't the get-out-of-jail-free card some people think it is.

yorwba

A jpeg of a copyrighted image can be copyright infringement, but isn't necessarily. A trained model can be copyright infringement, but isn't necessarily. A human reciting poetry can be copyright infringement, but isn't necessarily.

The means of reproduction are immaterial; what matters is whether a specific use is permitted or not. That a reproduction of a work is found to be infringing in one context doesn't mean it is always infringing in all contexts; conversely, that a reproduction is considered fair use doesn't mean all uses of that reproduction will be considered fair.

throwaway-0001

I think we have no evidence someone bought the book and summarized it. And what if an AI bought the book and summarized it, is it fine now?

TheDong

To me the key difference is that Wikipedia summaries are written by a human, and so creativity imbues them with new copyright.

OpenAI outputs are an algorithm compressing text.

A jpeg thumbnail of an image is smaller but copyright-wise identical.

An OpenAI summary is a mechanically generated smaller version, so new creative copyright does not have a chance to enter in.

jjcm

The issue becomes there's little to no way to tell the difference between the two.

Additionally, if human summaries aren't copyright infringement, you can train LLMs on things such as the Wikipedia summaries. In this situation, they're still able to output "mechanical" summaries - are those legal?

netule

To add to your points, Wikipedia also generally cites its sources, whereas LLMs do not. I believe this is a significant distinction.

throwaway290

This.

Also, there is the fair use gray area. Unlike Wikipedia, ClosedAI is a for-profit that makes money from this stuff, and people using generated text do it for profit.

Robotbeat

So if OpenAI stayed a non-profit, they'd be okay?

chris_wot

This is going to make anyone who does a college assignment explaining the general plot of a novel liable for copyright infringement. That’s absurd.

Wikipedia is careful to cite their sources. Is OpenAI as careful?

novemp

That image caption says "A white walker in a desolate field reading Wikipedia", but the (backwards for some reason) Wikipedia article says "White Waleers". Forgive me for thinking this person might not have the necessary braincells to commentate on legal issues.

wzdd

Entertaining that the article about copyright-infringing similarity of AI-generated summaries is illustrated with a picture of an animated skeleton labelled "White Walker", which is neither what White Walkers are nor what they look like.

bawolff

Honestly, I always thought this was how it always worked. A summary is by necessity a derivative of the thing being summarized, but it is also very, very clearly fair use. It's transformational, it's for an educational purpose, it contains only a tiny portion of the original work, and it does not compete with the original work. I can't imagine anything more fair use than that.

Personally, I'm not worried.

cm2012

This is my favorite article on HN since the one on solar panels in Africa. Love to see a subject matter expert making a case at the bleeding edge of their field.

CuriouslyC

The high seas are going to be crowded soon.
