Copyright Office suggests AI copyright debate was settled in 1965

lolinder

The headline is either sloppy or intentionally misleading: the Copyright Office is saying that the law surrounding whether AI generated works can be copyrighted was settled in 1965 (the answer being "yes if AI assisted a human creative process, no if not, and we have to decide on a case by case basis if there was enough human input to qualify"). This has been their stance all along, but now they've provided a bit more guidance on what counts as human input, which is helpful.

What this article doesn't talk about at all is the far more controversial AI copyright debate, the one most people will think of given the headline: whether training a model is fair use. That's the one everyone is actually concerned about, and they're definitely not claiming it was settled already.

Salgat

The human input requirement makes sense. Otherwise, couldn't you brute-force generate billions of low-resolution images that cover a vast range of situations and then use them to attack anything similar enough to meet the substantial similarity condition? You could even plug a news feed into the generator.

dotancohen

Somebody did this with music - they brute forced all chord progressions or something like that. In theory all new music is infringing.
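For a rough sense of scale, here is a minimal sketch of the counting behind that kind of brute force. The alphabet size (8 pitches) and melody length (12 notes) are illustrative assumptions, not details confirmed in this thread, and `count_melodies` / `all_melodies` are hypothetical helper names.

```python
from itertools import product
from typing import Iterator


def count_melodies(num_pitches: int, length: int) -> int:
    """Number of distinct note sequences: one of num_pitches choices per position."""
    return num_pitches ** length


def all_melodies(num_pitches: int, length: int) -> Iterator[tuple[int, ...]]:
    """Lazily enumerate every sequence; pitches are just integer indices here."""
    return product(range(num_pitches), repeat=length)


if __name__ == "__main__":
    # Assumed parameters: 8 pitches, 12 notes per melody.
    print(count_melodies(8, 12))
    # A tiny case that is small enough to list in full.
    print(list(all_melodies(2, 2)))
```

The space grows exponentially with length, so it is finite and enumerable for short melodies but explodes quickly for longer ones.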

somenameforme

Things like this often make me wish we had more 'common sense' laws and left the discretion of interpreting that notion to judges, juries, and the various systems of appeals and other courts we have, entirely with the expectation that laws would 'evolve' over time. This might sound radical, but it's actually just going back to how things used to be. Here is the First Amendment in its entirety:

---

"Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances."

---

The rest of the Bill of Rights looks similar. Nowadays it'd be thousands of pages long, trying to elaborate endlessly upon every single scenario. But the more important thing is that this idea of trying to codify every scenario still doesn't work, because you end up with a zillion loopholes in just about every single law with some clever clown going 'ah hah! you didn't cover this!' So all you really do is end up with laws that are not only excessively fragile and subject to exploitation, but also completely indecipherable by just about anybody, certainly including the people voting on their passage.

AstralStorm

That's the unfortunate problem with copyright being a perpetual uphill battle, and it's why copyright terms should be short.

Similarly with patents.

Even when used there should be a timeout, possibly per clause to avoid overly broad stuff.

But then a few IP trolls and lawyers would have to find another job.

Terr_

> whether training a model is fair use

I want to highlight that training the model is only one part of the copyright questions going on, the other is how they are making and keeping direct long-term copies in giant training datasets.

Imagine what would happen if a regular person bought and then immediately resold thousands of books, CDs, and movies, taking just enough time to make a copy of each one and building out their own library/movie-theater for friends and coworkers. You think the powers-that-be would let you or me get away with that?

There is no (non-evil) reason to hold multijillion-dollar corporations with professional legal advice on tap to a lower standard than regular people.

xp84

You’re making the exact same kind of maximalist argument as people who argued that ripping a CD and letting your brother listen to the MP3s is exactly morally and criminally like shoplifting the disc. Or that recording a movie off TV was equal to stealing it. (That particular one was of course famously judged by the courts as “tough shit” to the copyright owners who sued over it.)

Yes, training does impart some fraction of an article into the weights. NYT famously “demonstrated” this by like typing whole paragraphs of their articles into GPT and having the model produce some of the following sentences. However, this substitutes for the article in zero ways, since if you need the article to summon the article…who cares?

We should admit that nothing about our copyright law intentionally weighed in about LLMs. It’s simply nuts to apply a law to a situation when its drafters had no idea of the positive or negative implications of such an application. It would be like applying a law that calculated prison time based on number of horses stolen to someone who stole a Honda Civic with 100hp and saying that clearly they should get 100 years because it is equal to 100 horses.

Now, I get that we have a useless legislative branch that, even if it actually passed intentionally applicable laws, would pass stupid ones, but I think making simplistic analogies like that does not help anyone, other than the Luddites. Look, the cat is out of the bag and even if, say, the US government effectively killed all Gen AI by forcing any training to be done with material you own the copyright of, countries like China (and criminals, who can simply use the tech in secret) will happily just pull ahead of us and economically demolish us - like any country would have done if a competing country had banned electricity 100 years ago.

We need something better than the ridiculously unfit 200-year-old paradigm for this.

bonzini

> yes if AI assisted a human creative process, no if not

Fair enough but does that help settle the other question, which is whether weights are considered derivative works of any material used in the training?

sublinear

ALL OF WHAT AI HAS BEEN TRAINED ON IS HUMAN INPUT

cxr

There's not really much of a debate, just a bunch of clamoring and wishful thinking by rightsholders who don't understand copyright law insisting that precedent should be subordinate to mimetic outrage over LLMs.

throwaway17_17

In what way are ‘rightsholders’ expressing wishful thinking? I assume you are saying that there is no violation of those rights controlling various properties that have been used to train ‘AI’. You then mention precedent in a way that implies there are legal decisions that make it clear ‘AI’ training using copyrighted material does not violate the rights of those who own that material. Could you list or link to such a precedent?

To the best of my knowledge, there is no direct precedent from any federal circuit addressing this issue and certainly no USSC opinions dealing with the issue. Additionally, any analogies drawn from precedent focused on other areas of intellectual property law are easily distinguishable. This is truly fresh legal ground and the next 10 years of jurisprudence will go a long way towards building the precedent that your comment implies already exists.

Just to be explicit, the above, while a legal opinion, IS NOT legal advice.

cxr

No amount of solidarity from support groups comprising clusters of likeminded folks on internet message boards who're opposed to settled law is a substitute for an act of Congress, which is what it will take to give the position of folks opposed to contemporary GenAI any legs.

Neither your comments to HN nor anyone else's strenuous assertions that there's anything to debate are going to change anything.

If you want to treat LLMs as a special case—which is what you want, since there is an entire history of jurisprudence that you have to contend with here—then you need to get Congress to write legislation that says so.

jpalawaga

Copyright law stipulates the conditions in which content can be reproduced, not conditions in which it can be consumed.

Arguably the material has been learned and not copied. Maybe in some cases learned with an uncanny ability or photographic memory, but learned. (People with photographic memories also cannot reproduce content in an unlimited fashion).

Aloisius

> "Where a human inputs their own copyrightable work and that work is perceptible in the output, they will be the author of at least that portion of the output," the guidelines said.

This policy is sensible. Most AI generated works should be uncopyrightable, except where a substantial human contribution is in the output.

Simply describing a picture and letting AI generate it shouldn't be enough for the same reason that dictating what you want to a painter isn't enough to earn you copyright over the resulting painting.

I would be wary about integrating too much AI output into works one wants to enforce copyright over without some level of documentation. The nightmare scenario is having your copyright stripped away because of evidence one used AI extensively.

NitpickLawyer

> Simply describing a picture and letting AI generate it shouldn't be enough

Interesting take, and I've heard this many times. I'm curious to explore this further and see why you think that is, and where do you draw the line?

Is it the "low effort"? Is it the "automated" stuff? Is the process of setting it up, prompting it and choosing a result not enough "creative input"? If so, why?

Let's take a "real world" example as an analogue. Say I set up a camera on a tripod. I set it to take a picture every second, and leave it there. I come back an hour later and go through the pictures. I select one of the sunset that I like, and post it. Would I not have copyright on that picture? I wasn't there when it was taken. But I did set up the camera and select the end result. How is that different?

Taking it back to genAI, say I build/train/finetune my own model. Would it now have enough "effort" from me that I can use those generations? Is this an effort thing or is it more? Or is it just that someone else did the work?

What about random "art"? As in art based on random numbers. Say I write a script in Python that uses random math formulas to "draw" on a canvas. I let it run for a couple of hours, come back, look at the results and select one. Do I not get copyright on the resulting "art" because it was randomly generated by a script? Does it matter if the script was written by me? Would it be different if you downloaded my script and generated the art yourself? Would you not have copyright?

I guess what I'm trying to say is "where do we draw the line?". It's not clear to me why people say "simply prompting and selecting isn't creative enough". This distinction wasn't there before. Plenty of "art" out there based on random processes + curation. Why the sudden change?

njarboe

If the painter is doing a "work for hire" you should get the copyright.

Aloisius

They can if they buy the copyright from the painter.

They just can't get it from the government because they are not, in any sense, the author of the creative work.

galaxyLogic

Right, that you cannot copyright such output is now clear(er).

But what about the other direction, can distributing such AI generated content VIOLATE somebody else's copyright?

If output of AI cannot be copyrighted, can it violate copyright?

ilaksh

It says they were not able to reproduce an image with the same prompt. So they just didn't know about seeds?
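A minimal sketch of why seeds matter, using Python's standard `random` module as a stand-in for a generative model. The `generate` function is hypothetical and only illustrates the idea: real image generators expose seeds in their own ways, and fully reproducible output may also depend on the model version, sampler settings, and hardware.

```python
import random


def generate(prompt: str, seed: int, size: int = 8) -> list[int]:
    """Toy stand-in for an image generator: returns pseudo-random 'pixel' values.

    The output depends only on the prompt and the seed, so repeating both
    reproduces the same result.
    """
    rng = random.Random(f"{prompt}|{seed}")  # a string seed is hashed deterministically
    return [rng.randrange(256) for _ in range(size)]


if __name__ == "__main__":
    first = generate("a sunset over the sea", seed=42)
    second = generate("a sunset over the sea", seed=42)
    other = generate("a sunset over the sea", seed=43)
    print(first == second)  # same prompt and seed
    print(first == other)   # same prompt, different seed
```

With an unrecorded or randomly chosen seed, the same prompt yields a different output each time, which would explain a failure to reproduce an image from the prompt alone.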

BeefySwain

Why is a binary (compiled machine code) protected by copyright, but the raw output of an AI model is not?

andsoitis

Courts have ruled that compilation does not remove originality—the binary is still a transformation of an original, copyrighted work (the source code).

realusername

Because binaries are a transformation of the source code, which is written by a human.

Other kinds of binaries that are fully generated by a machine, like private keys, aren't copyrightable.

Animats

US copyright applications are not examined, in the sense that patents are. Issued patents are presumptively valid. Registered copyrights are not. Whether a copyright application is valid has to be determined by a court.

sublinear

I'm pretty confident the copyright office was massively overthinking it in 1965 and knocked it out of the park far beyond the watered down and ignorant arguments we hear today. It's sad really.

philippta

I think the two main questions everyone needs clarified are:

1. Can I get sued by a 3rd party when using AI generated work in my project?

2. Can I sue a 3rd party when they use my AI generated work in their project?

jarsin

When uploading books to Kindle Direct Publishing you have to state that you own the copyright and publishing rights.

So any book or story on Amazon that was generated substantially via prompting should now have to be removed based on this guidance from the copyright office.

furyofantares

You can publish public domain content on kindle.

https://kdp.amazon.com/en_US/help/topic/G200743940

Aloisius

Yeah, though Amazon could just make their own copy available without compensating the uploader.

cyberax

That's incorrect. Purely factual books (like phone dictionaries or map atlases) are perfectly fine for publishing.

feoren

Purely factual books are copyrightable. It is the collection and curation of those facts that is protected. You cannot just copy someone else's 100 Amazing Facts about The Rainforest verbatim; if you publish 100 Cool Truths about The Jungle and it has those same 100 facts, you'll get sued and they'll easily win.

jcranmer

The EU and the UK generally have something akin to "sweat of the brow", where collections of facts that took time to collate are copyrightable.

But in the US, Feist v Rural explicitly disavowed the sweat of the brow doctrine, and said that facts have no copyright value--a work requires a quantum of original creative spark to be copyrightable (it was discussed in the context of phone books--a phone book does still have some residual "thin copyright", but the listing of phone numbers is not copyrightable, and it is actually difficult to infringe on the thin copyright of a phone book). In the US, your example would easily be found to be not infringing, if the only similarity were reproducing the same 100 facts.

futybt

[flagged]

dboreham

Haiku?

drewcoo

Burma Shave