
OpenAI Fails to Deliver Opt-Out System for Photographers

toddmorey

No way OpenAI will ever “good citizen” this. Tools to opt out of training sets will only come if they are legally compelled. Governments will have to make respecting some sort of training preference header on public content mandatory I think.

The fact that photographers have to independently submit each piece of work they want excluded, along with detailed descriptions, just shows how much they DON’T want anyone excluding content from their training data.
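
A "training preference header" could piggyback on existing crawler conventions. Here's a minimal scraper-side sketch, assuming a hypothetical `noai` directive carried in an `X-Robots-Tag` response header or a `robots` meta tag (the `noai` directive is an emerging convention some sites already use, not a settled standard, and nothing today compels a crawler to honor it):

```python
import re

def training_allowed(headers: dict, html: str) -> bool:
    """Return False if the page signals it opts out of AI training.

    Checks two places a site might declare a preference:
      1. An X-Robots-Tag response header containing a "noai" directive.
      2. A <meta name="robots" content="... noai ..."> tag in the HTML.
    """
    # Header check: X-Robots-Tag may carry comma-separated directives.
    tag = headers.get("X-Robots-Tag", "").lower()
    if "noai" in [d.strip() for d in tag.split(",")]:
        return False

    # Meta-tag check: look for a robots meta tag with a noai directive.
    for m in re.finditer(
        r'<meta\s+name=["\']robots["\']\s+content=["\']([^"\']*)["\']',
        html, re.IGNORECASE,
    ):
        directives = [d.strip().lower() for d in m.group(1).split(",")]
        if "noai" in directives:
            return False
    return True
```

The point of putting the signal in the page/headers rather than in a per-work submission form is exactly the asymmetry complained about above: the publisher states the preference once, and the burden of checking falls on the crawler.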

dylan604

> The fact that photographers have to independently submit each piece of work they want excluded, along with detailed descriptions, just shows how much they DON’T want anyone excluding content from their training data.

That's bloody brilliant. If you don't want us to scrape your content, please send us your content with all of the training data already provided so we will know not to scrape it if we come across it in the wild. FFS

echelon

Insofar as data for diffusion / image / video models is concerned, the rise of synthetic data and data efficiency will mean that none of this really matters anyway. We were just in the bootstrapping phase.

You can bolt on new functional modules and train them with very limited data you acquire from Unreal Engine or in the field.

toddmorey

I don’t entirely agree. For example, it’s a very popular scheme on Etsy right now to use image-generation models to produce posters in the style of popular artists. Any artist should be able to say: hey, I don’t want my works to be part of your training set to power derivative generations.

And I think it should even apply retroactively, so that they have to retrain models that are already generating works from training data consumed without permission. Of course, OpenAI would fight that tooth & nail, but they put themselves in this position with a clear “take first, ask permission later” mentality.

llm_trw

Should any artist be able to tell another artist: hey don't copy my work when you're learning, I don't want competition?

It seems like they are deeply upset someone has figured out a way for a machine to do what artists have been doing since time immemorial.

simonw

Has synthetic data become a big part of image/video models?

I understand why it's useful and popular for training LLMs, but I didn't think it was applicable to generative image/video work.

llm_trw

I haven't had the chance to train diffusion models, but for detection models synthetic data is absolutely how you get state-of-the-art performance now. You just need a relatively tiny, extremely high-quality dataset to bootstrap from.
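
The bootstrapping idea works because in a rendered scene the ground truth is known by construction, so labels are free. A toy sketch with numpy (a real pipeline would use an actual renderer such as Unreal Engine; the bright rectangle and the `naive_detect` thresholding here are stand-ins for an object and a trained detector):

```python
import numpy as np

def synth_detection_sample(size=64, rng=None):
    """Generate one synthetic detection sample: an image containing a
    bright rectangle on a noisy background, plus its exact bounding box.

    Because the scene is rendered programmatically, the label
    (x0, y0, x1, y1) is known by construction -- no human annotation.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = rng.normal(0.1, 0.05, (size, size))  # noisy background
    # Random box position and extent.
    x0, y0 = rng.integers(0, size - 16, 2)
    w, h = rng.integers(8, 16, 2)
    x1, y1 = x0 + w, y0 + h
    img[y0:y1, x0:x1] += 0.8                   # "object", much brighter than background
    return img, (int(x0), int(y0), int(x1), int(y1))

def naive_detect(img, thresh=0.5):
    """Trivial stand-in detector: threshold and take pixel extents."""
    ys, xs = np.where(img > thresh)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1
```

In practice you would generate thousands of such (image, box) pairs to pretrain a detector, then fine-tune on the tiny high-quality real dataset mentioned above.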


toddmorey

For clarity, I do agree that synthetic data is huge for training AI on certain tasks or skills. But I don’t think creative work generation is powered by synthetic data, and it may not be for quite a while.

oraphalous

I don't even understand why it's everyone else's problem to opt out.

Eventually, how many of these AI companies' opt-out processes would a person have to track down just to protect their work from AI? That's crazy.

OpenAI should be contacting every single one and asking for permission - like everyone has to in order to use a person's work. How they are getting away with this is beyond me.

griomnib

Napster had a moment too, but then they got steamrolled in court.

Courts are slow, so it seems like nothing is happening, but there’s tons of cases in the pipeline.

The media industry has forced many tech firms to bend the knee, OpenAI will follow suit. Nobody rips off Disney IP and lives to tell the tale.

llm_trw

And yet Mickey Mouse is in the public domain. Something those of us who remember the 90s thought would never happen.

CamperBob2

> I don't even understand why it's everyone else's problem to opt out.

Because the work being done, from the point of view of people who believe they are on the verge of creating AGI, is arguably more important than copyright.

Less controversially: if the courts determine that training an ML model is not fair use, then anyone who respects copyright law will end up with an uncompetitive model. As will anyone operating in a country where the laws force them to do so. So don't expect the large players to walk away without putting up a massive fight.

SketchySeaBeast

Of note here is the reason it's "important" is it will make a shit-ton of money.

CamperBob2

That, coupled with the obvious ideological motivations. Success could alter the course of human history, maybe even for the better.

If you feel that what you're doing is that important, you're not going to let copyright law get in the way, and it would be silly to expect you to.

paulcole

> OpenAI should be contacting every single one and asking for permission - like everyone has to in order to use a person's work

This is the problem of thinking that everyone “has” to do something.

I assure you that I (and you) can use someone else’s work without asking for permission.

Will there be consequences? Perhaps.

Is the risk of the consequences enough to get me to ask for permission? Perhaps.

Am I a nice enough guy to feel like I should do the right thing and ask for permission? Perhaps.

Is everyone like me? No.

> How they are getting away with this is beyond me.

Is it really beyond you?

I think it’s pretty clear.

They’re powerful enough that the political will to hold them accountable is nonexistent.

griomnib

I think it’s safe to assume anything Sam A says is an outright lie by now.

hnburnsy

Maybe the task to implement it was scheduled by ChatGPT...

https://news.ycombinator.com/item?id=42716744

Bilal_io

Sorry the task failed for unknown reasons.

testfrequency

Probably means nothing, but of the people I know who went to OpenAI, the ones still there all made very poor business decisions and were hated at multiple companies I worked for.

I highly doubt any of them will be good stewards of anything but selfishness.

As for the others, they were all smart, passionate, dedicated folks who knew Sam was a complete narcissist and left to start their own AI startups.

(sorry mods, I’m upset and I’m annoyed OpenAI is getting away with murder of society in plain view)

DidYaWipe

Shocking news about the company that fraudulently left "open" in its name after ripping off donors.

I think the headline is too generous here. More accurate would be "OpenAI neglects to deliver opt-out system..."

HeatrayEnjoyer

Sorry, who did they rip off?

All their investors stand to profit handsomely (if they live).

devit

Aren't lawsuits the proper way to address this?

Seems like there's an argument that model weights are a derivative work of the training data, at least if the model is capable of producing output that would be ruled to be such a derivative work given minimal prompting.

Although it may not work with photography, since the model might almost exclusively learn how the subject of a photo looks in general and how photos work in general, rather than memorizing anything about specific photos.

Terr_

"By continuing, you agree that using any content from this site in training Generative AI grants the site-owner a perpetual, irrevocable, and royalty-free license to use and re-license any and all output created by that Generative AI system, including but not limited to derivative works based on that output."

Just a GPL-esque idea I've been musing on lately [0]; I'd appreciate any feedback from actual IP lawyers. The idea is to make it so that if a company "steals" content for training, you can strike back by making it very hard for them to monetize the results.

So if ArtTheft Inc. snarfs up art from somebody's blog that uses the countermeasure to feed their ArtTheftBot, any of those victims can choose to grant the entire world an almost-public license to do whatever they want with art "made" by ArtTheft Inc.

[0] https://news.ycombinator.com/item?id=42582615

Der_Einzige

Good.

Everyone gets big mad when someone with money acts like Aaron Swartz did. The only bad thing about OpenAI is that they're not actually open sourcing or open accessing their stuff. Mistral or Llama "training on pirated material" is literally a feature, not a bug and the tears from all the artists and others who get mad are delicious. These same artists would profess literal radical marxism but become capitalist luddite copyright trolls the moment that the means of intellectual production became democratized against their will.

If you posted something on the internet, I can and will put it into ipadapter and take your style and use it for my own interests. You cannot stop me except by not posting it where I can access it. That is the burden of posting anything on the public internet. You opt out by not doing it.

dgfitz

Eventually the headline will be the first 2 words.

The tech is neat, there is value in a sense, LLMs are a fun tech. They are not going to invent AGI with LLMs.

wilg

who cares if they do it with LLMs or not? how do you define agi?

portaouflop

We have this discussion every minute -.-

mschuster91

> how do you define agi?

An AI that has enough self-awareness not to hallucinate and to recognize the borders of its own knowledge. That is fundamentally impossible with LLMs because, in the end, they are all next-token predictors, while humans are capable of a much more complex way of storing and associating information and context and, most importantly, of developing "mental models" from that information and context.

And anyway, there are other tasks than text generation. Take autonomous driving for example - a driver of a car sees a person attempting to cross a street in front of them. A human can decide to slam the brake or the gas depending on the context - is the person crossing in front of the car some old granny on a walker, or a soccer player? Or a human sees a ball kicked into the air on the sidewalk behind some cars, with no humans visible. The human can infer "whoops, there might be children playing here, better slow down and be prepared for a child to suddenly step out onto the street from between the cars", but an object detection/classification system lacks the ability to even recognize the ball as a potentially relevant piece of context.

og_kalu

>Take autonomous driving for example - a driver of a car sees a person attempting to cross a street in front of them. A human can decide to slam the brake or the gas depending on the context - is the person crossing in front of the car some old granny on a walker, or a soccer player? Or a human sees a ball kicked into the air on the sidewalk behind some cars, with no humans visible. The human can infer "whoops, there might be children playing here, better slow down and be prepared for a child to suddenly step out onto the street from between the cars"

These are just post-hoc rationalizations. No one making those split-second decisions under those circumstances has those chains of thought. The brain doesn't 'think' that fast.

>but an object detection/classification system lacks the ability to even recognize the ball as a potentially relevant piece of context.

We're talking about LLMs right ? They can make these sort of inferences.

https://wayve.ai/thinking/lingo-2-driving-with-language/

PittleyDunkin

> An AI that has enough sense of self-awareness to not hallucinate

It's not entirely clear that this is meaningful. Humans engage in confabulation, too.

wilg

again i don't care whether it's done with an LLM or not. there's no reason to think openai will only build LLMs. recognizing the borders of its knowledge is a reasonable thing to include in an agi definition i suppose, but it does not seem intractable.

for the second one, ai drivers like tesla's current version already skip explicit object detection/classification and instead use deep learning on the entire video frame, and could absolutely use the ball or any other context to change behavior, even without the particular internal monologue described there.

dgfitz

… very carefully?

goatlover

Whatever makes Open AI enough money?

thrance

Another one of those daily reminders that we live in a two-tiered justice system: everything you ever created is fair game to them, but don't you dare use a leak of their weights, lest you be thrown in jail.

jsheard

According to OpenAI you're not even allowed to use GPT output to train a competing model, so they believe that AI models are the only thing worthy of protection from being trained on. Llama used to have a similar clause, which was partially walked back to "you must credit us if you train on Llama output" in later versions, but that's still a double standard since they don't credit anything that Llama was trained on. For obvious reasons now we know that Zuck personally greenlit feeding it pirated books.

umeshunni

Well that hasn't stopped Deepseek.

9283409232

People need to understand these companies are not good actors and will not let you opt out unless forced. I have a $20 bet with a friend that Trump's admin will get training data classified as fair use, and the whole issue will be done away with anyway.

dylan604

Apparently, Trump has a lot of training data stored in a bathroom, so there's that