An early look at cryptographic watermarks for AI-generated content
18 comments · March 19, 2025
yalogin
Without going into the technical efficacy of such schemes (I am a skeptic), the proposed solution requires the entity generating the media to use it. Isn't that a flaw? Why would an attacker use it willingly? If they didn't want to pass off AI-generated content as real, they would have willingly made that distinction themselves.
The point is, there is no good solution here without regulation, but these attempted solutions are still useful in the long run.
colmmacc
GenAI producers have an incentive to watermark ... it helps them avoid consuming generated output in their own training processes. Most "attackers" aren't going to be sophisticated enough to use a modified non-watermarking tool, and those tools are likely to fall behind in capability over time anyway. So there's a decent chance that incentives could align for watermarking without needing regulation. It probably hinges on whether the steganography can be good enough to avoid being trivially removed or undone.
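For concreteness, here is a minimal sketch of one published scheme of the kind being discussed: the "green list" logit-bias watermark from Kirchenbauer et al. (2023). A toy vocabulary and flat logits stand in for a real model and tokenizer; all names and parameter values here are illustrative, not any vendor's actual scheme.

    import hashlib
    import math
    import random

    VOCAB = [f"tok{i}" for i in range(1000)]  # stand-in for a real tokenizer's vocabulary
    GAMMA = 0.5  # fraction of the vocabulary marked "green" at each step
    DELTA = 2.0  # logit bias added to green tokens during sampling

    def green_list(prev_token):
        # Pseudorandomly partition the vocabulary, seeded by the previous
        # token, so the detector can recompute the same partition later.
        seed = int.from_bytes(hashlib.sha256(prev_token.encode()).digest()[:8], "big")
        rng = random.Random(seed)
        return set(rng.sample(VOCAB, int(GAMMA * len(VOCAB))))

    def sample_watermarked(prev_token, logits):
        # Softmax sampling over the logits, with +DELTA added to green tokens.
        greens = green_list(prev_token)
        shifted = {t: l + (DELTA if t in greens else 0.0) for t, l in logits.items()}
        mx = max(shifted.values())
        weights = [math.exp(v - mx) for v in shifted.values()]
        return random.choices(list(shifted.keys()), weights=weights, k=1)[0]

    def detect(tokens):
        # z-score for "more green tokens than chance"; high z suggests a watermark.
        n = len(tokens) - 1
        hits = sum(1 for prev, cur in zip(tokens, tokens[1:]) if cur in green_list(prev))
        return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

    # Demo: 200 tokens from flat logits, watermarked vs. plain random text.
    flat = {t: 0.0 for t in VOCAB}
    marked = ["tok0"]
    for _ in range(200):
        marked.append(sample_watermarked(marked[-1], flat))
    print("watermarked z:", round(detect(marked), 1))                        # typically ~10
    print("unmarked z:", round(detect(random.choices(VOCAB, k=200)), 1))     # ~0

The removal question maps directly onto this sketch: light paraphrasing re-rolls many of the prev_token seeds, which is exactly what erodes the detector's z-score.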
amelius
> GenAI producers have an incentive to watermark
Yeah, in the long run you might be right, but there will be lots of people looking for a quick opportunity, e.g. a content farmer doing SEO. If Google punishes websites for serving AI-generated content, you know where this will end.
jfarina
I disagree that attackers aren't sophisticated enough to use modified tools. There are entire work campuses dedicated to committing fraud. There's also state sponsored subterfuge. There's no reason to think that bad actors are intrinsically unsophisticated.
atrus
There's nothing wrong with consuming generated output for training. Blindly accepting trash input is the issue.
ben_w
> Why would an attacker use it willingly?
Lots of people are very lazy.
Citation: all the times we've already spotted obviously AI-generated comments, which in the early days included models literally saying "As an AI language model", and this ending up in Amazon reviews etc.
currymj
I also doubt the practicality. But if the technology worked "well enough" and the major consumer-facing AI companies adopted it, it would help a lot with certain problems.
If you're technically sophisticated you can easily evade it, of course. But 1) most people aren't technically sophisticated, and 2) one can ask, "why are you going through all this effort to remove the watermark if you aren't trying to deceive anyone?"
PeterStuer
I'm not sure a good case is made here regarding the "problems" this is intended to solve.
OTOH, could this be another step towards prohibiting Open Source models?
pizzafeelsright
Watermarking seems silly considering the original intent of the Internet was sharing data. The value is in the delivery.
quickpopin
Building increasingly advanced and hard-to-detect tools for user media uploads with embedded steganographic data, and obligating platforms to allow them, is a disaster from a legal and content-moderation perspective.
ForHackernews
Isn't it better/easier to go the other way? What if cameras included some kind of secure element that signed real content?
Maybe it would technically be possible to defeat, but we're already pretty good at making it difficult/expensive to extract a private key from hardware.
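A minimal sketch of that idea, assuming the third-party Python cryptography package. In a real camera the private key would be generated inside the secure element and never leave the hardware; here it lives in software purely for illustration.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Stand-in for a per-device key burned into a secure element at manufacture.
    device_key = Ed25519PrivateKey.generate()

    def sign_capture(image_bytes: bytes) -> bytes:
        # The signature the camera would attach to each capture (e.g. in metadata).
        return device_key.sign(image_bytes)

    def verify_capture(image_bytes: bytes, signature: bytes) -> bool:
        # Anyone with the device's public key (say, via a manufacturer
        # certificate chain) can check the image is unmodified since capture.
        try:
            device_key.public_key().verify(signature, image_bytes)
            return True
        except InvalidSignature:
            return False

    photo = b"...raw sensor bytes..."  # placeholder for real image data
    sig = sign_capture(photo)
    assert verify_capture(photo, sig)
    assert not verify_capture(photo + b"edit", sig)

In practice the verification side also needs a certificate chain tying the public key to a known camera model, which is where standards like C2PA come in.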
OnACoffeeBreak
That's what the Content Authenticity Initiative (CAI) is trying to accomplish.
RKFADU_UOFCCLEL
People like this aren't actually concerned with the problems they talk about; they just stop thinking it through once the meta looks favorable to their business model. Then they say "the internet is broken, only we can save it," etc. Nothing new or interesting here, even from a political perspective. For example, Google one day decided out of the blue that it needed to track mouse movement to prove anyone is human (in that case, likely to feed data to police, because it's a globally unique identifier). They just decided that was the only solution to the "Problem" (TM).