IETF setting standards for AI preferences
12 comments · March 22, 2025 · jsheard
Tomte
This is a way to express your reservation, pursuant to Article 4(3) of the EU's DSM Directive.
The legal machinery is already in place; what's missing is precisely this: a standard for machine-readable reservations.
felixfbecker
OpenAI and Anthropic respect robots.txt afaik
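For reference, opting out via robots.txt just means listing the crawlers' published user-agent tokens. A minimal sketch (OpenAI documents GPTBot and Anthropic documents ClaudeBot as their crawler tokens; check each vendor's docs for the current list, since tokens change over time):

```
# robots.txt — disallow known AI training crawlers, allow everything else
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
```

This is purely advisory: it only works to the extent the crawler chooses to honor it, which is exactly what's being debated here.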
Ukv
To add an anecdote based on logging on my portfolio site: all major US players (OpenAI, Google, Anthropic, Meta, CommonCrawl) appeared to respect robots.txt as they claim to (I can't say the same of Alibaba).
Sometimes I do still get requests with their user agents, but generally from implausible IPs (residential IPs, "Google-Extended" from an AWS range, the same IP claiming to be multiple different bots, ...) and never from the bots' actual published IP addresses (which I did see before adding robots.txt). That makes me believe it's some third party, either intentionally trolling or using the larger players as cover for their own bots.
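The check described above can be sketched with the stdlib `ipaddress` module: the major vendors publish the IP ranges their crawlers operate from, so a request whose user agent claims to be a bot but whose source IP falls outside those ranges is suspect. The CIDRs below are placeholders for illustration, not real crawler ranges:

```python
import ipaddress

# Placeholder ranges -- in practice you would fetch each vendor's
# published crawler IP list and refresh it periodically.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/24"],
}

def is_plausible_crawler(user_agent_token: str, ip: str) -> bool:
    """Return True if `ip` falls inside a published range for the claimed bot."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(cidr)
        for cidr in PUBLISHED_RANGES.get(user_agent_token, [])
    )
```

A residential IP claiming to be "GPTBot" would fail this check, matching the pattern described in the comment.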
dharmab
Using residential IPs is standard operating procedure for companies that rely on collecting information via web scraping. You can rent residential egress IPs. Sometimes this is done in a (kind of) legit way by companies that actually subscribe to residential ISPs. Mostly it's done by malware hijacking consumer devices.
VladVladikoff
Noooooope! They completely ignore crawl frequency in my experience. Bing too. Only Google seems to obey it.
mog_dev
They don't.
ddtaylor
Asking people to read your content only with a specific purpose or intent in mind has traditionally not been very successful. I understand people are frustrated with the knowledge transfer, but if the goal was to increase the reach of your ideas, that goal is being accomplished.
AI being involved changes the scale and scope, but it doesn't change the fundamentals. China and India were already imitating and cloning everything for their markets and for ours.
We have had virtually zero success enforcing patents or copyright, and we barely manage even the lowest bar of trademark enforcement. There may not be any framework for this kind of enforcement that I'd want to see that would also be effective, but I am open to ideas that don't involve government overreach, etc.
adrian_mrd
Does anyone know whether there are any licences or licence derivatives - like the various flavors of Creative Commons - that currently restrict usage by AI LLMs?
CaptainFever
Licenses can only grant usage rights relative to the default of "all rights reserved". They can't restrict usage beyond that.
elitepleb
DNT: AI
I appreciate the effort, but without any legal backing these signals are just going to be ignored, like robots.txt. Hell, even if they were legally binding, they'd probably still be ignored if scrapers thought they could obfuscate the paper trail enough to get away with it.