Why are anime catgirls blocking my access to the Linux kernel?
41 comments
August 20, 2025
lxgr
> This isn’t perfect of course, we can debate the accessibility tradeoffs and weaknesses, but conceptually the idea makes some sense.
It was arguably never a great idea to begin with, and stopped making sense entirely with the advent of generative AI.
iefbr14
I wouldn't be surprised if just delaying the server response by a few seconds would have the same effect on those scrapers as Anubis claims to.
ranger_danger
Yeah, I'm not convinced, unless the vast majority of scrapers somehow aren't already using headless browsers (which I assume they are). I feel like all this does is warm the planet.
fluoridation
Hmm... What if instead of using plain SHA-256 it was a dynamically tweaked hash function that forced the client to run it in JS?
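Something along these lines, maybe (a hypothetical Python sketch of a per-challenge, randomly parameterized hash; on a real site the solving loop would be shipped as JS, and a determined scraper could of course still reimplement it):

```python
# Hypothetical sketch: a server-generated, per-challenge "tweaked" hash.
# The idea is that random parameters (salt, rounds, XOR mask) change on every
# challenge, so a scraper can't just call one precompiled native SHA-256 --
# it has to execute the served logic (or reimplement it each time).

import hashlib
import secrets


def make_challenge() -> dict:
    """Server side: pick random tweaks for this visitor."""
    return {
        "salt": secrets.token_hex(16),
        "rounds": secrets.randbelow(500) + 500,  # 500-999 iterations
        "xor_mask": secrets.randbelow(256),      # mixed into every round
    }


def tweaked_hash(data: bytes, challenge: dict) -> str:
    """The function the client would have to run (as JS, on the real site)."""
    digest = hashlib.sha256(challenge["salt"].encode() + data).digest()
    for _ in range(challenge["rounds"]):
        mixed = bytes(b ^ challenge["xor_mask"] for b in digest)
        digest = hashlib.sha256(mixed).digest()
    return digest.hex()


challenge = make_challenge()
print(challenge)
print(tweaked_hash(b"nonce-12345", challenge))
```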
jsnell
No, the economics will never work out for a Proof of Work-based counter-abuse challenge. CPU is just too cheap in comparison to the cost of human latency. An hour of a server CPU costs $0.01. How much is an hour of your time worth?
That's all the asymmetry you need to make it unviable. Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark to the cost imposed to the legitimate users. So there's no point in theorizing about an attacker solving the challenges cheaper than a real user's computer, and thus no point in trying to design a different proof of work that's more resistant to whatever trick the attackers are using to solve it for cheap. Because there's no trick.
fluoridation
>An hour of a server CPU costs $0.01. How much is an hour of your time worth?
That's irrelevant. A human is not going to be solving the challenge by hand, nor is the computer of a legitimate user going to be solving the challenge continuously for one hour. The real question is, does the challenge slow down clients enough that the server does not expend outsized resources serving requests of only a few users?
>Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark to the cost imposed to the legitimate users.
No, I disagree. If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown while scraping that website.
jsnell
The human needs to wait for their computer to solve the challenge.
You are trading something dirt-cheap (CPU time) for something incredibly expensive (human latency).
Case in point:
> If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown while scraping that website.
No. A human sees a 10x slowdown. A human on a low end phone sees a 50x slowdown.
And the scraper paid about 1/1,000,000th of a dollar. (The scraper does not care about latency.)
That is not an effective deterrent. And there is no difficulty factor for the challenge that will work. Either you are adding too much latency to real users, or passing the challenge is too cheap to deter scrapers.
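To put rough numbers on it (a back-of-the-envelope sketch using the figures from this thread; the $20/hour value of a visitor's time is purely illustrative):

```python
# Back-of-the-envelope numbers for the cost asymmetry described above.
# Assumptions: rented CPU at $0.01 per core-hour (figure from the thread),
# a challenge that takes 250 ms to solve, and a visitor whose time is
# worth $20/hour (an illustrative value, not a claim).

cpu_cost_per_hour = 0.01      # dollars per core-hour
human_cost_per_hour = 20.00   # dollars per hour of a visitor's time (assumed)
challenge_seconds = 0.25

scraper_cost = cpu_cost_per_hour * challenge_seconds / 3600
human_cost = human_cost_per_hour * challenge_seconds / 3600

print(f"scraper pays   ~${scraper_cost:.7f} per page")        # ~$0.0000007
print(f"visitor 'pays' ~${human_cost:.4f} in waiting time")    # ~$0.0014
print(f"the visitor's cost is ~{human_cost / scraper_cost:,.0f}x the scraper's")
```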
VMG
Crawlers can run JS, and can also invest in running the Proof-of-JS better than you can.
tjhorner
Anubis doesn't target crawlers which run JS (or those which use a headless browser, etc.). It's meant to block the low-effort crawlers that tend to make up large swaths of spam traffic. One can argue about the efficacy of this approach, but those higher-effort crawlers are out of scope for the project.
fluoridation
If we're presupposing an adversary with infinite money then there's no solution. One may as well just take the site offline. The point is to spend effort in such a way that the adversary has to spend much more effort, hopefully so much it's impractical.
ksymph
Reading the original release post for Anubis [0], it seems like it operates mainly on the assumption that AI scrapers have limited support for JS, particularly modern features. At its core it's security through obscurity; I suspect that as usage of Anubis grows, more scrapers will deliberately implement the features needed to bypass it.
That doesn't necessarily mean it's useless, but it also isn't really meant to block scrapers in the way TFA expects it to.
jhanschoo
Your link explicitly says:
> It's a reverse proxy that requires browsers and bots to solve a proof-of-work challenge before they can access your site, just like Hashcash.
It's meant to rate-limit accesses by requiring an amount of client-side compute that is light enough for legitimate human users and responsible crawlers, but taxing enough to be costly for indiscriminate crawlers that request host resources excessively.
It does mention that lighter crawlers don't implement the functionality needed to execute the JS, but that's not the main reason the approach is thought to be sensible. It's a challenge saying: you need to want the content badly enough to spend the amount of compute an individual typically has on hand before I'll do the work to serve you.
ksymph
Here's a more relevant quote from the link:
> Anubis is a man-in-the-middle HTTP proxy that requires clients to either solve or have solved a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers because they are not able to execute JavaScript to solve the challenge. The scrapers that can execute JavaScript usually don't support the modern JavaScript features that Anubis requires. In case a scraper is dedicated enough to solve the challenge, Anubis lets them through because at that point they are functionally a browser.
As the article notes, the work required is negligible, and as the linked post notes, that's by design. Wasting scraper compute is part of the picture to be sure, but not really its primary utility.
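For reference, the hashcash-style scheme being described is roughly the following (a minimal sketch, not Anubis's actual code; the real challenge format, difficulty settings, and in-browser solver differ):

```python
# Minimal sketch of a hashcash-style proof-of-work challenge, the general
# scheme described above. (Illustrative only: Anubis's actual challenge
# format, difficulty, and verification details differ.)

import hashlib
import time


def solve(challenge: str, difficulty: int) -> int:
    """Client side: find a nonce whose hash starts with `difficulty` zero hex digits."""
    nonce = 0
    prefix = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash to check, so verification is nearly free."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


start = time.time()
nonce = solve("example-challenge-token", difficulty=4)  # ~65k hashes on average
print(f"nonce={nonce}, solved in {time.time() - start:.2f}s")
print("valid:", verify("example-challenge-token", nonce, 4))
```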
ranger_danger
The compute also only seems to happen once, not for every page load, so I'm not sure how this is a huge barrier.
jimmaswell
What exactly is so bad about AI crawlers compared to Google or Bing? Is there more volume or is it just "I don't like AI"?
Philpax
Volume, primarily - the scrapers are running full-tilt, which many dynamic websites aren't designed to handle: https://pod.geraspora.de/posts/17342163
immibis
Why haven't they been sued and jailed for DDoS, which is a felony?
ranger_danger
Criminal convictions in the US require proof "beyond a reasonable doubt", and I suspect cases like this would not pass the required mens rea test: in their minds at least (and probably a judge's), there was no intent to cause a denial of service. Trying to argue otherwise on technical grounds (e.g. "most servers cannot handle this load and they somehow knew it") is IMO unlikely to sway a court, especially considering web scraping has already been ruled legal and that a ToS clause against it cannot be legally enforced.
immibis
The actual answer to how this blocks AI crawlers is that they just don't bother to solve the challenge. Once they do bother solving the challenge, the challenge will presumably be changed to a different one.
rnhmjoj
I don't understand why people resort to this tool instead of simply blocking by UA string or IP address. Are there so many people running these AI crawlers?
I blackholed some IP blocks of OpenAI, Mistral and another handful of companies and 100% of this crap traffic to my webserver disappeared.
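For what it's worth, that kind of blocking is only a few lines when the crawlers come from known networks (a sketch; the CIDR ranges below are placeholder documentation prefixes, not any company's real published blocks, which you'd look up yourself):

```python
# Sketch of blackholing crawler traffic by source network.
# The ranges below are placeholder documentation prefixes, NOT the real
# published CIDR blocks of OpenAI/Mistral/etc. -- substitute the ones
# each company (or your server logs) actually gives you.

from ipaddress import ip_address, ip_network

BLOCKED_NETWORKS = [
    ip_network("192.0.2.0/24"),     # placeholder: "crawler company A"
    ip_network("198.51.100.0/24"),  # placeholder: "crawler company B"
]


def is_blocked(remote_addr: str) -> bool:
    addr = ip_address(remote_addr)
    return any(addr in net for net in BLOCKED_NETWORKS)


# In a real deployment this check would sit in the web server or middleware,
# or the ranges would simply be null-routed at the firewall.
print(is_blocked("192.0.2.17"))   # True  -> drop the request
print(is_blocked("203.0.113.5"))  # False -> serve normally
```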
mnmalst
Because that solution simply does not work for everyone. People tried it, and the crawlers started using proxies with residential IPs.
hooverd
Less savory crawlers use residential proxies and are indistinguishable from malware traffic.
WesolyKubeczek
You should read more. AI companies use residential proxies and mask their user agents with legitimate browser ones, so good luck blocking that.
majorchord
> AI companies use residential proxies
Source?
rnhmjoj
Which companies are we talking about here? In my case the traffic was similar to what was reported here [1]: crawlers from Google, OpenAI, Amazon, etc. They are really idiotic in behaviour, but at least they report themselves correctly.
yuumei
> The CAPTCHA forces visitors to solve a problem designed to be very difficult for computers but trivial for humans.
> Anubis – confusingly – inverts this idea.
Not really, AI easily automates traditional captchas now. At least this one does not need extensions to bypass.
anotherhue
Surely the difficulty factor scales with the system load?
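Whether Anubis actually does that I don't know, but load-scaled difficulty would be a small tweak (a hypothetical sketch; the thresholds are made up, and the load check is Unix-only):

```python
# Hypothetical sketch: pick a proof-of-work difficulty based on current load.
# This is not necessarily what Anubis does -- just what load-scaled
# difficulty could look like. Thresholds are made up; getloadavg is Unix-only.

import os


def current_load_per_cpu() -> float:
    """1-minute load average normalized by CPU count."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def pick_difficulty(load: float) -> int:
    """Require more leading zero digits as the box gets busier."""
    if load < 0.5:
        return 3  # quiet: challenges stay nearly instant
    if load < 1.0:
        return 4
    if load < 2.0:
        return 5
    return 6      # overloaded: make every new client work noticeably harder


print(pick_difficulty(current_load_per_cpu()))
```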
WesolyKubeczek
I disagree with the post author in their premise that things like Anubis are easy to bypass if you craft your bot well enough and throw the compute at it.
Thing is, the actual lived experience of webmasters is that the bots scraping the internet for LLMs are nothing like well-crafted software. They are more like your neighborhood shit-for-brains meth junkies competing with one another over who can pull off more robberies in a day, no matter the profit.
Those bots are extremely stupid. They are worse than script kiddies’ exploit searching software. They keep banging the pages without regard to how often, if ever, they change. If they were 1/10th like many scraping companies’ software, they wouldn’t be a problem in the first place.
Since these bots are so dumb, anything that slows them down or stops them in their tracks is a good thing. Short of drone strikes on data centers, or accidents befalling the owners of the companies that provide botware and residential-proxy networks to LLM companies, Anubis seems fairly effective, doesn't it?
slowdoorsemillc
[dead]
This same author also ignored the security policy and dropped an Anubis double-spend attack on the issue tracker. Their email got eaten by my spam filter so I didn't realize that I got emailed at all.
Fun times.