CAPTCHAs: 'a tracking cookie farm for profit masquerading as a security service'
February 10, 2025
jp191919
I'm at the point now that if I get a CAPTCHA, I'm just going to leave the site. I'll spend my money elsewhere or find an alternative
a2128
My government's websites require solving a reCAPTCHA for basic services, which is horrifying. They also use Cloudflare which blocks me sometimes. This is in the EU
phoronixrly
Confirming this. I am also completely certain that gratuitous CAPTCHA use is banned for government systems by my country's set of laws governing their implementation. The judicial system and the community have not matured enough to consider this a breach of law worthy of fighting against...
openplatypus
Name and shame, please!!
reCAPTCHA, due to its lack of an opt-out, is effectively illegal in the EU.
phoronixrly
reCAPTCHA (and others based outside the EU) is illegal on privacy grounds (on any site, not just those owned by EU entities). Homebrew CAPTCHAs are illegal due to their general lack of accessibility (on any site owned by an EU entity), and in Bulgaria their gratuitous use is banned on government sites on account of them being poor UX (not enforced unless caught during the acceptance phase of a project).
An example of an inaccessible homebrew CAPTCHA that causes very poor UX can be found on the portal that provides access to the legal acts of the Bulgarian judicial system: https://legalacts.justice.bg/ . Try taking the legal system to court. I tried for this one, you can see for yourself how it went.
cyberax
This automatically means that you're penalizing smaller websites. And killing off the independent alternatives to Reddit/Disqus. Do you want this?
Large sites like Amazon or CNN can afford to eat the bot traffic. Smaller sites can't.
cryptoegorophy
The problem isn't bot traffic. I run an e-commerce site, and scammers run Python scripts to test thousands of cards per hour if there is no CAPTCHA. I hate it, my customers hate it, scammers hate it, but it is the only thing that keeps my merchant account running. Any advice is welcome!
technion
Logon forms are another whole issue. "Lock out the account" is just a DoS vector. People are quick to talk about systems that can defeat a captcha but if the brute force goes from 50 passwords/sec to one password/10 sec it's mission accomplished.
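To illustrate the slow-down idea (as opposed to a lockout), here is a minimal sketch of a per-account throttle. The class name, the 10-second interval, and keying on the username alone are all assumptions for illustration; a real site might key on (account, IP) or put a CAPTCHA in front instead.

```python
import time

class LoginThrottle:
    """Allow at most one password attempt per account every `min_interval` seconds."""

    def __init__(self, min_interval=10.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the behaviour can be tested
        self._last_attempt = {}     # username -> time of the last accepted attempt

    def allow(self, username):
        now = self.clock()
        last = self._last_attempt.get(username)
        if last is not None and now - last < self.min_interval:
            return False            # too soon: reject without checking the password
        self._last_attempt[username] = now
        return True
```

Rejected attempts don't reset the timer here, so the account is never locked out outright; the attacker is just capped at one guess per interval.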
onetokeoverthe
Would requiring a username/password sent to an email address work?
Zak
> killing off the independent alternatives to Reddit/Disqus
I haven't encountered a captcha using Lemmy. There might be one on some servers for account creation.
KennyBlanken
What are you on about?
I've used Amazon from the same IP address for years and I still regularly get the "you look like a bot, solve this" crap.
mouse_
Did you read the article? What you said directly goes against the study's conclusion.
cyberax
I'm helping a neighbor run a small e-commerce website with reviews. Review forms are being spammed by bots that get through even CAPTCHAs, and the owner needs to clean them up constantly. Without CAPTCHAs, it becomes unsustainable.
They don't get a lot of bots trying stolen credit cards, but mostly because they are pretty niche.
noah_buddy
Sounds a heck of a lot like the bots are killing off these websites. Gross overuse of automated scraping is a fact of life but individual choice is intolerable. What if I told you they were the same thing?
cyberax
Yes, bot traffic is killing the open web. What's your point?
Dotnaught
Google addressed the claims in this paper last year, and one of the authors challenged the company's responses. See: https://www.theregister.com/2024/07/24/googles_recaptchav2_l...
kevin_thibedeau
As of two weeks ago my locked down Firefox profile gets hit with captchas on every visit to Google search. DDG has also gone to shit with captchas and stupid low cache lifetime because I use their non-javascript site. I'm giving Bing a test run before making the leap to Kagi.
EVa5I7bHFq9mnYK
Try also Startpage, it doesn't give me any captchas even though I am a career criminal: guilty of adblocking under Firefox influence, while committing a VPN. They also have a nice Anonymous view.
btown
The "cookie farm for profit" point is worth elaborating on. From the original paper https://arxiv.org/pdf/2311.10911 :
> More concretely, the current average value life-time of a cookie is €2.52 or $2.7 [58]. Given that there have been at least 329 billion reCAPTCHAv2 sessions, which created tracking cookies, that would put the estimated value of those cookies at $888 billion dollars.
The cited paper is https://www.sciencedirect.com/science/article/pii/S016781162... - but it doesn't deal with CAPTCHAs, just with the general economics of third-party cookies.
In practice, many of these cookies will have already been placed by other Google services on the site in question, with how ubiquitous Google's ad and analytics products are. And it's unclear whether Google uses the _GRECAPTCHA cookies for purposes other than the CAPTCHA itself (in the places where this isn't regulated).
But reCAPTCHA does give Google the ability to have scripts running that fundamentally can't be ad-blocked without breaking site functionality, and it's an effective foot in the door if Google ever wanted to use it more broadly. It's absolutely something to be aware of.
eykanal
The problem with this paper is that, while technically true, there are many website owners who have found that CAPTCHAs have effectively reduced the spam on their site to zero. The fact that a CAPTCHA _can_ be bypassed doesn't mean that it _will_, and most spam bots are not using cutting-edge tech because that's expensive.
To say "it's worthless from a security perspective" is a pretty harsh and largely inaccurate representation. It's been tremendously useful to those who have used it. If it wasn't valuable, it wouldn't be so widely used.
Definitely agree with the whole "tons of free $$$ for Google", but that's kind of their business model, so yeah, Google is being Google. In other breaking news, water is still wet.
Scaevolus
Far too many people talk about security as if it's a simple binary and not about effort levels and dissuading attackers.
rachofsunshine
People really struggle with things that have measurable, probabilistic effects. You see it with healthcare ("Steve smoked his whole life and never got cancer, so cigarettes aren't bad for you!"), environmental effects ("Alice was poor and she didn't rob anyone, so poverty is no excuse!"), hiring ("Charlie is a great employee and he had no experience, so you should never look at backgrounds!"), etc.
It should be a general standard of proof for any sort of sociological claim that you look at rates, not just examples, but it usually isn't.
jchw
Well, I would at least ask what the baseline was. The vast majority of websites on the internet don't really have to deal with sophisticated bot traffic, and a very simple traditional CAPTCHA, one that can be trivially solved using existing technology, will also cut SPAM to zero or very close. I don't know exactly why this is, but I suspect it's because most of the bot operations that scale far enough to hit low volume websites are very sensitive to cost (and hence unlikely to deploy relatively-expensive modern multi-modal LLMs to solve a problem) and not likely to deploy site-specific approaches to SPAM.
There are a lot of things that can trivially cut down SPAM ranging from utterly unhelpful to just simply a bad idea. Like for example, you can deny all requests from IPs that appear to be Russian or Chinese: that will cut out a lot of malicious traffic. It will also cut some legitimate traffic, but maybe not much if your demographics are narrow. ReCAPTCHA also cuts some legitimate traffic.
The actual main reason why people deployed reCAPTCHA is because it was free and easy, effectiveness was just table stakes. The problem with CAPTCHAs prior to reCAPTCHA is simply that they really weren't very good; the stock CAPTCHAs in software packages like MediaWiki or phpBB were just rather unsophisticated, and as a double whammy, they were big targets for attack since developing a reliable solver for them would unlock bot access to a very large number of web properties.
Do you need reCAPTCHA to make life hard for bots, though? Well, no. Having a bespoke solution is enough for most websites on the Internet. However, reCAPTCHA isn't even necessarily the best choice even for something extremely high-volume. Case-in-point, last I checked, Google's own DDoS protection system still used a bespoke CAPTCHA that largely hasn't changed since the early 2010s; you can see what it looks like by searching for the Google "sorry" page.
I agree that reCAPTCHA is not "worthless" but its worth is definitely overstated. Automated services that solve CAPTCHAs charge less than a cent per solve. For reCAPTCHA to be very effective against direct adversaries rather than easily-thwarted random bots, the actual value of bypassing your CAPTCHA has to be pretty damn low. At that point, it's very reasonably possible that even hashcash would be enough to keep people from SPAMing.
chrbr
Yeah, we've used CAPTCHAs to great effect as gracefully-degraded service protection for unauthenticated form submissions. When we detect that a particular form is being spammed, we automatically flip on a feature flag for it to require CAPTCHAs to submit, and the flood immediately stops. Definitely saves our databases from being pummeled, and I haven't seen a scenario since we implemented it a few years ago where the CAPTCHA didn't help immediately.
Reminds me of the advice around the deadbolt on your house - it won't stop a determined attacker, but it will deter less-determined ones.
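A rough sketch of that kind of automatic flip, assuming a simple sliding-window count per form. The names `FormSpamGuard` and `captcha_required` and the thresholds are made up for illustration; this is not the setup described above, just one way it could look.

```python
import time
from collections import deque

class FormSpamGuard:
    """Turn on a per-form "require CAPTCHA" flag when submissions spike."""

    def __init__(self, threshold=20, window=60.0, clock=time.monotonic):
        self.threshold = threshold      # submissions per window that count as a flood
        self.window = window            # window length in seconds
        self.clock = clock
        self._recent = {}               # form_id -> deque of submission timestamps
        self.captcha_required = set()   # forms whose flag is currently on

    def record_submission(self, form_id):
        """Record one submission; return True if this form now needs a CAPTCHA."""
        now = self.clock()
        times = self._recent.setdefault(form_id, deque())
        times.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) > self.threshold:
            self.captcha_required.add(form_id)
        return form_id in self.captcha_required
```

The flag stays on once set in this sketch; turning it back off after the flood ends is left out.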
ghuroo1
That made us spend 819 million hours clicking on traffic lights to generate nearly $1 trillion for Google.
voisin
At approximately 750,000 hours per human lifespan, that's about 1,100 human lives wasted in total. Unbelievable.
thechao
There's a dystopian short story in your comment about AI that can't self-bootstrap without ground-truth from humans, so they keep us around just to mark images, music, etc. Lives wasted annotating things. I like to think they'd drag us from solar system to solar system for this purpose.
taftster
Gosh. This is too perfect. I feel like you've just captured the exact moment we're living in.
extraduder_ire
Does solving captchas generate $1000/hour? I assume you're conflating amounts here, or messed up an order of magnitude somewhere.
rozab
That's just the headline of the article.
The researchers put the vast majority of this value to tracking cookies, and this revenue happens whether or not a manual challenge is completed.
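For anyone checking the numbers in this subthread, the arithmetic is quick to reproduce. The inputs are the 819 million hours from the headline, the 750,000-hour lifespan used above, and the paper's $888 billion cookie estimate quoted earlier in the thread; nothing else is assumed.

```python
HOURS_SPENT = 819e6            # hours spent on reCAPTCHA, per the headline
HOURS_PER_LIFETIME = 750_000   # approximation used in the comment above
COOKIE_VALUE_USD = 888e9       # the paper's estimated cookie value

lifetimes = HOURS_SPENT / HOURS_PER_LIFETIME
usd_per_hour = COOKIE_VALUE_USD / HOURS_SPENT

print(f"{lifetimes:.0f} lifetimes")      # prints "1092 lifetimes"
print(f"${usd_per_hour:.0f} per hour")   # prints "$1084 per hour"
```

So the "$1000/hour" figure only appears if the cookie value is divided by the solving time, even though the two are estimated separately.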
breppp
I get that people are here to hate on Google, but I am just here to say that reCAPTCHA, albeit acquired, is an absolutely brilliant idea. The kind that solves two (three? if you count tracking) problems so elegantly.
phoronixrly
Absolutely agreed on the 'very elegant solution for global-scale tracking' part!
extraduder_ire
The people who created the initial version that got bought went on to create Duolingo, with a similar goal of getting people to produce translations of text.
therein
Multi-purpose trojan horse. Not only will it look beautiful in your city but you can use it as scaffolding to repair tall buildings or children in your community could use it as a play gym.
pupppet
What's the alternative?
josefresco
I can tell you that at a small scale, asking a simple question to activate the form action stops 99% of spam. Something like "What color is snow?" Granted, with a well-trained "AI" system, solving these questions would be trivial, but I have yet to see it in practice.
dewey
Sounds easy, but at this point everyone is trained to solve these captchas and implementing the questions is not a quick thing either on a bigger scale (Translations, cultural differences, bots easily bypassing them etc.). I've used captchas on my sites before because bots were just hammering the login form, checking checkboxes and causing me to rack up email sending bills.
phoronixrly
Sorry for nitpicking, but you need a puzzle that is knowledge-agnostic (be it cultural or scientific), otherwise you're guarding your site from both bots and people who are unfamiliar with the concept or lack the pre-existing knowledge needed to solve the puzzle.
What colour is snow is close but you can't assume that everyone knows what snow is, let alone what colour it is. This includes both people with disabilities and in parts of the world where there is no snow...
josefresco
I agree, and thankfully we're dealing with mostly regional visitors to small local business/organization websites. Not a global audience. That being said, it's hard to think of a simple question, with little to no ambiguity.
One example is for a landscaper: What is the color of healthy grass?
The answer is "green" of course, but grass is common in our region. That question would not work in a culture or region unfamiliar with "lawn grass".
idunnoman1222
There are no humans who know the word snow but don't know what colour snow is.
throwaway7283
> What colour is snow is close but you can't assume that everyone knows what snow is, let alone what colour it is. This includes both people with disabilities and in parts of the world where there is no snow...
Google will happily ask you to point out which squares contain fire hydrants. Is there a captcha that meets your standards?
e2le
There are two alternatives I'm aware of, one is Attestation of Personhood[1] proposed by Cloudflare, the other is a proof-of-work[2] which the Tor project have themselves introduced[3].
[1]: https://blog.cloudflare.com/introducing-cryptographic-attest...
[2]: https://github.com/mCaptcha/mCaptcha
[3]: https://blog.torproject.org/introducing-proof-of-work-defens...
jszymborski
While I get the draw, I never understood how PoW is ever supposed to work practically.
PoW tasks are meant to work on a wide range of mobile phones, desktops, single-board computers, etc... you have vastly different compute budgets in every environment. For a PoW task that is usable on a five year old mobile phone, an adversary with a consumer RTX 50 series card (or potentially even an ASIC) can easily perform it many, many, many orders of magnitude faster.
Am I missing something?
johnmaguire
PoW isn't meant to make something impossible, it's meant to attach a cost to it. Now you need to extract a value higher than the cost.
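The "attach a cost" point can be made concrete with a hashcash-style sketch. This is not mCaptcha's or Tor's actual scheme; the `challenge:nonce` format and the function names are assumptions for illustration. The client has to try about 2**bits hashes on average, while the server verifies with one.

```python
import hashlib
from itertools import count

def _hash_value(challenge: str, nonce: int) -> int:
    """SHA-256 of "challenge:nonce", interpreted as a 256-bit integer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")

def solve(challenge: str, bits: int) -> int:
    """Client side: search for a nonce whose hash has `bits` leading zero bits."""
    target = 1 << (256 - bits)
    for nonce in count():
        if _hash_value(challenge, nonce) < target:
            return nonce

def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Server side: a single hash checks the client's work."""
    return _hash_value(challenge, nonce) < (1 << (256 - bits))

nonce = solve("example-challenge", bits=12)   # roughly 4096 hashes on average
print(verify("example-challenge", nonce, bits=12))
```

Each extra bit doubles the expected client work, which is the only knob here; it doesn't address the phone-versus-GPU gap raised above, it just prices each request.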
lq9AJ8yrfs
You are not missing something, you are finding it: the game theory of bots vs anti-bots is subtle and somewhat different from regular software engineering and cyber security.
For the most part bots wish to be hidden and sites wish to reveal them, and this plays out over repeat games on small and large scales. Can be near-constantly or intermittently.
The bot usually gets to make the first move against a backdrop that the anti-bot may or may not have a hand in.
nonchalantsui
Since this was focused on v2 and other interactive CAPTCHAs, the alternative is to upgrade to newer versions that aren't interactive. There are still some downsides (and the study does very briefly address the use of AI to trick v3), but at the very least it addresses some of the concerns.
It's important to note, though, that as AI gets more accessible, the downsides of v3 start to weigh more.
Zak
For a lot of places where I've encountered captchas, they could just do nothing. Simple rate limiting should probably be the next step. It's not one-size-fits-all of course.
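As a sketch of what "simple rate limiting" could mean, here is a small token bucket keyed per client. The class name, the rate, and the burst size are illustrative assumptions, not a recommendation for any particular site.

```python
import time

class TokenBucket:
    """Per-client token bucket: `rate` requests/second with bursts up to `capacity`."""

    def __init__(self, rate=1.0, capacity=5, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets = {}  # client key (e.g. IP) -> (tokens, last refill time)

    def allow(self, key):
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the bucket size.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[key] = (tokens - 1, now)
            return True
        self._buckets[key] = (tokens, now)
        return False
```

A request handler would call `allow(client_ip)` and return an HTTP 429 when it comes back False.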
cccbbbaaa
I've heard about form fields hidden with CSS multiple times. No idea how effective this is though.
etchalon
Honeypots have gotten less effective as bots have moved to using headless-browser agents vs simple POSTs.
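For reference, the server side of a honeypot is tiny. A minimal sketch, assuming the form includes an extra text input (called `website` here, a made-up name) that is hidden with CSS so humans leave it empty:

```python
def looks_like_bot(form: dict, honeypot_field: str = "website") -> bool:
    """Return True if the hidden honeypot field was filled in."""
    # Humans never see the field, so any non-blank value suggests an automated POST.
    return bool(form.get(honeypot_field, "").strip())
```

This only catches bots that fill in every field blindly; a headless browser that respects the CSS will leave it empty, which is the limitation mentioned above.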
gtsop
I think we need to critically re-evaluate what exactly we are doing on the internet, how we do it, and examine existing assumptions. For instance, do we really need all services to be centralised? Do we really need services to be "free" (part of the payment is selling your data, OK)? A server serving static files doesn't care about bot users, but apps... why would you let a stranger use your CPU/RAM over the internet? I know I am not providing an answer, but I believe we need to take another look at all of these before we try to come up with one.
atoav
Building your own captcha or running one that doesn't sell your users data to the highest bidder?
What a time where people on a site called "Hacker News" ask such a question..
phoronixrly
And if you ever get so big that people start writing bespoke software to break your CAPTCHA, then investing some more engineering effort into it will quite likely not be a problem.
Of course reCAPTCHA is also still vulnerable to the use of a mechanical turk so even giving away your users' data won't save you.
ThatPlayer
I've come across a CAPTCHA on a website I was scraping that was absolutely terrible. It was a 10-image multiple-choice challenge, asking you to click the image that had "X". Their implementation didn't even have a nonce, so I would just attempt every single answer and get past it.
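The missing piece there is a single-use token per challenge. A minimal sketch of the idea, with an in-memory store and hypothetical names (`ChallengeStore`, `issue`, `check`); a real deployment would also want expiry and shared storage.

```python
import secrets

class ChallengeStore:
    """Issue image-choice challenges that can be answered exactly once."""

    def __init__(self):
        self._pending = {}  # challenge token -> index of the correct image

    def issue(self, correct_index: int) -> str:
        token = secrets.token_urlsafe(16)   # unguessable, unique per challenge
        self._pending[token] = correct_index
        return token

    def check(self, token: str, answer: int) -> bool:
        # pop() consumes the token, so a wrong guess cannot be retried.
        expected = self._pending.pop(token, None)
        return expected is not None and answer == expected
```

With this in place, trying all 10 answers means fetching 10 fresh challenges, each with a 1-in-10 chance, instead of 10 guesses against the same one.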
darkwater
Naive question: how can clicking on the motorbike or traffic light image help to train an ML algorithm if they already know which image has a motorbike in it? Otherwise the captcha would not make sense. Maybe they show 3 images that already have a score of >0.90 and one that is at just 0.40?
michaelt
Hypothetically speaking, if they've got a 97% good ML model, they could implement a captcha where if you disagree with their model you have to do a second image, and a third image and so on. Then they could show each image to several different humans, and only if a bunch of people disagree with the model do they take a closer look.
Frankly a lot of the images I get are... kinda easy? This isn't the classic book-reading recaptcha where you could see why the text had confused the OCR.
woleium
They ask you to solve two. One they know, the other they don't.
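A sketch of how that known/unknown pairing could produce labels, combined with the "show it to several humans" idea above. This is an assumption about the general technique, not a description of how reCAPTCHA actually works; the class and parameter names are invented.

```python
from collections import Counter, defaultdict

class PairedLabeler:
    """Gate on a known image; collect votes for an unknown one."""

    def __init__(self, known_labels, min_votes=3):
        self.known = known_labels            # image id -> trusted label
        self.min_votes = min_votes           # votes needed before trusting a label
        self.votes = defaultdict(Counter)    # unknown image id -> label counts

    def submit(self, known_id, known_answer, unknown_id, unknown_answer):
        # Only the known image decides whether the user passes.
        if self.known[known_id] != known_answer:
            return False
        # A passing user's answer for the unknown image counts as one vote.
        self.votes[unknown_id][unknown_answer] += 1
        return True

    def consensus(self, unknown_id):
        """Return the leading label once it has enough votes, else None."""
        counts = self.votes[unknown_id]
        if not counts:
            return None
        label, n = counts.most_common(1)[0]
        return label if n >= self.min_votes else None
```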
DougN7
I'm not sure. If I don't click on one that is a bus, it won't let me move forward. It's not like I click an "Ok, I'm done" button. I guess we could all delay clicking and maybe it would give up and assume the unknown bus wasn't really a bus after all?
To prevent the cookie wall with no 'reject all': https://archive.is/oHc1e