Detect and crash Chromium bots

53 comments

·May 7, 2025

oefrha

> The call to page.evaluate just hangs, and the browser dies silently. browser.close() is never reached, which can cause memory leaks over time.

Not just memory leaks. As of a couple of months ago, if you use Chrome via Playwright etc. on macOS, it deposits a copy of Chrome (more than 1GB) into /private/var/folders/kd/<...>/X/com.google.Chrome.code_sign_clone/, and if you exit without a clean browser.close(), that copy stays there. I noticed after it ate up ~50GB in two days. No idea what the point of this code sign clone thing is, but I had to add --disable-features=MacAppCodeSignClone to all my invocations to prevent it, which is super annoying.
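
A minimal sketch of the cleanup problem described here, assuming the automation is driven from Python with asyncio. The helper name `run_with_deadline` and its parameters are hypothetical, not part of Playwright or any other library. The idea is to put a deadline on the call that may hang and to run the close step in a `finally` block, so it is attempted even when the page work never returns. The only thing taken from the comment is the Chrome flag.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

# Flag quoted in the comment above; it would be passed through to Chrome as-is.
CHROME_ARGS = ["--disable-features=MacAppCodeSignClone"]


async def run_with_deadline(
    work: Callable[[], Awaitable[T]],
    cleanup: Callable[[], Awaitable[None]],
    timeout_s: float,
) -> Optional[T]:
    """Run work() under a deadline and always await cleanup() afterwards.

    Hypothetical helper: returns the result of work(), or None if the
    deadline expired before work() finished.
    """
    try:
        # wait_for cancels the pending work when the timeout expires.
        return await asyncio.wait_for(work(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None
    finally:
        # Runs on success, on timeout, and on any other exception.
        await cleanup()
```

With Playwright's Python API this might be called with `work=lambda: page.evaluate(...)` and `cleanup=browser.close`, after launching with `args=CHROME_ARGS`. If the browser process has already died, the close call may itself need a deadline, which this sketch does not add.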

closewith

That's an open bug at the minute, but the one saving grace is that they're APFS clones so don't actually consume disk space.

oefrha

Interesting, IIRC I did free up quite a bit of disk space when I removed all the clones, but I also deleted a lot of other stuff that time so I could be mistaken. du(1) being unaware of APFS clones makes it hard to tell.
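
One rough way to check this without relying on du(1) is to compare the filesystem's own free-space figure before and after deleting the clones. The sketch below is only an illustration: `free_bytes_delta` is a hypothetical helper, and the result is only indicative, since other processes writing to the same volume (or the OS reclaiming space in the background) will skew the number.

```python
import shutil
from typing import Callable


def free_bytes_delta(path: str, action: Callable[[], None]) -> int:
    """Return how many bytes of free space `action` released on the
    filesystem containing `path` (negative if free space shrank)."""
    before = shutil.disk_usage(path).free
    action()  # e.g. a function that deletes the clone directories
    after = shutil.disk_usage(path).free
    return after - before
```

A delta near zero after removing many gigabytes of apparent file size would be consistent with the files having been clones that shared their storage.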

chrismorgan

Checking https://issues.chromium.org/issues/340836884, I’m mildly surprised to find the report just under a year old, with no attention at all (bar a me-too comment after four months), despite having been filed with priority P1, which I understand is supposed to mean “aim to fix it within 30 days”. If it continues to get no attention, I’m curious if it’ll get bumped automatically in five days’ time when it hits one year, given that they do something like that with P2 and P3 bugs, shifting status to Available or something, can’t quite remember.

I say only “mildly”, because my experience on Chromium bugs (ones I’ve filed myself, or ones I’ve encountered that others have filed) has never been very good. I’ve found Firefox much better about fixing bugs.

carlhjerpe

I guess it depends on what kind of bug it is; this one took 25 years to fix: https://news.ycombinator.com/item?id=40431444

Dylan16807

To be fair that bug was only P3.

wraptile

I find the "don't let googlebot see this" kinda funny considering how top Google results are often much worse. The captcha/anti-bot situation is getting so bad that I had to move to Kagi specifically to block some domains, as browsing the contemporary web is almost impossible at times. Why isn't Google down-ranking this experience?

lifthrasiir

Previously on HN: Detecting Noise in Canvas Fingerprinting https://news.ycombinator.com/item?id=43170079

The reception was not really positive for the obvious reason at that time.

wslh

In Google Chrome, at least, I tried an infinite loop modifying document.title and it froze pages in other tabs as well. I'm not at my computer right now to try it again.

neuroelectron

I, for one, find it hilarious that "headless browsers" are even required. JavaScript interpreters serving webpages is just another amusing bit of serendipity. "Version-less HTML" hahaha

kevin_thibedeau

It exists because adtech providers and CDNs punish legitimate users who don't execute untrusted code on their property.

Thorrez

Headless browsers exist because adtech providers and CDNs punish legitimate users who don't execute untrusted code on their property?

If we ask the creators of headless chrome or selenium why they created them, would they say "because adtech providers and CDNs punish legitimate users who don't execute untrusted code on their property"?

Bjartr

Setting aside whether or not it's true, why people decide to do something and why they say they're doing it don't have to match.

immibis

Another use is testing websites.

jillyboel

[flagged]

seventh12

The intention is to crash bots' browsers, not users' browsers

ramesh31

Please point me to this 100% correct bot detection system with zero false positives.

FridgeSeal

You understand the difference between intent and reality right?

The article even warns about this side-effect.

jillyboel

[flagged]

h4ck_th3_pl4n3t

If you are scraping forbidden data in my robots.txt, I don't give a damn. I am gonna mess with your bots however I like, and I'm willing to go as far as it takes to teach you a lesson about respecting my robots.txt.

anthk

If you are crashing some browser from a directory disallowed in robots.txt, it's not your fault.

lightedman

If that's the case, what do we do about websites and apps that disable your back button (the phone's own one) or your right-click (in a desktop browser), when nothing about disabling that functionality appears in the ToS or is shown to you when you visit the site or use the app?

dmitrygr

Then maybe we need laws about crashing my server by crawling it 163,000 times per minute nonstop, ignoring robots.txt? Until then, no pity for the bots.

jillyboel

if your software crashes due to normal usage then you only have yourself to blame

dmitrygr

Yes indeed. Nginx running out of RAM due to A"I" companies hammering my server is my fault.

sMarsIntruder

Running a bot farm?

jillyboel

of course not, why are you immediately jumping at accusations? if i was i'd just patch the bug locally and thank OP for pointing out how they're doing it.

it's just blatantly illegal and i wouldn't want anyone to get into legal trouble

omneity

[flagged]

randunel

How do you deal with the usual CF, akamai and other fingerprinting and blocking you? Or is that the customer's job to figure out?

omneity

Thank you for the question! It depends on the scale you're operating at.

1. For individual use (or company use where each user is on their own device), the traffic is typically drowned out in regular user activity since we use the same browser, so no particular measure is needed; it just works. We have options for power users.

2. For large scale use, we offer tailored solutions depending on the anti-bot measures encountered. Part of it is to emulate #1.

3. We don't deal with "blackhat bots", so we don't offer support for working around legitimate anti-bot measures, e.g. for social spambots.

lyu07282

If you don't put significant effort into it, any headless browser from cloud IP ranges will be banned by large parts of the internet. This isn't just about spam bots, you can't even read news articles in many cases. You will have some competition from residential proxies and other custom automation solutions that take care of all of that for their customers.

erekp

We have a similar solution at metalsecurity.io :) handling large-scale automation for enterprise use cases, bypassing antibots

omneity

That's super cool, thank you for sharing! It's based on playwright though right? Can you verify if the approach you are using is also subject to the bug in TFA?

My original point was not necessarily about bypassing anti-bot protections, but rather about offering a different branch of browser automation, independent of incumbent solutions such as Puppeteer, Selenium and others. We believe those are not made for this purpose and have many limitations, as TFA mentions, requiring way too many workarounds, as your solution illustrates.

erekp

we fix leaks and bugs of automation frameworks, so we don't have that problem. The problem with the approach of using the user's browser, like yours, is that you will burn the user's fingerprint depending on scale.

volemo

Guess we gotta find a way to crash these bots too. :D