Detect and crash Chromium bots
10 comments
· May 7, 2025 · oefrha
closewith
That's an open bug at the minute, but the one saving grace is that they're APFS clones, so they don't actually consume disk space.
oefrha
Interesting, IIRC I did free up quite a bit of disk space when I removed all the clones, but I also deleted a lot of other stuff that time so I could be mistaken. du(1) being unaware of APFS clones makes it hard to tell.
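One rough way around du(1) here is to compare the volume's free space before and after the deletion, since that should reflect blocks actually released rather than per-file sizes. A minimal Python sketch, assuming nothing else is writing to the disk at the same time; the helper name `freed_bytes` is made up for illustration, and APFS snapshots or purgeable space could still skew the number:

```python
import shutil


def freed_bytes(path, action):
    """Run action() and return how many bytes of free space the volume
    holding `path` gained (negative if free space shrank)."""
    before = shutil.disk_usage(path).free
    action()  # e.g. a function that deletes the suspected clone directories
    after = shutil.disk_usage(path).free
    return after - before
```

If the clones really share their blocks with the installed Chrome, the result of deleting them should be close to zero; a result near the full size of the copies would suggest they were not clones after all.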
chrismorgan
Checking https://issues.chromium.org/issues/340836884, I’m mildly surprised to find the report just under a year old, with no attention at all (bar a me-too comment after four months), despite having been filed with priority P1, which I understand is supposed to mean “aim to fix it within 30 days”. If it continues to get no attention, I’m curious if it’ll get bumped automatically in five days’ time when it hits one year, given that they do something like that with P2 and P3 bugs, shifting status to Available or something, can’t quite remember.
I say only “mildly”, because my experience on Chromium bugs (ones I’ve filed myself, or ones I’ve encountered that others have filed) has never been very good. I’ve found Firefox much better about fixing bugs.
lifthrasiir
Previously on HN: Detecting Noise in Canvas Fingerprinting https://news.ycombinator.com/item?id=43170079
The reception was not really positive for the obvious reason at that time.
omneity
Relevant plug: At Herd we offer a browser automation and orchestration framework that uses real browsers and thus sidesteps several of these issues[0]. The API is puppeteer-like but doesn't use it as we built the entire framework[1] from scratch.
If you're wondering about the emphasis on MCPs, Herd is a generalist automation framework with a bespoke package format, trails[2], that supports MCP and REST out of the box.
1: https://herd.garden/docs/reference
2: https://herd.garden/docs/trails-automations
EDIT: I understand not everyone likes a shameless plug in another thread. The intention behind it, however, is also informative, as not every browser automation strategy is subject to the issues described in TFA.
randunel
How do you deal with the usual CF, Akamai, and other fingerprinting services blocking you? Or is that the customer's job to figure out?
omneity
Thank you for the question! It depends on the scale you're operating at.
1. For individual use (or company use where each user is on their own device), the traffic is typically drowned out by regular user activity, since we use the same browser. No particular measure is needed; it just works. We have options for power users.
2. For large scale use, we offer tailored solutions depending on the anti-bot measures encountered. Part of it is to emulate #1.
3. We don't deal with "blackhat bots", so we don't offer support for working around legitimate anti-bot measures, such as those aimed at social spambots.
lyu07282
If you don't put significant effort into it, any headless browser from cloud IP ranges will be banned by large parts of the internet. This isn't just about spam bots, you can't even read news articles in many cases. You will have some competition from residential proxies and other custom automation solutions that take care of all of that for their customers.
> The call to page.evaluate just hangs, and the browser dies silently. browser.close() is never reached, which can cause memory leaks over time.
Not just memory leaks. Since a couple months ago, if you use Chrome via playwright etc. on macOS, it will deposit a copy of Chrome (more than 1GB) into /private/var/folders/kd/<...>/X/com.google.Chrome.code_sign_clone/, and if you exit without a clean browser.close(), the copy of Chrome will remain there. I noticed after it ate up ~50GB in two days. No idea what's the point of this code sign clone thing, but I had to add --disable-features=MacAppCodeSignClone to all my invocations to prevent it, which is super annoying.
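For anyone who wants to apply this in one place, here is a minimal sketch using Playwright's Python async API. It passes the flag mentioned above on macOS only and puts a deadline on `page.evaluate()` so a hang does not block `browser.close()` forever. The helper names `chrome_launch_args` and `evaluate_with_deadline`, the 10-second default, and the choice of `channel="chrome"` are assumptions for illustration, not something taken from TFA:

```python
import asyncio
import sys

# Flag quoted in the comment above; applied only on macOS in this sketch.
CODE_SIGN_CLONE_FLAG = "--disable-features=MacAppCodeSignClone"


def chrome_launch_args(extra_args=(), platform=sys.platform):
    """Return launch args, adding the code-sign-clone opt-out on macOS."""
    args = list(extra_args)
    if platform == "darwin" and CODE_SIGN_CLONE_FLAG not in args:
        args.append(CODE_SIGN_CLONE_FLAG)
    return args


async def evaluate_with_deadline(url, script, timeout_s=10.0):
    """Hypothetical helper: run `script` on `url`, giving up after timeout_s."""
    # Imported lazily so chrome_launch_args() works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(
            channel="chrome", args=chrome_launch_args()
        )
        try:
            page = await browser.new_page()
            await page.goto(url)
            # Bound the wait, since evaluate() may never return if the
            # browser dies silently.
            return await asyncio.wait_for(
                page.evaluate(script), timeout=timeout_s
            )
        finally:
            # Also bounded, in case close() itself stalls on a dead browser.
            await asyncio.wait_for(browser.close(), timeout=timeout_s)
```

Usage would look like `asyncio.run(evaluate_with_deadline("https://example.com", "document.title"))`. Whether a close() after a crash still removes the clone directory is not established anywhere in this thread, which is why the sketch keeps the flag rather than relying on cleanup alone.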