
Messing with Scraper Bots

7 comments

November 15, 2025

jcynix

If you control your own Apache server and just want a shortcut to "go away" instead of feeding scrapers, the RewriteEngine is your friend. For example:

      RewriteEngine On

      # Block requests that reference .php anywhere (path, query, or encoded)
      RewriteCond %{REQUEST_URI} (\.php|%2ephp|%2e%70%68%70) [NC,OR]
      RewriteCond %{QUERY_STRING} \.php [NC,OR]
      RewriteCond %{THE_REQUEST} \.php [NC]
      RewriteRule .* - [F,L]
Note: there's no PHP on my servers, so anyone asking for it is one of the "bad boys," IMHO. Your mileage may vary.
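For anyone wanting to sanity-check those patterns outside Apache, here is a rough Python equivalent, approximating the [NC] (no-case) flag with re.IGNORECASE; the example paths are made up:

```python
import re

# Same pattern as the REQUEST_URI condition above; [NC] ~ re.IGNORECASE.
# %2e is the URL-encoded ".", %70%68%70 is "php" fully encoded.
uri_pat = re.compile(r"\.php|%2ephp|%2e%70%68%70", re.IGNORECASE)

assert uri_pat.search("/wp-login.php")        # plain .php anywhere in the path
assert uri_pat.search("/index%2Ephp")         # encoded dot, mixed case
assert uri_pat.search("/%2E%70%68%70")        # fully percent-encoded ".php"
assert not uri_pat.search("/blog/php-tips")   # "php" alone, no dot: allowed
```

Note that Apache matches each condition against the already-decoded REQUEST_URI as well as THE_REQUEST raw line, which is why the rule checks both the literal and the percent-encoded spellings.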

lavela

"Gzip only provides a compression ratio of a little over 1000: If I want a file that expands to 100 GB, I’ve got to serve a 100 MB asset. Worse, when I tried it, the bots just shrugged it off, with some even coming back for more."

https://maurycyz.com/misc/the_cost_of_trash/#:~:text=throw%2...
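The quoted ratio is easy to check locally with Python's stdlib gzip module; a minimal sketch (the 10 MB payload size is arbitrary):

```python
import gzip

# Maximally compressible input: a long run of identical bytes.
payload = b"\x00" * (10 * 1024 * 1024)  # 10 MB of zeros
compressed = gzip.compress(payload, compresslevel=9)
ratio = len(payload) / len(compressed)

# Deflate's back-reference encoding tops out at roughly 1030:1,
# which matches the "a little over 1000" figure in the quote.
print(f"{len(compressed):,} bytes compressed, ratio ~{ratio:.0f}:1")
```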

ArcHound

Neat! Most of the offensive scrapers I've met try to exploit WordPress sites (hence the focus on PHP). They don't want to see the .php files themselves, but their output.

What you have here is quite close to a honeypot; sadly, I don't see an easy way to counter-abuse such bots. If the target doesn't follow their script, they just move on.

localhostinger

Interesting! It's nice to see people experimenting with these, and I wonder if this kind of junk-data generator will become a product of its own, or at least a feature/integration in existing software. I could see it going there.

NoiseBert69

Hm... why not use small, dumbed-down, self-hosted LLMs to feed the big scrapers with bullshit?

I'd sacrifice two CPU cores for this just to make their life awful.