
Ask HN: How to stop an AWS bot sending 2B requests/month?

175 comments · October 17, 2025

I have been struggling with a bot ('Mozilla/5.0 (compatible; crawler)') coming from AWS Singapore that is sending an absurd number of requests to a domain of mine, averaging over 700 requests/second for several months now. Thankfully, CloudFlare is able to handle the traffic with a simple WAF rule and a 444 response to reduce the outbound traffic.

I've submitted several complaints to AWS to get this traffic to stop; their typical follow-up is: "We have engaged with our customer, and based on this engagement have determined that the reported activity does not require further action from AWS at this time."

I've tried various 4XX responses to see if the bot will back off, and I've tried 30X redirects (which it follows), all to no avail.

The traffic is hitting numbers that require me to re-negotiate my contract with CloudFlare and is otherwise a nuisance when reviewing analytics/logs.

I've considered redirecting the entirety of the traffic to the AWS abuse report page, but at this scale it's essentially a small DDoS network, and sending it anywhere could be considered abuse in itself.

Have others had similar experiences?

AdamJacobMuller

> I've tried 30X redirects (which it follows)

301 response to a selection of very large files hosted by companies you don't like.

When their AWS instances start downloading 70,000 Windows ISOs in parallel, they might notice.

Hard to do with Cloudflare, but you can also tarpit them. Accept the request and send the response one character at a time (make sure you uncork and flush buffers, etc.), with a 30-second delay between characters.

700 requests/second with, say, 10 KB of headers/response. Sure is a shame your server is so slow.
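
Something like this, roughly (a sketch only: the port, the body size, the delay, and the thread-per-connection model are arbitrary choices, and a real deployment would want an evented server rather than a thread per held connection):

```python
import socket
import threading
import time

# Sketch of the tarpit described above, not production code.
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 10000\r\n"
    b"\r\n"
    + b"." * 10000
)

def tarpit(conn: socket.socket, delay: float = 30.0) -> None:
    try:
        # "Uncork": disable Nagle so each byte actually goes out on its own.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        conn.recv(4096)  # read (and ignore) the request
        for i in range(len(RESPONSE)):
            conn.sendall(RESPONSE[i:i + 1])  # one character at a time
            time.sleep(delay)
    except OSError:
        pass  # the client gave up, which is also fine
    finally:
        conn.close()

if __name__ == "__main__":
    srv = socket.create_server(("0.0.0.0", 8080), backlog=1024)
    while True:
        client, _addr = srv.accept()
        # One thread per victim connection keeps the sketch simple.
        threading.Thread(target=tarpit, args=(client,), daemon=True).start()
```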

notatoad

>301 response to a selection of very large files hosted by companies you don't like.

i suggest amazon

lgats

Unfortunately, it seems AWS even has firewalls that will quickly start failing these requests after a few thousand, and then the bot is back up to its high-concurrency rate.

knowitnone3

Microsoft

gitgud

> Accept the request and send a response, one character at a time

Sounds like the opposite of the Slow Loris DDoS attack [1]. Instead of attacking with slow connections, you’re defending with slow connections.

[1] https://www.cloudflare.com/en-au/learning/ddos/ddos-attack-t...

tliltocatl

That's why it is actually sometimes called inverse slow loris.

amy_petrik

it's called the slow sirol in my circles

tremon

As an alternative: 301 redirect to an official .sg government site, let local law enforcement deal with it.

integralid

Don't actually do this, unless you fancy meeting AWS lawyers in court and love explaining intricate details of HTTP to judges.

more_corn

I like this idea. Here’s how it plays out: Singapore law enforcement gets involved. They send a nasty-gram to AWS. Lawyers get involved. AWS lawyers collect facts. They find that the culprit is not you, find that you’ve asked for help, find that they (AWS) failed to remediate, properly fix responsibility on the culprit and secondary responsibility on themselves, punch themselves in the crotch for a minute, and then solve the problem by canceling the account of the offending party.

more_corn

^ I love you

gruez

>When their AWS instances start downloading 70000 windows ISOs in parallel, they might notice.

Inbound traffic is free for AWS

kadoban

Free just means you get in trouble when you abuse it.

jacquesm

It's free, but it's not infinite.

swiftcoder

Making the obviously-abusive bot prohibitively expensive is one way to go, if you control the terminating server.

A gzip bomb is good if the bot happens to be vulnerable, but even just slowing down their connection rate is often sufficient: waiting just 10 seconds before responding with your 404 is going to consume ~7,000 ports on their box, which should be enough to crash most Linux processes (nginx + mod-http-echo is a really easy way to set this up).
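
For the gzip bomb half, a minimal sketch of pre-building the payload (the sizes here are illustrative; serve the resulting bytes with a "Content-Encoding: gzip" header on whatever routes the bot keeps hitting):

```python
import gzip
import io

# Sketch only: build a gzip body that is tiny on the wire but huge once a
# naive client decompresses it. Zeros compress at roughly 1000:1, so 1 GiB of
# zeros ends up around 1 MiB compressed.
def build_gzip_bomb(decompressed_mib: int = 1024) -> bytes:
    buf = io.BytesIO()
    chunk = b"\0" * (1024 * 1024)  # 1 MiB of zeros per write
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        for _ in range(decompressed_mib):
            gz.write(chunk)
    return buf.getvalue()

if __name__ == "__main__":
    bomb = build_gzip_bomb()
    print(f"compressed payload: {len(bomb) / 1024:.0f} KiB")
```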

gildas

Great idea. Some people have already implemented it for the same type of need, it would seem (see the list of user agents in the source code). The implementation seems simple.

https://github.com/0x48piraj/gz-bomb/blob/master/gz-bomb-ser...

kijin

Be careful using this if you're behind cloudflare. You might inadvertently bomb your closest ally in the battle.

mkj

AWS customers have to pay for outbound traffic. Is there a way to get them to send you (or cloudflare) huge volumes of traffic?

horseradish7k

yeah, could use a free worker

compootr

free workers only get 100k reqs per day or something

_pdp_

A KB-sized zip file can expand to gigabytes or petabytes through recursive nesting, though it depends on their implementation.

sim7c00

That's traffic in the other direction.

CWuestefeld

We've been in a similar situation. One thing we considered doing was to give them bad data.

It was pretty clear in our case that they were scraping our site to get our pricing data. Our master catalog had several million SKUs, priced dynamically based on availability, customer contracts, and other factors. And we tried to add some value to the product pages, with relevant recommendations for cross-sells, alternate choices, etc. This was pretty compute-intensive, and the volume of the scraping could amount to a DoS at times. Like, they could bury us in bursts of requests so quickly that our infrastructure couldn't spin up new virtual servers, and once we were buried, it was difficult to dig back out from under the load. We learned a lot during this period, including some very counterintuitive stuff about how some approaches to queuing and prioritizing that sounded great on paper could actually have unintended effects that made such situations worse.

One strategy we talked about was that, rather than blocking the bad guys, we'd tag the incoming traffic. We couldn't do this with perfect accuracy, but the inaccuracy was such that we could at least ensure that it wasn't affecting real customers (because we could always know when it was a real, logged-in user). We realized that we could at least cache the data in the borderline cases so we wouldn't have to recalculate (it was a particularly stupid bot that was attacking us, re-requesting the same stuff many times over); from that it was a small step to see that we could at the same time add a random fudge factor into any numbers, hoping to get to a state where the data did our attacker more harm than good.
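
Roughly, the idea was something like this (illustrative only: the tagging predicate, the request attributes, and the ±3% jitter are made-up placeholders, not what we actually built):

```python
import random

# Sketch: exact prices for real users, cached and slightly perturbed prices
# for traffic tagged as a likely scraper. The attributes on `request` and the
# 300 req/min threshold are hypothetical.
PRICE_CACHE: dict[str, float] = {}

def looks_like_scraper(request) -> bool:
    return request.user_id is None and request.requests_last_minute > 300

def price_for(request, sku: str, real_price: float) -> float:
    if not looks_like_scraper(request):
        return real_price
    if sku not in PRICE_CACHE:
        # Compute the fudged price once and reuse it, so repeat requests for
        # the same SKU cost nothing extra to serve.
        PRICE_CACHE[sku] = round(real_price * random.uniform(0.97, 1.03), 2)
    return PRICE_CACHE[sku]
```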

We wound up doing what the OP is now doing, working with CloudFlare to identify and mitigate "attacks" as rapidly as possible. But there's no doubt that it cost us a LOT, in terms of developer time, payments to CF, and customer dissatisfaction.

By the way, this was all the more frustrating because we had circumstantial evidence that the attacker was a service contracted by one of our competitors. And if they'd come straight to us to talk about it, we'd have been much happier (and I think they would have been as well) to offer an API through which they could get the catalog data easily and in a way where we wouldn't have to spend all the compute on the value-added stuff we were doing for humans. But of course they'd never come to us, or even admit it if asked, so we were stuck. And while this was going on, there was also a case in the courts that was discussed many times here on HN. It was a question about blocking access to public sites, and the consensus here was something like "if you're going to have a site on the web, then it's up to you to ensure that you can support any requests, and if you can't find a way to withstand DoS-level traffic, it's your own fault for having a bad design". So it's interesting today to see that attitudes have changed.

gwbas1c

> rather than blocking the bad guys, we'd tag the incoming traffic

> had circumstantial evidence that the attacker was a service contracted by one of our competitors

> we'd have been much happier ... to offer an API through which they could get the catalog data easily

Why not feed them bad data?

CWuestefeld

We didn't like the ethics of it, especially since we couldn't guarantee that the bogus data was going only to the attacker (rather than to innocent but not-yet-authenticated "general public").

kristianp

Stupid question, won't that consume 7000 ports on your own box as well?

swiftcoder

7000 sockets, at any rate, but provided you've anticipated the need, this isn't challenging to support (and nginx is very good at handling large numbers of open sockets)

kijin

Each TCP connection requires a unique combination of (server port, client port). Your server port is fixed: 80 or 443. They need to use a new ephemeral port for each connection.

You will have 7000 sockets (file descriptors), but that's much more manageable than 7000 ports.

Neywiny

I think it'll eat 7,000 connection objects, maybe threads, but they'll all be on port 80 or 443? So if you can keep the overhead of each connection down (presumably easy, because you don't need it to be fast), it'll be fine.

Orochikaku

Thinking along the same lines, a PoW check like Anubis [1] may work for OP as well.

[1] https://github.com/TecharoHQ/anubis

hshdhdhehd

Avoid it if you don't have to use it. It's not really friendly to good traffic. Especially if current blocking works.

CaptainOfCoit

> Especially if current blocking works.

The submission and its context are about when current blocking doesn't work...

winnie_ua

It was blocking me from accessing GNOME's GitLab instance from my cell phone.

So it mistakenly flagged me as a bot. IDK. And it forces legitimate users to wait a while. Not great UX.

SergeAx

Wouldn't it consume the same number of connections on my server?

lagosfractal42

This kind of reasoning assumes the bot continues to be non-stealthy

lucastech

Yeah, there are some botnets I've been seeing that are much more stealthy, using 900-3,000 IPs with rotating user agents to send enormous amounts of traffic.

I've resorted to blocking entire AS routes to prevent it (fortunately I am mostly hosting US sites with US-only residential audiences). I'm not sure who's behind it, but one of the later data centers is Oxylabs, so they're probably involved somehow.

https://wxp.io/blog/the-bots-that-keep-on-giving

swiftcoder

I mean, forcing them to spend engineering effort to make their bot stealthy (or to be able to maintain tens of thousands of open ports) is still driving up their costs, so I'd count it as a win. The OP doesn't say why the bot is hitting their endpoints, but I doubt the bot is a profit centre for the operator.

lagosfractal42

You risk flagging real users as bots, which drives down your profits and reputation

somat

xkcd 810 comes to mind. https://xkcd.com/810/

"what if we make the bots go stealthy and indistinguishable from actual human requests?"

"Mission Accomplished"

HPsquared

This has pretty much happened now in the internet at large, and it's kinda sad.

heavyset_go

If going stealth means not blatantly DDoS'ing the OP then that's a better outcome than what's currently happening

xena

Main author of Anubis here. Have CloudFlare return an HTTP 200 response instead of a non-200 rejection; the bots keep hammering until they get a 200 response.

kingforaday

If you see this, something isn't working with your main site: https://anubis.techaro.lol/

andrewmcwatters

I've also gotten good results just dropping the connection when it hits the application layer and you can't get CloudFlare to return the desired behavior first.

Not ideal, but it seems to work against primitive bots.

Ameo

I thought you quit the orange site for good

neya

I had this issue on one of my personal sites. It was a blog I used to write maybe 7-8 years ago. All of a sudden, I saw insane traffic spikes in analytics. I thought some article went viral, but realized it was too robotic to be true. And so I narrowed it down to some developer trying to test their bot/crawler on my site. I tried asking nicely, several times, over several months.

I was so pissed off that I set up a redirect rule for it to send them over to random porn sites. That actually stopped it.

sim7c00

This is the best approach, honestly. Redirect them to some place that undermines their efforts: either back to themselves, their own provider, or nasty crap that no one wants to find in their crawler logs.

throwaway422432

Goatse?

Wouldn't recommend Googling it. You either know or just take a guess.

Rendello

I googled a lot of shock sites after seeing them referenced and not knowing what they were. Luckily Google and Wikipedia tended to shield my innocent eyes while explaining what I should be seeing.

The first goatse I actually saw was in ASCII form, funnily enough.

nosrepa

The Jason Scott method.

specialist

Maybe someone will publish a "nastylist" for redirecting bots.

Decades later, I'm still traumatized by goatse, so it'll have to be someone with more fortitude than me.

sim7c00

goatse, lemonparty, meatspin. take ur pick of the gross but clearnetable things.

mind you before google and the likes and the great purge of internet, these things were mild and humorous...

yabones

Return a 200 with the EICAR test string in the body. Nothing like some data poisoning for some vindictive fun

https://en.wikipedia.org/wiki/EICAR_test_file
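
A minimal sketch of that, assuming a bare HTTP server on an arbitrary port (the 68-byte test string itself is the one documented at the link above):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch only: answer every GET with HTTP 200 and the standard EICAR test
# string, so any antivirus scanning on the crawler's side flags the download.
# Port 8080 is an arbitrary choice.
EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

class EicarHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(EICAR)))
        self.end_headers()
        self.wfile.write(EICAR)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), EicarHandler).serve_forever()
```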

tetha

Heh, I was wondering if you could do something like SSRF exploits, just the other way around. You know, redirect the bot to <cloud-provider-metadata-api>/shutdown.

Even funnier, include the EICAR test string in the redirect to the cloud provider metadata. Maybe we could trip some automated compromise detection.

bigfatkitten

Do you receive, or expect to receive any legitimate traffic from AWS Singapore? If not, why not blackhole the whole thing?

caprock

Agreed. You should be able to set the WAF to just drop the packets and not even bother with the overhead of a response. I think Cloudflare's WAF calls this "block".

marginalia_nu

Yeah, this is the way. Dropping the packets makes the requests cheaper to handle than they are to make.

The problem with DDoS attacks is generally the asymmetry, where it requires more resources to deal with the request than to make it. Cute attempts to get back at the attacker with various tarpits generally magnify this and make it hit even harder.

jihadjihad

When the WAF drops packets, how does pricing work? I am assuming there is still a non-zero cost to handling that? It kind of sounded from the OP like they are looking to shake the monkey off their back for good, and cheaply.

firecall

Yep, I did this for a while.

The TikTok ByteDance / Bytespider bots were making millions of image requests from my site.

Over and over again and they would not stop.

I eventually got Cloudinary to block all the relevant user agents, and initially just totally blocked Singapore.

It’s very abusive on the part of these bot-running AI scraping companies!

If I hadn’t been using the kind and generous Cloudinary, I could have been stuck with some seriously expensive hosting bills!

Nowadays I just block all AI bots with Cloudflare and be done with it!

rkagerer

I had a similar problem back in 2018, though at a smaller scale.

I wrote a quick-and-dirty program that reads the authoritative list of all AWS IP ranges from https://ip-ranges.amazonaws.com/ip-ranges.json (more about that URL at the blog post https://aws.amazon.com/blogs/aws/aws-ip-ranges-json/), and creates rules in Windows Firewall to simply block all of them. Granted, it was a sledgehammer, but it worked well enough.
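
A rough sketch of the same idea (not the original program; filtering to ap-southeast-1, i.e. Singapore, is just an example, and passing None grabs every AWS prefix, as my sledgehammer version did):

```python
import json
import urllib.request

# Sketch: pull the published AWS ranges and print the CIDRs so they can be
# fed to whatever firewall you use (Windows Firewall rules, an nftables set,
# a Cloudflare IP list, ...).
RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

def aws_cidrs(region: str | None = "ap-southeast-1") -> list[str]:
    with urllib.request.urlopen(RANGES_URL) as resp:
        data = json.load(resp)
    return sorted({
        p["ip_prefix"]
        for p in data["prefixes"]
        if region is None or p["region"] == region
    })

if __name__ == "__main__":
    for cidr in aws_cidrs():
        print(cidr)
```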

Here's the README.md I wrote for the program, though I never got around to releasing the code: https://markdownpastebin.com/?id=22eadf6c608448a98b6643606d1...

It ran for some years as a scheduled task on a small handful of servers, but I'm not sure if it's still in use today or even works anymore. If there's enough interest I might consider publishing the code (or sharing it with someone who wants to pick up the mantle). Alternatively it wouldn't be hard for someone to recreate that effort.

G'luck!

scrps

Singapore's comms regulator bans porn (even possessing it). Serve up some softcore to the bot, then e-mail the regulator and AWS.

CaptainOfCoit

To be honest, I'd give that a try too. When someone is bothering you across the internet, the best way to reply is to use their local legal system against them; not many other parties will care otherwise.

jedberg

Tell cloudflare it's abusive, and they will block it outside your account so it doesn't count against you.

geraldcombs

I ran into a similar situation a couple of years ago. It wasn't at the scale you describe, but it was an absurd number of requests for a ~80 MB software installer. I ended up redirecting the offending requests to a file named "please-stop.txt" that contained a short note explaining what was happening and asking them to stop. A short time later they did.

MrThoughtful

If it follows redirects, have you tried redirecting it to its own domain?

lgats

I've tried localhost redirects; they don't impact the speed of their requests, and all ports are closed on the suspect machines.

Retric

A 100% legal solution is to sue them and name Amazon as a party in the lawsuit.

Through discovery you can get the names of the parties involved from Amazon, but Amazon is very likely to drop them as a client, solving the issue.

Waterluvian

This sounds like it would probably cost tens of thousands of dollars just to get off the starting line.

Retric

Actually going through a lawsuit is expensive, but “bluffing” long enough to send a nasty and credible letter can be relatively inexpensive.

Importantly, it’s also getting moderately expensive for the other side, which really discourages this kind of behavior. Suing an arbitrary person you have no connection with invites a countersuit for wasting their money, but that largely goes away with such a one-sided provocation.