I use zip bombs to protect my server
466 comments
·April 28, 2025
seanhunter
Once upon a time around 2001 or so I used to have a static line at home and host some stuff on my home linux box. A windows NT update had meant a lot of them had enabled this optimistic encryption thing where windows boxes would try to connect to a certain port and negotiate an s/wan before doing TCP traffic. I was used to seeing this traffic a lot on my firewall so no big deal. However there was one machine in particular that was really obnoxious. It would try to connect every few seconds and would just not quit.
I tried to contact the admin of the box (yeah that’s what people used to do) and got nowhere. Eventually I sent a message saying “hey I see your machine trying to connect every few seconds on port <whatever it is>. I’m just sending a heads up that we’re starting a new service on that port and I want to make sure it doesn’t cause you any problems.”
Of course I didn’t hear back. Then I set up a server on that port that basically read from /dev/urandom, set TCP_NODELAY and a few other flags and pushed out random gibberish as fast as possible. I figured the clients of this service might not want their strings of randomness to be null-terminated so I thoughtfully removed any nulls that might otherwise naturally occur. The misconfigured NT box connected, drank 5 seconds or so worth of randomness, then disappeared. Then 5 minutes later, reappeared, connected, took its buffer overflow medicine and disappeared again. And this pattern then continued for a few weeks until the box disappeared from the internet completely.
I like to imagine that some admin was just sitting there scratching his head wondering why his NT box kept rebooting.
kqr
The lesson for any programmers reading this is to always set an upper limit for how much data you accept from someone else. Every request should have both a timeout and a limit on the amount of data it will consume.
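A minimal sketch of that rule with Python's standard library (the 10 MB cap and 10-second timeout are placeholder numbers, not anyone's recommendation):

```python
import urllib.request

MAX_BYTES = 10 * 1024 * 1024   # illustrative cap: never buffer more than 10 MB
TIMEOUT = 10                   # seconds, applied to connect and to each socket read

def bounded_fetch(url, max_bytes=MAX_BYTES, timeout=TIMEOUT):
    """Fetch at most max_bytes from url, failing fast instead of buffering forever."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read(max_bytes + 1)        # read one extra byte to detect overflow
        if len(data) > max_bytes:
            raise ValueError("response exceeded %d bytes, aborting" % max_bytes)
        return data
```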
keitmo
As a former boss used to say: "Unlimited is a bad idea."
eru
That doesn't necessarily need to be in the request itself.
You can also limit the wider process or system your request is part of.
kqr
While that is true, I recommend putting it on the request anyway, because it makes it abundantly clear to the programmer that requests can fail, and failure needs to be handled somehow – even if it's by killing and restarting the process.
guappa
Then you kill your service which might also be serving legitimate users.
mjmsmith
Around the same time, or maybe even earlier, some random company sent me a junk fax every Friday. Multiple polite voicemails to their office number were ignored, so I made a 100-page PDF where every page was a large black rectangle, and used one of the new-fangled email-to-fax gateways to send it to them. Within the hour, I got an irate call. The faxes stopped.
quaddo
Circa 1997 a coworker lamented that he had signed up for some email list, and attempts to unsubscribe weren’t working (more of a manual thing, IIRC). I suggested setting up a cronjob to run hourly and send an email requesting to be unsubscribed. It would source a text file containing the unsubscribe request, and with each iteration it would double the text in the file, effectively a geometric progression. The list owner responded about a week or so later, rather urgently requesting that my coworker cut it out, saying that he would remove him from the list. Apparently the list owner had been away on vacation the entire time.
mkwarman
I enjoyed reading this, thank you for sharing. When you say you tried to contact the admin of the box and that this was common back then, how would you typically find the contact info for an arbitrary client's admin?
cobbaut
Back then things like postmaster@theirdomain and webmaster@theirdomain were read by actual people. Also the whois command often worked.
dspearson
I work for one of the largest Swiss ISPs, and these mailboxes are still to this day read by actual people (me included), so it's sometimes worthwhile even today.
rekabis
A responsible domain owner still will read them. My own postmaster is a catch-all for all my domains, such that typos in the username still get caught. Has proven to be invaluable with the family domain, where harried medical staff make mistakes in setting up accounts for my parents.
kqr
You can also find out who owns a general group of IP addresses, and at the time they would often assist you in further pinpointing who is responsible for a particular address.
__david__
I always liked the RP DNS record (https://www.rfc-editor.org/rfc/rfc1183) but no one seems to know about it or use it any more. The only reason my servers don't have one now is because route53 doesn't support it.
DocTomoe
tech-c / abuse addresses were commonly available on whois.
ge96
tangent
I had a lazy fix for down detection on my RPi server at home: it pinged a domain I owned, and if it couldn't reach it, it assumed it wasn't connected to a network and rebooted itself. I let the domain lapse and the RPi kept going down every 5 minutes or so... I thought it was a power fault, then I remembered that CRON job.
danillonunes
That's why everyone else is lazy and just pings google.com
MomsAVoxell
You’d be surprised to know that, for the majority of NT installations providing services in that era, there were very, very few admins around to even notice what was going on. Running services like this on an NT box was done ‘in order to not have to have an admin’ in so many thousands of cases that it cannot be overstated.
Disclaimer: I put a lot of servers on the Internet in the 90’s/early 2000’s. It was industry-wide standard practice: ‘use NT so you don’t need an admin’.
dmos62
What was it about NT that made an admin unnecessary? Just marketing?
zerr
Didn't get why that WinNT box was connecting to your box. Due to some misconfigured Windows update procedure?
seanhunter
I never found this out, but there was some feature where NT would try to negotiate an encrypted connection to communicate and that’s the port it was connecting on. It’s a long time ago. It’s possible the box had been pwned, and that was command/control for a botnet or something. Lots of internet-facing windows boxes were at the time because MS security was absolutely horrendous at this time.
gigatexal
That’s awesome! Thank you for sharing.
layer8
Back when I was a stupid kid, I once did
ln -s /dev/zero index.html
on my home page as a joke. Browsers at the time didn’t like that: they basically froze, sometimes taking the client system down with them. Later on, browsers started to check for actual content, I think, and would abort such requests.
bobmcnamara
I made a 64k x 64k JPEG once by feeding the encoder the same line of macroblocks until it produced the entire image.
Years later I was finally able to open it.
opan
I had a ton of trouble opening a 10MB or so png a few weeks back. It was stitched together screenshots forming a map of some areas in a game, so it was quite large. Some stuff refused to open it at all as if the file was invalid, some would hang for minutes, some opened blurry. My first semi-success was Fossify Gallery on my phone from F-Droid. If I let it chug a bit, it'd show a blurry image, a while longer it'd focus. Then I'd try to zoom or pan and it'd blur for ages again. I guess it was aggressively lazy-loading. What worked in the end was GIMP. I had the thought that the image was probably made in an editor, so surely an editor could open it. The catch is that it took like 8GB of RAM, but then I could see clearly, zoom, and pan all I wanted. It made me wonder why there's not an image viewer that's just the viewer part of GIMP or something.
Among things that didn't work were qutebrowser, icecat, nsxiv, feh, imv, mpv. I did worry at first the file was corrupt, I was redownloading it, comparing hashes with a friend, etc. Makes for an interesting benchmark, I guess.
For others curious, here's the file: https://0x0.st/82Ap.png
I'd say just curl/wget it, don't expect it to load in a browser.
Scaevolus
That's a 36,000x20,000 PNG, 720 megapixels. Many decoders explicitly limit the maximum image area they'll handle, under the reasonable assumption that it will exceed available RAM and take too long, and assume the file was crafted maliciously or by mistake.
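Pillow is one example of a decoder with exactly this kind of area guard; a rough sketch (assumes the Pillow package is installed and that "82Ap.png" is a saved local copy of the file linked above):

```python
from PIL import Image  # assumes the Pillow package is installed

# Pillow refuses images above Image.MAX_IMAGE_PIXELS (about 89 million pixels
# by default): it warns past the limit and raises DecompressionBombError past
# twice that. 36,000 x 20,000 = 720 megapixels, so this PNG trips the guard.
print(Image.MAX_IMAGE_PIXELS)

# Raise the ceiling deliberately instead of disabling the check entirely, then
# decode a small preview; the full decode still needs a few GB of RAM.
Image.MAX_IMAGE_PIXELS = 1_000_000_000
with Image.open("82Ap.png") as im:      # hypothetical local copy of the linked file
    im.thumbnail((2000, 2000))
    im.save("preview.png")
```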
lgeek
On Firefox on Android on my pretty old phone, a blurry preview rendered in about 10 seconds, and it was fully rendered in 20 something seconds. Smooth panning and zooming the entire time
virtue3
I use honey view for reading comics etc. It can handle this.
Old school acdsee would have been fine too.
I think it's all the pixel processing on the modern image viewers (or they're just using system web views that aren't 100% a straight render).
I suspect that the more native renderers are doing some extra magic here. Or just being significantly more OK with using up all your ram.
Moosdijk
It loads in about 5 seconds on an iPhone 12 using safari.
It also pans and zooms swiftly
bugfix
IrfanView was able to load it in about 8 seconds (Ryzen 7 5800x) using 2.8GB of RAM, but zooming/panning is quite slow (~500ms per action)
promiseofbeans
Firefox on a mid-tier Samsung and a cheapo data connection (4G) took about 30s to load. I could pan, but it limited me from zooming much, and the little I could zoom in looked quite blurry.
beeslol
For what it's worth, this loaded (slowly) in Firefox on Windows for me (but zooming was blurry), and the default Photos viewer opened it no problem with smooth zooming and panning.
Meneth
On my Waterfox 6.5.6, it opened but remained blurry when zoomed in. MS Paint refused to open it. The GIMP v2.99.18 crashed and took my display driver with it. Windows 10 Photo Viewer surprisingly managed to open it and keep it sharp when zoomed in. The GIMP v3.0.2 (latest version at the time of writing) crashed.
ack_complete
I once encoded an entire TV OP into a multi-megabyte animated cursor (.ani) file.
Surprisingly, Windows 95 didn't die trying to load it, but quite a lot of operations in the system took noticeably longer than they normally did.
M95D
I wonder if I could create a 500TB html file with proper headers on a squashfs, an endless <div><div><div>... with no closing tags, and if I could instruct the server to not report file size before download.
Any ideas?
Ugohcet
Why use squashfs when you can do the same OP did and serve a compressed version, so that the client is overwhelmed by both the uncompression and the DOM depth:
yes "<div>"|dd bs=1M count=10240 iflag=fullblock|gzip | pv > zipdiv.gz
Resulting file is about 15 mib long and uncompresses into a 10 gib monstrosity containing 1789569706 unclosed nested divs
sroussey
You can also just use code to endlessly serve up something.
Also, you can reverse many DoS vectors depending on how you are set up and what your costs are. For example, reverse a Slowloris attack and use up their connections.
M95D
I like it. :)
imron
This is beautiful
CobrastanJorji
Yes, servers can respond without specifying the size by using chunked encoding. And you can do the rest with a custom web server that just handles requests by returning "<div>" in a loop. I have no idea if browsers are vulnerable to such a thing.
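A minimal sketch of such a server with Python's standard library (the port and chunk size are arbitrary); whether any given browser actually chokes on it is exactly the open question:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

CHUNK = b"<div>" * 2048   # roughly 10 KB of unclosed divs per chunk

class EndlessDivs(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"        # required for chunked transfer encoding

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Transfer-Encoding", "chunked")   # no Content-Length sent
        self.end_headers()
        try:
            while True:                                    # stream until the client gives up
                self.wfile.write(b"%x\r\n%s\r\n" % (len(CHUNK), CHUNK))
        except (BrokenPipeError, ConnectionResetError):
            pass                                           # client disconnected or died

if __name__ == "__main__":
    HTTPServer(("", 8080), EndlessDivs).serve_forever()
```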
konata390
I just tested it via a small python script sending divs at a rate of ~900 MB/s (as measured by curl), and firefox just kills the request after 1-2 GB received (~2 seconds) with an "out of memory" error, while chrome seems to only receive around 1 MB/s, uses 1 CPU core at 100%, and grows infinitely in memory use. I killed it after 3 mins and consuming ca. 6 GB (additionally, on top of the memory it used at startup).
M95D
I would make it an invisible link from the main page (hidden behind a logo or something). Users won't click it, but bots will.
m463
Sounds like the favicon.ico that would crash the browser.
I think this was it:
https://freedomhacker.net/annoying-favicon-crash-bug-firefox...
dolmen
Looks like something I should add for my web APIs which are to be queried only by clients aware of the API specification.
koolba
I hope you weren’t paying for bandwidth by the KiB.
amelius
Maybe it's time for a /dev/zipbomb device.
GTP
ln -s /dev/urandom /dev/zipbomb && echo 'Boom!'
Ok, not a real zip bomb, for that we would need a kernel module.
wfn
> Ok, not a real zip bomb, for that we would need a kernel module.
Or a userland fusefs program, nice funky idea actually (with configurable dynamic filenames, e.g. `mnt/10GiB_zeropattern.zip`...)
Dwedit
That costs you a lot of bandwidth, defeating the whole point of a zip bomb.
AStonesThrow
Wait, you set up a symlink?
I am not sure how that could’ve worked. Unless the real /dev tree was exposed to your webserver’s chroot environment, this would’ve given nothing special except “file not found”.
The whole point of chroot for a webserver was to shield clients from accessing special files like that!
vidarh
You yourself explain how it could've worked: Plenty of webservers are or were not chroot'ed.
pandemic_region
Which means that if your bot is getting slammed by this, you can assume it's not chrooted and hence a more likely target for attack.
M95D
Could server-side includes be used for a html bomb?
Write an ordinary static html page and fill a <p> with infinite random data using <!--#include file="/dev/random"-->.
or would that crash the server?
GTP
I guess it depends on the server's implementation. But since you need some logic to decide when to serve the html bomb anyway, I don't see why you would prefer this solution. Just use whatever script you're using to detect the bots to serve the bomb.
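A rough sketch of that approach with Python's standard library; the bait paths, size, and port are illustrative and this is not the article's actual setup:

```python
import gzip, io
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_bomb(gigabytes=10):
    """Gzip zeros in 1 MB chunks: ~10 GB of zeros compresses to roughly 10 MB."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        chunk = b"\0" * (1024 * 1024)
        for _ in range(gigabytes * 1024):
            gz.write(chunk)
    return buf.getvalue()

BOMB = make_bomb()
BAIT = ("/wp-login.php", "/xmlrpc.php", "/.env")   # illustrative "only bots ask for these" paths

class Bomber(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in BAIT:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Encoding", "gzip")   # the client inflates it itself
            self.send_header("Content-Length", str(len(BOMB)))
            self.end_headers()
            self.wfile.write(BOMB)
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("", 8080), Bomber).serve_forever()
```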
M95D
No other scripts. Hide the link to the bomb behind an image so humans can't click it.
sandworm101
Divide by zero happens to everyone eventually.
https://medium.com/@bishr_tabbaa/when-smart-ships-divide-by-...
"On 21 September 1997, the USS Yorktown halted for almost three hours during training maneuvers off the coast of Cape Charles, Virginia due to a divide-by-zero error in a database application that propagated throughout the ship’s control systems."
" technician tried to digitally calibrate and reset the fuel valve by entering a 0 value for one of the valve’s component properties into the SMCS Remote Database Manager (RDM)"
fuzztester
I remember reading about that some years ago. It involved Windows NT.
astolarz
Bad bot
jeroenhd
These days, almost all browsers accept zstd and brotli, so these bombs can be even more effective today! [This](https://news.ycombinator.com/item?id=23496794) old comment showed an impressive 1.2M:1 compression ratio and [zstd seems to be doing even better](https://github.com/netty/netty/issues/14004).
Though, bots may not support modern compression standards. Then again, that may be a good way to block bots: every modern browser supports zstd, so just force that on non-whitelisted browser agents and you automatically confuse scrapers.
andersmurphy
So I actually do this (use compression to filter out bots) for my one million checkboxes Datastar demo[1]. It relies heavily on streaming the whole user view on every interaction. With brotli over SSE you can easily hit 200:1 compression ratios[2]. The problem is a malicious actor could request the stream uncompressed. As brotli is supported by 98% of browsers I don't push data to clients that don't support brotli compression. I've also found a lot of scrapers and bots don't support it so it works quite well.
[1] checkboxes demo https://checkboxes.andersmurphy.com
[2] article on brotli SSE https://andersmurphy.com/2025/04/15/why-you-should-use-brotl...
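A sketch of that Accept-Encoding check (here `headers` is any dict-like mapping of request headers; the heuristic is the commenter's, not a guarantee):

```python
def looks_like_scraper(headers):
    """Modern browsers advertise brotli (and increasingly zstd); clients that
    don't are disproportionately bots, so treat them as suspect."""
    accepted = {token.split(";")[0].strip().lower()
                for token in headers.get("Accept-Encoding", "").split(",")}
    return "br" not in accepted and "zstd" not in accepted

# looks_like_scraper({"Accept-Encoding": "gzip, deflate"})      -> True
# looks_like_scraper({"Accept-Encoding": "gzip, deflate, br"})  -> False
```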
kevin_thibedeau
If you nest the gzip inside another gzip it gets even smaller since the blocks of compressed '0' data are themselves low entropy in the first generation gzip. Nested zst reduces the 10G file to 99 bytes.
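A quick way to see the effect with stdlib Python (exact byte counts vary with the zlib version, so treat the numbers as ballpark):

```python
import gzip, io

# First generation: gzip 1 GB of zeros, streamed so we never hold 1 GB in RAM.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    for _ in range(1024):
        gz.write(b"\0" * (1024 * 1024))
once = buf.getvalue()

# Second generation: the first output is itself highly repetitive compressed
# blocks, so gzipping it again shrinks it by another large factor.
twice = gzip.compress(once)

print(len(once), len(twice))   # roughly ~1 MB down to a few KB
```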
galangalalgol
Can you hand-edit it to create recursive file structures and make it infinite? I used to use debug in DOS to make what appeared to be gigantic floppy discs by editing the FAT.
hidroto
https://research.swtch.com/zip
it is basically a quine.
necovek
That's what I was hoping for with the original article.
Thorrez
But the bot likely only automatically unpacks the outer layer. So nesting doesn't help with bot deterrence.
Cloudef
Wouldn't that defeat the attack though, as you aren't serving the large content anymore?
kevin_thibedeau
It would need a bot that is accessing files via hyperlink with an aim to decompress them and riffle through their contents. The compressed file can be delivered over a compressed response to achieve the two layers, cutting down significantly on the outbound traffic. passwd.zst, secrets.docx, etc. would look pretty juicy. Throw some bait in honeypot directories (exposed for file access) listed in robots.txt and see who takes it.
xiaoyu2006
How will my browser react to receiving such bombs? I’d rather not test it myself…
jeroenhd
Last time I checked, the tab keeps loading, freezes, and the process that's assigned to rendering the tab gets killed when it eats too much RAM. Might cause a "this tab is slowing down your browser" popup or general browser slowness, but nothing too catastrophic.
How bad the tab process dying is, depends per browser. If your browser does site isolation well, it'll only crash that one website and you'll barely notice. If that process is shared between other tabs, you might lose state there. Chrome should be fine, Firefox might not be depending on your settings and how many tabs you have open, with Safari it kind of depends on how the tabs were opened and how the browser is configured. Safari doesn't support zstd though, so brotli bombs are the best you can do with that.
anthk
gzip is everywhere and it will mess with every crawler.
bilekas
> At my old employer, a bot discovered a wordpress vulnerability and inserted a malicious script into our server
I know it's slightly off topic, but it's just so amusing (edit: reassuring) to know I'm not the only one who, within an hour of setting up Wordpress, found a PHP shell magically deployed on my server.
protocolture
>Take over a wordpress site for a customer
>Oh look 3 separate php shells with random strings as a name
Never less than 3, but always guaranteed.
ianlevesque
Yes, never self host Wordpress if you value your sanity. Even if it’s not the first hour it will eventually happen when you forget a patch.
sunaookami
Hosting WordPress myself for 13 years now and have no problem :) Just follow standard security practices and don't install gazillion plugins.
carlosjobim
There's a lot of essential functionality missing from WordPress, meaning you have to install plugins. Depending on what you need to do.
But it's such a bad platform that there really isn't any reason for anybody to use WordPress for anything. No matter your use case, there will be a better alternative to WordPress.
ozim
I have better things to do with my time so I happily pay someone else to host it for me.
arcfour
Never use that junk if you value your sanity, I think you mean.
UltraSane
I once worked for a US state government agency and my coworker was the main admin of our WordPress based portal and it was crazy how much work it was to keep working.
ufmace
Ditto to self-hosting wordpress works fine with standard hosting practices and not installing a bazillion random plugins.
maeln
I never hosted WP, but as soon as you have an HTTP server exposed to the internet you will get requests to /wp-login and such. It has become a good way to find bots too. If I see an IP requesting anything from a popular CMS, hop, it goes in the iptables hole.
Perz1val
Hey, I check /wp-admin sometimes when I see a website and it has a certain feel to it
victorbjorklund
I do the same. Great way to filter out security scanners.
Aransentin
Wordpress is indeed a nice backdoor, it even has CMS functionality built in.
colechristensen
>after 1 hour
I've used this teaching folks devops, here deploy your first hello world nginx server... huh what are those strange requests in the log?
dx4100
There's ways to prevent it:
- Freeze all code after an update through permissions
- Don't make most directories writeable
- Don't allow file uploads, or limit file uploads to media
There's a few plugins that do this, but vanilla WP is dangerous.
ChuckMcM
I sort of did this with ssh where I figured out how to crash an ssh client that was trying to guess the root password. What I got for my trouble was a number of script kiddies ddosing my poor little server. I switched to just identifying 'bad actors' who are clearly trying to do bad things and just banning their IP with firewall rules. That's becoming more challenging with IPV6 though.
Edit: And for folks who write their own web pages, you can always create zip bombs that are links on a web page that don't show up for humans (white text on white background with no highlight on hover/click anchors). Bots download those things to have a look (so do crawlers and AI scrapers)
grishka
> you can always create zip bombs that are links on a web page that don't show up for humans
I did a version of this with my form for requesting an account on my fediverse server. The problem I was having is that there exist these very unsophisticated bots that crawl the web and submit their very unsophisticated spam into every form they see that looks like it might publish it somewhere.
First I added a simple captcha with distorted characters. This did stop many of the bots, but not all of them. Then, after reading the server log, I noticed that they only make three requests in rapid succession: the page that contains the form, the captcha image, and then the POST request with the form data. They load neither the CSS nor the JS.
So I added several more fields to the form and hid them with CSS. Submitting anything in these fields will fail the request and ban your session. I also modified the captcha, I made the image itself a CSS background, and made the src point to a transparent image instead.
And just like that, spam has completely stopped, while real users noticed nothing.
anamexis
I did essentially the same thing. I have this input in a form:
<label for="gb-email" class="nah" aria-hidden="true">Email:</label>
<input id="gb-email"
name="email"
size="40"
class="nah"
tabindex="-1"
aria-hidden="true"
autocomplete="off"
>
With this CSS:
.nah {
opacity: 0;
position: absolute;
top: 0;
left: 0;
height: 0;
width: 0;
z-index: -1;
}
And any form submission with a value set for the email is blocked. It stopped 100% of the spam I was getting.
DuncanCoffee
Would this also stop users with automatic form filling enabled?
zzo38computer
If CSS is disabled or using a browser that does not implement CSS, that might also be an issue. (A mode to disable CSS should ideally also be able to handle ARIA attributes (unless the user disables those too), but not all implementations will do this (actually, I don't know if any implementation does; it doesn't seem to on mine), especially if they were written before ARIA attributes were invented.)
BarryMilo
We used to just call those honeypot fields. Works like a charm.
ChuckMcM
Oh that is great.
a_gopher
apart from blind users, who are also now completely unable to use their screenreaders with your site
BehindTheMath
aria-hidden="true" should take care of that.
j_walter
Check this out if you want to stop this behavior...
dsp_person
> you can always create zip bombs that are links on a web page that don't show up for humans (white text on white background with no highlight on hover/click anchors)
RIP screen reader users?
some-guy
“aria-hidden” would spare those users, and possibly be ignored by the bots unless they are sophisticated.
1970-01-01
Why is it harder to firewall them with IPv6? It seems this would be the easier of the two to firewall.
carlhjerpe
Manual banning is about the same, since you just block a /56 or bigger, or entire providers or countries.
Automated banning is harder, you'd probably want a heuristic system and look up info on IPs.
IPv4 with NAT means you can "overban" too.
malfist
Why wouldn't something like fail2ban work here? That's what it's built for and it has been around for eons.
firesteelrain
I think they are suggesting the range of IPs to block is too high?
CBLT
Allow -> Tarpit -> Block should be done by ASN
echoangle
Maybe it’s easier to circumvent because getting a new IPv6 address is easier than with IPv4?
leephillips
These links do show up for humans who might be using text browsers, (perhaps) screen readers, bookmarklets that list the links on a page, etc.
alpaca128
Weird that text browsers just ignore all the attributes that hide elements. I get that they don't care about styling, but even a plain hidden attribute or aria-hidden are ignored.
ChuckMcM
true, but you can make the link text 'do not click this' or 'not a real link' to let them know. I'm not sure if crawlers have started using LLMs to check pages or not which would be a problem.
gwd
> I sort of did this with ssh where I figured out how to crash an ssh client that was trying to guess the root password. What I got for my trouble was a number of script kiddies ddosing my poor little server.
This is the main reason I haven't installed zip bombs on my website already -- on the off chance I'd make someone angry and end up having to fend off a DDoS.
Currently I have some URL patterns to which I'll return 418 with no content, just to save network / processing time (since if a real user encounters a 404 legitimately, I want it to have a nice webpage for them to look at).
Should probably figure out how to wire that into fail2ban or something, but not a priority at the moment.
flexagoon
Automated systems like Cloudflare and stuff also have a list of bot IPs. I was recently setting up a selfhosted VPN and I had to change the IPv4 of the server like 20 times before I got an IP that wasn't banned on half the websites.
bjoli
I am just banning large swaths of IPs. Banning most of Asia and the middle east reduced the amount of bad traffic by something like 98%.
johnisgood
Same, using ipsets, and a systemd {service,timer} for updating the lists.
marcusb
Zip bombs are fun. I discovered a vulnerability in a security product once where it wouldn’t properly scan a file for malware if the file was or contained a zip archive greater than a certain size.
The practical effect of this was you could place a zip bomb in an office xml document and this product would pass the ooxml file through even if it contained easily identifiable malware.
secfirstmd
Eh I got news for ya.
The file size problem is still an issue for many big name EDRs.
marcusb
Undoubtedly. If you go poking around most any security product (the product I was referring to was not in the EDR space,) you'll see these sorts of issues all over the place.
j16sdiz
It has to be the way it is.
Scanning them is resource intensive. The choices are: (1) skip scanning them; (2) treat them as malware; (3) scan them and be DoS'ed.
(Deferring the decision to a human is effectively DoS'ing your IT support team.)
kazinator
I deployed this, instead of my usual honeypot script.
It's not working very well.
In the web server log, I can see that the bots are not downloading the whole ten megabyte poison pill.
They are cutting off at various lengths. I haven't seen anything fetch more than around 1.5 MB of it so far.
Or is it working? Are they decoding it on the fly as a stream, and then crashing? E.g. if something is recorded as having read 1.5 MB, could it have decoded it to 1.5 GB in RAM, on the fly, and crashed?
There is no way to tell.
MoonGhost
Try a content labyrinth, i.e. infinitely generated content with a bunch of references to other generated pages. It may help against simple wget, at least until bots adapt.
PS: I'm on the bots side, but don't mind helping.
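A toy sketch of such a labyrinth in stdlib Python (the /maze path scheme and word list are made up; purpose-built tools like the one mentioned a couple of comments down are far more sophisticated):

```python
import hashlib, random
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = "the of and a to in is it you that he was for on are with as".split()

def page_for(path):
    """Deterministically generate filler text plus links to further generated
    pages, so every URL resolves and every page leads deeper into the maze."""
    rng = random.Random(hashlib.sha256(path.encode()).digest())
    text = " ".join(rng.choice(WORDS) for _ in range(300))
    links = " ".join('<a href="/maze/%08x">more</a>' % rng.getrandbits(32)
                     for _ in range(5))
    return ("<html><body><p>%s</p>%s</body></html>" % (text, links)).encode()

class Labyrinth(BaseHTTPRequestHandler):
    def do_GET(self):
        body = page_for(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), Labyrinth).serve_forever()
```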
palijer
This doesn't work if you pay for bandwidth and CPU usage on your servers, though.
Twirrim
The labyrinth doesn't have to be fast, and things like iocaine (https://iocaine.madhouse-project.org/) don't use much CPU if you don't go and give them something like the Complete Works of Shakespeare as input (mine is using Moby Dick), and can easily be constrained with cgroups if you're concerned about resource usage.
I've noticed that LLM scrapers tend to be incredibly patient. They'll wait for minutes for even small amounts of text.
MoonGhost
That will be your contribution. If others join, scraping will become very pricey. Till bots become smarter. But then they will not download much of the generated crap, which makes it cheaper for you.
Anyway, from the bots' perspective labyrinths aren't the main problem. The Internet is being flooded with quality LLM-generated content.
bugfix
Wouldn't this just waste your own bandwidth/resources?
gwd
Kinda wonder if a "content labyrinth" could be used to influence the ideas / attitudes of bots -- fill it with content pro/anti Communism, or Capitalism, or whatever your thing is, hope it tips the resulting LLM towards your ideas.
arctek
Perhaps need to semi-randomize the file size? I'm guessing some of the bots have a hard limit to the size of the resource they will download.
Many of these are annoying LLM training/scraping bots (in my case anyway). So while it might not crash them if you spit out a 800KB zipbomb, at least it will waste computing resources on their end.
unnouinceput
Do they come back? If so, then they detect it and avoid it. If not, then they crashed and mission accomplished.
kazinator
I currently cannot tell without making a little configuration change, because as soon as an IP address is logged as having visited the trap URL (honeypot, or zipbomb or whatever), a log monitoring script bans that client.
Secondly, I know that most of these bots do not come back. The attacks do not reuse addresses against the same server in order to evade almost any conceivable filter rule that is predicated on a prior visit.
jpsouth
I may be asking a really silly question here, but
> as soon as an IP address is logged as having visited the trap URL (honeypot, or zipbomb or whatever), a log monitoring script bans that client.
Is this not why they aren’t getting the full file?
KTibow
It's worth noting that this is a gzip bomb (acts just like a normal compressed webpage), not a classical zip file that uses nested zips to knock out antiviruses.
tga_d
There was an incident a little while back where some Tor Project anti-censorship infrastructure was run on the same site as a blog post about zip bombs.[0] One of the zip files got crawled by Google, and added to their list of malicious domains, which broke some pretty important parts of Tor's Snowflake tool. Took a couple weeks to get it sorted out.[1]
[0] https://www.bamsoftware.com/hacks/zipbomb/ [1] https://www.bamsoftware.com/hacks/zipbomb/#safebrowsing
wewewedxfgdf
I protected uploads on one of my applications by creating fixed-size temporary disk partitions of like 10MB each and unzipping to those, to contain the fallout if someone uploads something too big.
warkdarrior
`unzip -p | head -c 10MB`
kccqzy
Doesn't deal with multi-file ZIP archives. And before you think you can just reject user uploads with multi-file ZIP archives, remember that macOS ZIP files contain the __MACOSX folder with ._ files.
sidewndr46
What? You partitioned a disk rather than just not decompressing some comically large file?
gchamonlive
https://github.com/uint128-t/ZIPBOMB
2048 yottabyte Zip Bomb
This zip bomb uses overlapping files and recursion to achieve 7 layers with 256 files each, with the last being a 32GB file.
It is only 266 KB on disk.
When you realise it's a zip bomb it's already too late. Looking at the file size doesn't betray its contents. Maybe applying some heuristics with ClamAV? But even then it's not guaranteed. I think a small partition to isolate decompression is actually really smart. Wonder if we can achieve the same with overlays.
sidewndr46
What are you talking about? You get a compressed file. You start decompressing it. When the amount of bytes you've written exceeds some threshold (say 5 megabytes) just stop decompressing, discard the output so far & delete the original file. That is it.
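One way to do exactly that in Python with the stdlib; the 5 MB cap is just the number from the comment above:

```python
import zipfile

MAX_TOTAL = 5 * 1024 * 1024   # cap on total decompressed output
CHUNK = 64 * 1024

def extract_capped(path, max_total=MAX_TOTAL):
    """Stream-decompress each member, counting real output bytes and bailing
    out the moment the cap is exceeded, so lying size metadata in a crafted
    archive can't help a bomb."""
    total, out = 0, {}
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            pieces = []
            with zf.open(info) as member:
                while True:
                    block = member.read(CHUNK)
                    if not block:
                        break
                    total += len(block)
                    if total > max_total:
                        raise ValueError("archive expands past %d bytes, rejecting" % max_total)
                    pieces.append(block)
            out[info.filename] = b"".join(pieces)
    return out
```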
est
damn, it broke the macOS archiver utility.
kccqzy
Seems like a good and simple strategy to me. No real partition needed; tmpfs is cheap on Linux. Maybe OP is using tools that do not easily allow tracking the number of uncompressed bytes.
wewewedxfgdf
Yes I'd rather deal with a simple out of disk space error than perform some acrobatics to "safely" unzip a potential zip bomb.
Also zip bombs are not comically large until you unzip them.
Also you can just unpack any sort of compressed file format without giving any thought to whether you are handling it safely.
anthk
I'd put fake paper names (doi.numbers.whatever.zip) in order to quickly get their attention, along with a robots.txt file for a /papers subdirectory to 'disallow' it. Add some index.html with links to fake 'papers' and in a week these crawlers will blacklist you like crazy.
monster_truck
I do something similar using a script I've cobbled together over the years. Once a year I'll check the 404 logs and add the most popular paths trying to exploit something (ie ancient phpmyadmin vulns) to the shitlist. Requesting 3 of those URLs adds that host to a greylist that only accepts requests to a very limited set of legitimate paths.
gherard5555
There is a similar thing for ssh servers, called endlessh (https://github.com/skeeto/endlessh). In the ssh protocol the client must wait for the server to send back a banner when it first connects, but there is no limit on its size! So this program sends an infinite banner very... very slowly, making the crawler or script kiddie's script hang indefinitely or just crash.
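A single-connection sketch of the same trick in Python (endlessh itself is written in C and juggles many clients at once); the port and delay are arbitrary:

```python
import random, socket, time

def tarpit(port=2222, delay=10.0):
    """Drip out endless pre-banner lines, one every `delay` seconds. RFC 4253
    lets a server send arbitrary lines before its "SSH-" identification string,
    which is the loophole endlessh exploits. Handles one victim at a time,
    unlike the real thing."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen()
    while True:
        conn, _addr = srv.accept()
        try:
            while True:
                # Anything not starting with "SSH-" keeps the client waiting for the banner.
                conn.sendall(b"%x\r\n" % random.getrandbits(64))
                time.sleep(delay)
        except OSError:
            pass          # client finally gave up or the connection broke
        finally:
            conn.close()

if __name__ == "__main__":
    tarpit()
```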