Hyperscaling Have I Been Pwned with Cloudflare Workers and Caching
53 comments
· April 23, 2025 · randunel
mmsc
So have most websites in the world. I recently found out report-uri.com uses Cloudflare Turnstile, which blocks exactly the type of activity that I imagine one would actually want from a CSP-violation reporting service.
I like to write about these cases in my spare time, e.g. https://joshua.hu/losing-sight-vision-mission-of-your-role-p... and https://joshua.hu/losing-sight-vision-mission-of-your-role-p.... My all-time favorite was when I was in hospital and couldn't connect to my travel insurance company's website because they blocked IP addresses from the country I was in (it wasn't Cloudflare though, I don't think: https://joshua.hu/losing-sight-vision-mission-of-your-role).
nottorp
Yep, Cloudflare is a great service... for the US parts of the internet.
The road to hell is paved with good intentions.
Incidentally I first read the title as "I have been pwned by cloudflare workers and caching".
weird-eye-issue
I live on the opposite side of the globe and have no problems using Cloudflare. Also, my SaaS is deployed on Cloudflare and we have users in hundreds of countries who use it with no problem.
randunel
> my SaaS is deployed on Cloudflare and we have users in hundreds of countries who use it with no problem
How would you know you have a problem if you outright ban non-conformant users? Is your customer support function not behind cloudflare, and accessible to users without an account?
nottorp
You mean Australia/NZ? :)
huijzer
> Yep, Cloudflare is a great service... for the US parts of the internet.
It's fine in Europe.
nottorp
Almost. I don't get captcha loops but they do get all worked up and captcha me once in a while, most likely because it's Firefox + uBlock Origin.
That's on macOS; if I used desktop Linux I'd probably get more.
weird-eye-issue
I think there is much more to it than just your location. Based on the cursor and UI, are you using Firefox on Linux? I'm not saying it shouldn't be supported, but I just think that there are definitely more factors at play here than "3rd world countries".
And depending on your definition of third-world country, I'm in one as well, and I don't have this sort of issue
decremental
HN skews heavily towards users with very unusual setups. Using one of the least popular browsers on an OS almost no one in the grand scheme of things uses makes you a statistical outlier on its own. Who knows what other obscure configuration choices could be making the problem even worse for such a user?
But yeah, it's none of that, it's just that Cloudflare hates brown people or something.
zorked
This is not just 3rd world countries. Cloudflare has broken the Internet and made it slower with all the stupid captchas. We were better off without them.
sharperguy
Is it really Cloudflare doing this, or endless bot attacks making these kinds of tools necessary?
codelion
Do other services have the same problem? Like the https://amibreached.com/ ?
randunel
Only getting 500 errors from the search call:
Request URL: https://api-v3.amibreached.com/api/v1/cyble-it?SearchTerm=as...
Request Method: GET
Status Code: 500 Internal Server Error
huijzer
> HIBP has been rendered unusable to people in 3rd world countries because of endless CF captcha loops.
I don't know what the situation currently is with HIBP, but Cloudflare does allow setting the security level. Maybe at the time it was still set to high/normal instead of "low" or "essentially off".
tick_tock_tick
The author of this article set those settings himself.
benlivengood
I can see a fairly simple improvement to updating the entire HIBP dataset: version it and slowly roll traffic onto the new version by prefix, e.g. take the 6-hexit prefix mod 100 and, if the result is < rollout_percentage, the Cloudflare Workers fetch from the new version, otherwise from the old. rollout_percentage can be a simple function of the Unix timestamp, for example.
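A minimal sketch of that rollout idea, in Python rather than actual Worker code. The start timestamp, the one-day ramp, and the function names are all hypothetical, chosen only to illustrate the "prefix mod 100 vs. rollout_percentage" comparison:

```python
# Hypothetical rollout window: both constants are assumptions for illustration.
ROLLOUT_START = 1_700_000_000      # Unix timestamp at which the ramp begins
ROLLOUT_DURATION = 24 * 60 * 60    # ramp from 0% to 100% over one day


def rollout_percentage(now: float) -> float:
    """Linear ramp from 0 to 100 as a function of the Unix timestamp."""
    elapsed = now - ROLLOUT_START
    return max(0.0, min(100.0, 100.0 * elapsed / ROLLOUT_DURATION))


def dataset_version(prefix: str, now: float) -> str:
    """Decide which dataset version should serve a 6-hexit hash prefix."""
    bucket = int(prefix, 16) % 100  # stable bucket in 0..99 for this prefix
    return "new" if bucket < rollout_percentage(now) else "old"
```

Because the bucket depends only on the prefix, a given prefix switches from "old" to "new" exactly once during the ramp and never flips back, so each cached entry would need refreshing only once.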
vlovich123
Yeah, a full cache purge seems like a non-ideal solution vs. what you wrote.
smallpipe
I must be missing something. You have 2MB of data for "is my 6 character SHA1 prefix in any breach". Why can't you distribute that to every edge node?
Thorrez
There are tons of emails that share the same prefix. When you look up a prefix, you can't simply get a boolean response. You have to get a list of emails as the response. The client then searches through the list to see if the desired email is in the list or not. Returning a list of emails instead of a single bit significantly increases the data size.
Additionally, people don't just want a boolean answer of "was my email breached somewhere". They want a list of all the breaches that breached the email. So the returned data actually needs to be a list of emails and the list of breaches that each email was breached in.
> Via the public API. This endpoint also takes an email address as input and then returns all breaches it appears in.
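The client-side half of that lookup can be sketched roughly as follows. This is an illustration under assumptions, not HIBP's actual API: the email normalization, the lowercase hex encoding, and the shape of the response (a mapping from hash suffix to breach names) are all hypothetical.

```python
import hashlib

PREFIX_LEN = 6  # six hex characters of the SHA-1 hash are sent to the server


def split_hash(email: str) -> tuple[str, str]:
    """Return (prefix, suffix) of the SHA-1 hex digest of an email address."""
    # Trimming and lowercasing is an assumed normalization step.
    normalized = email.strip().lower()
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    return digest[:PREFIX_LEN], digest[PREFIX_LEN:]


def breaches_for(email: str, bucket: dict[str, list[str]]) -> list[str]:
    """Search a prefix bucket locally for this email's hash suffix.

    `bucket` stands in for the server's response to a prefix query:
    a hypothetical mapping of hash suffix -> breach names.
    """
    _, suffix = split_hash(email)
    return bucket.get(suffix, [])
```

Only the prefix would leave the client; the match against the full hash happens locally, which is why the response has to carry every entry in the bucket rather than a single bit.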
qw
> The client then searches through the list to see if the desired email is in the list or not.
The initial prefix check would probably reduce the number of lookups necessary, as a deeper search would only be needed if the prefix matches.
lerp-io
Can't you just store the hashes in a Bloom filter or something and check on the client side whether your email is in it? (Or maybe that's what they are doing and they don't want to send the data if it's several MB, I don't know.)
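For reference, a toy Bloom filter looks something like the sketch below. The sizing parameters and the double-hashing scheme over SHA-256 are illustrative assumptions, and nothing here reflects what HIBP actually does. Note that a Bloom filter can only answer "possibly present" or "definitely absent"; it cannot say which breaches an address appeared in.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Usage would be `bf = BloomFilter(1024, 3); bf.add("alice@example.com")`, after which `"alice@example.com" in bf` is true, while an address that was never added is usually (but not always) reported as absent.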
lerp-io
I just checked and AI said a Bloom filter is faster and more efficient than a k-anon lookup, maybe in the next article lol.
rokkamokka
A shame they can't push updated data to the caches directly without flushing them. They could otherwise potentially bypass the origin almost entirely by just pushing data once per breach to each edge node.
tpetry
They could, by storing the results as files in R2 and letting Cloudflare Workers just return these.
rkagerer
> In order to provide subscribers of this service with complete anonymity over the email addresses being searched for, the only data passed to the API is the first six characters of the SHA-1 hash of the full email address.
I take it the k-anonymity thing essentially "chunks" your database. So rather than returning one (identifying) result for one email address, you instead take the first few characters of its hash, and return a few hundred results corresponding to all the emails whose hashes share the same prefix. (A bit like asking for the "L-N" section of the phone book instead of asking for a single line).
I'd be curious what sort of attack vectors or "side channels" you wargamed out when coming up with the scheme, e.g. if a block wound up composed mostly of email domains (providers) popular in one part of the world, and only a few from another geography, then you might be able to infer identity via timezones based on the time of day queries come in.
londons_explore
It's the other way round - fewer than 16M users could lead to privacy leaks. As long as the database contains many more than 16M users, there is (fairly) reasonable privacy.
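The 16M figure follows from the prefix length: six hex characters give 16^6 = 16,777,216 possible prefixes. A quick sketch of the arithmetic, where the function name and the dataset sizes in the comments are hypothetical examples rather than HIBP figures:

```python
BUCKETS = 16 ** 6  # 16,777,216 possible 6-hex-character prefixes


def average_bucket_size(total_addresses: int) -> float:
    """Average number of addresses sharing one prefix, assuming uniform hashes."""
    return total_addresses / BUCKETS


# With only as many addresses as buckets, a prefix identifies about one
# address on average, so the prefix alone gives little cover.
# With ten times as many, each prefix hides among about ten addresses.
```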
rkagerer
Thanks. I'm just replying here to confirm your response made sense to my comment as originally posted (I edited it after I went back and read his linked posts in a bit more detail).
I do appreciate how we've found a way to make hash collisions useful :-).
ilogik
It's essentially a hash map. The API will return all the hashes and the info for each one, and it's up to the client to show the relevant information and ignore the rest.
gnabgib
(2024)
charcircuit
Alternatively these dumps could be made public and you could get privacy for free instead of having to query a service.
It's frustrating how secretively this is all treated and how, to get anything useful, you have to go on Telegram instead of there being an open way of checking.
rkagerer
By "dumps" are you talking about the information HIBP makes available? Or the raw, leaked data it came from (potentially containing breached passwords and whatnot)?
mschuster91
> The response from each search was coming back so quickly that the user wasn’t sure if it was legitimately checking subsequent addresses they entered or if there was a glitch.
That's sad - and far too common. We're so conditioned to web sites and even apps being unusably slow and plagued by latency thanks to Electron and multi-megabyte JS bundle slop that the exception, software that is actually responsive and slim, is being judged as abnormal instead of an ideal to follow...
rjh29
If anything we're conditioned by millions of years of evolution to expect things to take time. Things happening instantly in a digital void is relatively new.
The majority of users seem to prefer slow animations for anything that changes, and flight/hotel search pages have used artificial delays for decades.
Almondsetat
Users might prefer slow animations, but they absolutely loathe low responsiveness. We have been conditioned by nature to expect immediate feedback from our actions in the physical world. Sure, turning a page might be quite slow, but the moment you hold the paper between your fingers it immediately moves and you get that feeling of control over the object.
mschuster91
> The majority of users seem to prefer slow animations for anything that changes
Tech conditioned people to this expectation. Tech could also have said "no, screw you, we will not introduce artificial slowness Just Because" and in 5-10 years people would have adapted. Swim or die.
It's just the same with IT in general. In the Nordic and Baltic countries, even beggars have credit-card terminals because no one carries cash any more. Most if not all public service is done exclusively online - and yet we do not hear the horror stories of elderly people dying because they can't apply for social security that people are drawing up here in Germany.
People have the capacity to change and adapt, and one does not have to coddle adults.
whstl
I would say it's neither... it's not preference, but also not really conditioning...
The point of those fake animations or fake spinners is showing that "it worked" in the absence of "success" feedback.
I work with offline-first apps and we did some user testing. We have to be careful about things like navigating between pages, because if it's too fast the user will not register the change, and will assume it was an error.
Now THIS is the fault of the tech industry, and where I agree that it's conditioning: a lot of tech products simply fail silently, or have very long timeouts, so users are conditioned to equate "lack of response" with "failure".
There are alternatives to animations, however: different designs between pages, changes close to the mouse pointer, or in the case of list refresh showing the "last refreshed 1 second ago"... or even showing a popup with "Successfully loaded". Often this is hated by designers (although the "success popup" is also hated by users), which is why people look for alternatives.
mystifyingpoi
> Most if not all public service is done exclusively online
I'm from Poland (so close) and I find this true here too, but in 99% of the cases there is a human fallback. You can file your tax return online, but nothing stops you from driving to the tax office and filing a paper form there, with a pen.
> yet we do not hear the horror stories of elderly people dying
Well, that's because 100% of the time, elderly people will use public and private healthcare by a phone call with a human. Even if apps and such are available.
jve
> Nordic and Baltic countries, even beggars have credit-card terminals
Oh, haven't seen that here (Riga). Actually I was out in the capital a few days ago and people play music on the streets... I can't tip them because I have no coins and they have no terminal or QR code that would lead to a page that enables tipping.
Most places have cc terminals, and buying stuff hand-to-hand also supports sending money to a bank account instantly using only the seller's phone number, but some parts are still coins only. My wallet doesn't support coins unfortunately.
simooooo
It’s legitimately confusing when you see a blink in the corner of your eye and don’t realise something has updated. Or miss it altogether.
It’s a UI problem in how to make it update immediately but also have an indication that it’s updated.
HIBP has been rendered unusable to people in 3rd world countries because of endless CF captcha loops.
https://imgur.com/a/K5z1X2R