That 16B password story (a.k.a. "data troll")
5 comments
·August 13, 2025
charcircuit
If there were an open database of password breaches, it would be easier for people to research whether a leak is new or just passwords taken from a previous leak. Of course you can get closer to the actual number by filtering out duplicates, but you can't figure out what's new if you don't know what's old.
mananaysiempre
Pwned Passwords[1] is just such a database (with passwords hashed using either SHA-1 or NTLM as an obfuscation measure, and without any emails). Hunt used to distribute versioned snapshots, but these days he directs you to an API scraper[2] in C# instead, so you can still get a list but it probably won’t exactly match anyone else’s.
[1] https://haveibeenpwned.com/passwords
[2] https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader
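As a rough sketch of how a lookup against SHA-1-hashed entries like these can work without the plaintext ever leaving your machine: hash the candidate password locally, split the digest into a 5-character prefix and a 35-character suffix, and search a list of `SUFFIX:COUNT` lines for the suffix. That line format and the prefix length are assumptions based on the public Pwned Passwords range endpoint (`https://api.pwnedpasswords.com/range/<prefix>`), which the comment above doesn't describe; the function names are made up for illustration, and the network request itself is left out.

```python
import hashlib


def sha1_prefix_suffix(password: str) -> tuple[str, str]:
    """Return the (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_body(suffix: str, body: str) -> int:
    """Look up a suffix in a body of 'SUFFIX:COUNT' lines (assumed format).

    Returns the listed count, or 0 if the suffix does not appear.
    """
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0
```

In use, you would fetch the body for `prefix` from the range endpoint (or from a local copy produced by the downloader) and pass it to `count_in_range_body` together with `suffix`.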
charcircuit
This isn't sufficient for all cases. For example, a breach could contain hashed passwords. If you only have the obfuscated passwords from previous breaches, you can't rehash them yourself with the new breach's scheme to tell that the new breach is just a rehash of an existing one.
Data breaches can also contain things other than passwords, like phone numbers, addresses, etc., that would also be useful for checking.
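A small sketch of the mismatch described above, with everything in it hypothetical: the "old" corpus is stored only as SHA-1 digests, while the "new" breach stores salted SHA-256 digests (the salting scheme, the passwords, and the helper names are invented for illustration). The digest sets never intersect directly; only with the old plaintexts in hand can you rehash under the new scheme and spot the recycled entry.

```python
import hashlib


def sha1_hex(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def salted_sha256_hex(password: str, salt: str) -> str:
    # Hypothetical scheme used by the "new" breach: SHA-256 over salt + password.
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


# Stand-in for a previous breach, available both as plaintext and as SHA-1 digests.
old_plaintext = ["hunter2", "letmein"]
old_obfuscated = {sha1_hex(p) for p in old_plaintext}

# Stand-in for the "new" breach: (salt, digest) rows, one of them recycled.
new_breach = [
    ("s1", salted_sha256_hex("hunter2", "s1")),
    ("s2", salted_sha256_hex("brand-new-secret", "s2")),
]

# Comparing digests directly finds nothing, because the schemes differ.
direct_overlap = old_obfuscated & {digest for _, digest in new_breach}


def rehash_matches(plaintexts, breach_rows):
    """Rehash known plaintexts with each row's salt and collect the ones that match."""
    found = set()
    for salt, digest in breach_rows:
        for candidate in plaintexts:
            if salted_sha256_hex(candidate, salt) == digest:
                found.add(candidate)
    return found


recycled = rehash_matches(old_plaintext, new_breach)
```

Here `direct_overlap` comes out empty while `recycled` contains the reused password, which is the gap a hash-only database leaves open.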
anon7000
Publishing someone’s leaked credentials in plaintext for anyone to look at also isn’t ideal. I mean, yes, it’s been leaked, but we also don’t need to make it easier for someone to get hacked.
In other words, 2.7B -> 109M is a 96% reduction from headline to people. Could we apply the same maths to the 16B headline?
I mean, there aren't 16B people in the world, so a row per person can be ruled out pretty easily.
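For what it's worth, the quoted arithmetic can be worked through in a few lines. This is only a back-of-the-envelope sketch: it assumes the 2.7B-to-109M ratio carries over unchanged to the 16B headline, which nothing in the thread establishes, and the variable names are made up.

```python
# Figures quoted above: 2.7B headline rows vs. 109M actual people.
headline_rows = 2.7e9
actual_people = 109e6

people_per_row = actual_people / headline_rows   # roughly 0.04
reduction = 1 - people_per_row                   # roughly 0.96, i.e. the quoted 96%

# ASSUMPTION: the same ratio applies to the 16B headline.
big_headline_rows = 16e9
estimated_people = big_headline_rows * people_per_row

print(f"reduction: {reduction:.1%}")
print(f"estimated people behind 16B rows: {estimated_people / 1e6:.0f}M")
```

Under that assumption the 16B figure shrinks to somewhere around 650M people, which at least fits inside the world's population.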