Modern Perfect Hashing
9 comments
·October 24, 2025adzm
Make sure to read the post linked right at the beginning as well: http://0x80.pl/notesen/2023-04-30-lookup-in-strings.html as well as the magic bitboards linked, too https://www.chessprogramming.org/Magic_Bitboards
Though honestly this post really needed some numbers and benchmarks.
Sesse__
I never really finished the project, thus only the rough qualitative benchmarks you get at the bottom (measured mostly by profiling and size(1)); I saw that it wasn't enough of a win in the larger context where I needed it, thus it made sense to stop early instead of making exhaustive benchmarks.
The blog post was mainly for curious readers, I'm surprised it hit HN at all. :-)
jibal
gperf is very limited in the number of keys it can handle as opposed to, say, https://burtleburtle.net/bob/hash/perfect.html
Sesse__
Well, again, different problem constraints, different solutions. Seemingly that tool can handle larger sets than gperf (although it claims gperf stops at a couple hundred, which is an exaggeration; try it with the first 1000 lines of /usr/dict/words and it's nearly instant, and with the first 10k it needs 35 seconds or so), but it also says the runtime is even slower. My goal was to have faster runtime, not handle more keys. YMMV.
o11c
One thing not mentioned: very often "give up and allow a not-quite-perfect hash" is a reasonable solution.
mgaunard
What's most annoying with gperf and similar tools is that they aren't really suited to applications where the set of keys is known at runtime during initialization.
I am working on modern perfect hashing also. First for integer keys, and then also for millions of keys. Should be practical, not academic.
https://github.com/rurban/gperf/tree/autotools or some other branch. Forgot which.
https://github.com/rurban/cmph/tree/compressed_o (lotsa improvements)
https://github.com/rurban/pthash (compile to static C++)
I've also extended nbperf for those features