Skip to content(if available)orjump to list(if available)

Modern Perfect Hashing

Modern Perfect Hashing

9 comments

·October 24, 2025

rurban

I am working on modern perfect hashing also. First for integer keys, and then also for millions of keys. Should be practical, not academic.

https://github.com/rurban/gperf/tree/autotools or some other branch. Forgot which.

https://github.com/rurban/cmph/tree/compressed_o (lotsa improvements)

https://github.com/rurban/pthash (compile to static C++)

I've also extended nbperf for those features

adzm

Make sure to read the post linked right at the beginning as well: http://0x80.pl/notesen/2023-04-30-lookup-in-strings.html as well as the magic bitboards linked, too https://www.chessprogramming.org/Magic_Bitboards

Though honestly this post really needed some numbers and benchmarks.

Sesse__

I never really finished the project, thus only the rough qualitative benchmarks you get at the bottom (measured mostly by profiling and size(1)); I saw that it wasn't enough of a win in the larger context where I needed it, thus it made sense to stop early instead of making exhaustive benchmarks.

The blog post was mainly for curious readers, I'm surprised it hit HN at all. :-)

jibal

gperf is very limited in the number of keys it can handle as opposed to, say, https://burtleburtle.net/bob/hash/perfect.html

Sesse__

Well, again, different problem constraints, different solutions. Seemingly that tool can handle larger sets than gperf (although it claims gperf stops at a couple hundred, which is an exaggeration; try it with the first 1000 lines of /usr/dict/words and it's nearly instant, and with the first 10k it needs 35 seconds or so), but it also says the runtime is even slower. My goal was to have faster runtime, not handle more keys. YMMV.

o11c

One thing not mentioned: very often "give up and allow a not-quite-perfect hash" is a reasonable solution.

mgaunard

What's most annoying with gperf and similar tools is that they aren't really suited to applications where the set of keys is known at runtime during initialization.