New high-quality hash measures 71GB/s on M4
aidenn0 · May 14, 2025 · 23 comments
jandrewrogers
Quality matters. XXH3 fails something like 15% of the tests in SMHasher3. There are faster hash functions with much better quality profiles.
neonsunset
If whatever you are doing is heavy on hashmap lookups (and you are OK with not rewriting it into something more bespoke and complicated), then the faster the hash function and the cheaper its baseline call cost, the better. XXH3 can have disadvantages here: its popular implementation pays for dispatching to the necessary routine on each call.
This looks like an interesting potential alternative to GxHash. GxHash is nice, but sadly I found AES intrinsics to have somewhat high latency on AMD's Zen 2/3 cores, making it a loss on short strings (until I measured it on our server hardware, the M4 Max sure had me fooled; it has much better latency despite retiring more AES operations!).
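For reference, trying a different hash in a hashmap-heavy path is usually just a one-struct change. A minimal C++ sketch, assuming rapidhash.h exposes a `rapidhash(const void*, size_t)` entry point (the header name and signature are my assumption; XXH3_64bits or any other byte-oriented hash slots in the same way):

```cpp
// Minimal sketch: using a fast byte hash as the Hash policy of std::unordered_map.
// Assumption: rapidhash.h provides uint64_t rapidhash(const void*, size_t).
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

#include "rapidhash.h"  // assumed header name

struct RapidHasher {
    size_t operator()(const std::string& s) const noexcept {
        return static_cast<size_t>(rapidhash(s.data(), s.size()));
    }
};

int main() {
    // Drop-in replacement for the default std::hash<std::string>.
    std::unordered_map<std::string, int, RapidHasher> counts;
    counts["hello"] = 1;
    counts["hash"]  = 2;
    return counts.count("hash") ? 0 : 1;
}
```

The fixed cost of that operator() on short keys is exactly the baseline cost I mean.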
xkcd1963
Question: what do you do if there is a collision? I saw the GitHub table also mentions collisions.
lights0123
They're extremely common in hashtables. You follow standard open addressing or separate chaining procedures: https://en.m.wikipedia.org/wiki/Hash_collision
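As a rough illustration (a toy sketch, not how any particular rapidhash-based table is implemented), separate chaining just lets colliding keys share a bucket and resolves them by comparing the keys themselves:

```cpp
// Toy separate-chaining table: keys that hash to the same bucket share a
// short chain; lookup walks the chain and compares keys exactly.
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class ChainedMap {
    std::vector<std::list<std::pair<std::string, int>>> buckets_;

    size_t bucket_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

public:
    ChainedMap() : buckets_(16) {}

    void insert(const std::string& key, int value) {
        auto& chain = buckets_[bucket_of(key)];
        for (auto& kv : chain) {
            if (kv.first == key) { kv.second = value; return; }  // same key: overwrite
        }
        chain.emplace_back(key, value);  // collision: append to the chain
    }

    std::optional<int> find(const std::string& key) const {
        for (const auto& kv : buckets_[bucket_of(key)]) {
            if (kv.first == key) return kv.second;
        }
        return std::nullopt;
    }
};
```

Open addressing does the equivalent by probing nearby slots instead of chaining.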
saagarjha
Fall back on secondary hashes or do probing
jandrewrogers
Without getting too deep into the technical weeds, I will assert that rapidhash is an excellent small-key hash function and approximately the state of the art for that purpose. I am not aware of anything better in this scope.
For bulk hashing there are better functions, but XXH3, GXHash, etc are not among them, being both slower and weaker than the state-of-the-art. Hash functions that can’t pass quality tests are not serious contenders.
curiouscoding
I'd love to see some benchmarks/comparisons on variable-length strings. For strings with random lengths between 10 and 30, gxhash was significantly faster than xxhash for me; I assume that's because it processes chunks of up to 32 chars at a time, avoiding branch misses.
Generally my feeling is that at these speeds, designing for branch-predictability for short strings might be more important than absolute throughput.
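Something like this rough harness is what I have in mind; hash_bytes here is just a placeholder for whichever function you want to compare (XXH3_64bits, gxhash, rapidhash), and the numbers will obviously depend on compiler flags and how the keys are laid out in memory:

```cpp
// Rough micro-benchmark sketch for variable-length keys (lengths 10..30).
// hash_bytes is a placeholder; swap in the hash under test to compare.
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <random>
#include <string>
#include <string_view>
#include <vector>

static uint64_t hash_bytes(const void* p, size_t len) {
    // Placeholder: replace with e.g. XXH3_64bits(p, len) or rapidhash(p, len).
    return std::hash<std::string_view>{}(
        std::string_view(static_cast<const char*>(p), len));
}

int main() {
    std::mt19937_64 rng(42);
    std::uniform_int_distribution<size_t> len_dist(10, 30);

    // Pre-generate the keys so the timed loop does nothing but hash.
    std::vector<std::string> keys;
    size_t total_bytes = 0;
    for (int i = 0; i < 1000000; ++i) {
        std::string s(len_dist(rng), '\0');
        for (auto& c : s) c = static_cast<char>('a' + rng() % 26);
        total_bytes += s.size();
        keys.push_back(std::move(s));
    }

    uint64_t sink = 0;  // keep results live so the loop isn't optimized away
    auto t0 = std::chrono::steady_clock::now();
    for (const auto& s : keys) sink ^= hash_bytes(s.data(), s.size());
    auto t1 = std::chrono::steady_clock::now();

    double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("%.1f MB/s (sink=%llu)\n", total_bytes / secs / 1e6,
                static_cast<unsigned long long>(sink));
}
```

A per-call latency measurement (cycles per hash at a fixed short length) would probably tell you more about the branch-miss story than raw throughput does.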
rurban
It's not the fastest on smhasher anymore; umash is better and faster, and gxhash is twice as fast. It is, however, very hashmap-friendly because of its small size. It's a wyhash variant.
nicoshev11
Hey Reini, the rapidhash version on the SMHasher repo is outdated. The latest version was published last Tuesday; I'll submit it to SMHasher shortly. This new version is much faster than the previous one and still passes all tests. It should beat umash on your benchmarks.
Sesse__
Am I right that RAPIDHASH_UNROLLED is now mandatory, and that the function is thus much larger now?
wpollock
Is rapidhash cache-friendly?
wmf
Hash functions don't have any data reuse, so none of them are cache-friendly.
benwills
I think they may be asking about the CPU cache.
Dylan16807
You have to go out of your way to make a hash whose code doesn't fit into L1, so again they're all basically the same.
You'll probably end up fitting entirely inside the reorder buffer plus a sequential stream from memory, with the actual caches almost irrelevant.
bandrami
I saw the first four words and then was disappointed by the direction it took...
At some point, hashes are fast enough. I'm not going to be switching from xxhash to shave 40 picoseconds off of hashing small keys, and for large keys it's already a few orders of magnitude faster than my NVMe drive.