New high-quality hash measures 71GB/s on M4
16 comments
·May 14, 2025rurban
nicoshev11
Hey Reini, the rapidhash version on the SMHasher repo is outdated. The latest version was published last Tuesday, I'll submit it to SMHasher shortly. This new version is much faster than the previous one, and still passes all tests. It should beat umash on your benchmarks.
aidenn0
At some point, hashes are fast enough. I'm not going to be switching from xxhash to shave 40 picoseconds off of hashing small keys, and for large keys it's already a few orders of magnitude faster than my NVMe drive.
neonsunset
If whatever you are doing is heavy on hashmap lookups (and you are ok with not rewriting into something more bespoke+complicated) - the faster hash function and the cheaper baseline cost of calling it - the better (i.e. XXH3 can have disadvantages, with its popular impl. for dispatching to the necessary routine).
This looks like an interesting potential alternative to GxHash. GxHash is nice but sadly I found AES intrinsics to have somewhat high latency on AMD's Zen 2/3 cores, making it a loss on short strings (but until I measured it on our server hw, M4 Max sure had me fooled, it has way better latency despite retiring more AES operations!).
xkcd1963
Question, what do you do if there is a collision? I saw the github table also mentioned collisions
lights0123
They're extremely common in hashtables. You follow standard open addressing or separate chaining procedures: https://en.m.wikipedia.org/wiki/Hash_collision
saagarjha
Fall back on secondary hashes or do probing
null
wpollock
Is rapidhash cache-friendly?
wmf
Hash functions don't have any data reuse so none of them are cache-friendly.
benwills
I think they may be asking about the CPU cache.
Dylan16807
You have to go out of your way to make a hash that doesn't fit into L1, so again they're all basically the same.
You'll probably end up fitting entirely inside the reorder buffer plus a sequential stream from memory, with the actual caches almost irrelevant.
It's not the fastest on smhasher anymore. umash is better and faster, and gxhash is twice as fast. It's is however very hashmap friendly because of its small size. It's a wyhash variant