New high-quality hash measures 71GB/s on M4

56 comments

·May 14, 2025

jandrewrogers

Without getting too deep in the technical weeds, I will assert that rapidhash is an excellent small-key hash function and approximately the state-of-the-art for that purpose. There is nothing else better that I am aware of in this scope.

For bulk hashing there are better functions, but XXH3, GXHash, etc are not among them, being both slower and weaker than the state-of-the-art. Hash functions that can’t pass quality tests are not serious contenders.

spiffytech

Can you share more about the quality tests XXH3 fails? I don't know this space deeply, so I'm having trouble reconciling why XXH3 is popular if it fails tests.

jandrewrogers

The quality measure for a hash function is the probability that it is a random oracle for any given set of hash keys. Test suites will test tens of thousands of key sets (and billions of keys) with different properties. The distribution of probabilities across those tests can then be compared to the same distribution from a random oracle instead of a hash function.

The best cryptographic hash functions, like SHA-256, are very close to a random oracle. Weaker cryptographic hash functions, like MD5, are significantly further away and stronger non-cryptographic hash functions, like rapidhash, are further yet. Most non-cryptographic hash functions are so far away from a random oracle in some of the tests as to be effectively broken. XXH3 is an example of this, anyone can try it for themselves using e.g. SMHasher3. For these “broken” cases, the hashes quite obviously don’t have a random distribution.

There are a couple important points to take away from this:

Hash quality is a function of the key set. If you know the properties of the key set ahead of time, which is often the case, then you can construct a hash function with narrowly optimal quality and performance even though it is totally broken for many other key sets that are out of scope. These almost always perform better than a general purpose hash function. This is a good performance engineering trick.

For a hash function to claim suitability for “general purpose”, it needs to be consistently close to a random oracle across all key sizes and key distributions. If it breaks for one key set, it is broken for many others that you didn’t test. Being closer to a random oracle across tens of thousands of tests does not imply that the hash function is not broken (even SHA-256 may be broken) but it greatly improves the probability that you will never accidentally generate a key set that creates a severely biased hash distribution.

Most non-cryptographic hash functions sacrifice quality to improve latency. Low latency and small I-cache footprint is very important for small-key hashing performance in real systems. The Pareto frontier of quality+latency is still quite an active area of research. Any hash function that was designed 10 years ago will be very far from that frontier, the state-of-the-art is fast-moving.

There is a similar Pareto frontier for quality+throughput e.g. hashes used as block storage checksums.

As for why XXH3 is popular, it is incessant self-promotion and marketing. I recently went down a rabbit hole of analyzing quality measures for many hash functions to see how far the state-of-the-art is from the cryptographic hash functions, so the data is pretty fresh. The great news is that non-cryptographic hash quality is now qualitatively far beyond what it was even a few years ago while at the same time improving performance.

spiffytech

I'd love to read your findings if you've written them up.

neonsunset

Same here. And I’m not aware of quality issues with GxHash either. For large(r) inputs I’d expect it to be easily competitive. And it’s better than XXH3 on my M4.

kragen

I feel like I'm missing an enormous amount of context here, and I can't be the only one.

How much better is it than old standbys like hashpjw, FNV1A, or djb's favorite¹, and how? It's apparently 350 lines of code instead of 1–10 lines of code. That seems like an important way that it's worse. And it comes with a copyright license that requires you to add it to your documentation, so the risk of forgetting that creates a risk of copyright infringement. So, how big is the benefit?

How much faster is it hashing a small key like the string "__umulh"? Is it actually slower? (For those needing an introduction to hash tables, covering why that question makes sense, Chris Wellons is always excellent: https://nullprogram.com/blog/2025/01/19/)

Why does it have an array commented "default secret parameters" in the public GitHub repo? Are we supposed to change those? Do they provide some kind of security, and if so, what kind, against what threat model?

The readme and https://scholar.google.com/scholar?hl=es&as_sdt=0%2C5&q=rapi... are no help here. https://github.com/rurban/smhasher?tab=readme-ov-file#summar... explains some of the questions at greater length but of course not rapidhash's particular answers.

______

¹ while (s[i]) h = h * 33 + s[i++]; about 11 machine-code instructions in its full-fledged form: https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename...

jandrewrogers

The rapidhash algorithm is notable for being general purpose, with unusually low latency relative to its quality metrics.

Classic algorithms are severely broken for many trivial key sets and therefore are not general purpose. However, they may work great for your specific key set and this is something you can test for if you know the properties of your key set. The point of a general purpose hash function is that you can use it with unknown or unknowable key distributions with some degree of assurance that the distribution of hashes is unlikely to be biased in some way.

Something else to keep in mind is that those old algorithms are really slow. Modern algorithms explicitly take advantage of the parallelism and width of modern CPU cores. Larger internal hash state allows higher throughput and quality but also increases latency and I-cache footprint, so there is a tension in the design.

As I mentioned in another post, if the properties of your key set are well-defined and bounded, it can often make sense to design a hash function that is only high-performance and high-quality for that key set. Most people don’t know how to do this effectively so they mostly stick with general-purpose hash functions, one latency-optimized and one throughput-optimized.

ajross

> How much better

That's the real rub. Hashed data structures were a revelation in the early 90's when we all started using perl. None of this work really changes any of that calculus. It's not like we're all going to be chatting at the water cooler about "Remember the days when all we had was djb2?! How the world has changed!"

Still interesting, but unless you're directly involved in a heavily benchmarked runtime, it's probably not worth trying to integrate.

cb321

I'm not sure I can answer your other questions like copyrights & secrets, but the "how is it faster?" is actually pretty easy and worth knowing/saying to anyone unaware - the hash you present is "1 byte at a time". All fast modern functions competing in this space try to do at least 8 bytes at a time, but perhaps a whole 64B cache line. They do tend to have high "startup/finalize" overheads, though which can matter (can easily be dominant) for use in hash tables with modest key lengths instead of hashing large files.

The rest of this comment is kind of ELI5 (I hope!) not because I think @kragen needs that, but because maybe other readers will benefit and I know he leans towards long form text, and @jandrewrogers basically alludes to this stuff in his comment opening this sub-thread. EDIT: as do a few other posters here like at least @curiouscoding https://news.ycombinator.com/item?id=44012740

______

As for benefit - that always depends on "Compared to what??" in your use cases, what CPU you expect things to run on, and so much else. As just a few examples - last I knew, the SMHasher performance tests blocked inlining which already makes them unrealistic comparisons, esp. with really tiny code footprint ones like your FNV1a example. Your higher level table structure might also be caching the hash code for table resize/as a comparison prefix for lookups. So, how much you have to hash at all may depend on doing that, predictability of table sizes, etc. How effective the comparison prefix idea even is can depend upon how common long prefixes are in your key sets which my be literally unknown - could be pathnames from `find` with tons of prefix similarity or none at all.

Furthermore, the entire culture of presentation of performance in this space is sadly "rate" - the reciprocal of what is easi(er) to reason about, namely "time". Why? "Time is additive". Even the headline here is "GB/s". Cycles/byte or ns/byte is just easier. You want to present a time equation with perHashCost + perByteCost*nBytes and both unknown coefficients if inferred from real-times on a modern time sharing OS probably have significant statistical variation. If there is a highly non-random distribution of kicks/competition/interferences, it will usually be easier to see in time kicks. The reciprocal transformation distorts everything. Anyway, a time cost equation (like Knuth v1-3 of days of yore!) also lets you easily compute the break-even string length when rate starts to dominate fixed overheads, namely nBytes_breakEven = perHashCost/perByteCost.

It also lets you estimate an answer to your question of "when does this matter", if your strings are typically under 32B (I believe an early days limit to identifiers for many C compilers, but someone can correct me if that's wrong) and the perByteCost is 0.1 ns/byte (10 GB/s - just a round number), then break even (when per-hash starts to matters more) is 0.1*32 = 3.2 ns or 16..24 cpu cycles on most modern deployments. Said the other way, if your perHash overhead is 16..24 cycles and your hash througput is 10 GB/s, then you better hope your strings are longer than 32 bytes on average or you've probably made a bad choice.

One way in which an equation fails is that it does oversimplify the performance profile even of this simple problem. That profile often has plateaus/cliffs at cache line size and integer size boundaries, but rates fare no better there. Even so, if you are presenting a bunch of hashes, a table sortable by perHash and perByte costs would be the best starting point for most evaluations where the slope is "just an average"/effective value. A plot is best, though.

kragen

I wasn't asking how it could be faster; I was asking how much faster it was (or otherwise better), for small keys like "__umulh", where the startup/finalize overheads are worst. Clearly GB/s is a good measure if you're hashing files or Flash FTL blocks, but jandrewrogers specifically said it's state of the art for the purpose of small-key hashing, which to me means things like dictionary words and variable or method names. Maybe that's not what they meant?

Even on a fairly wimpy microcontroller the hash I linked above only has about four instructions of startup/finalize overhead, even without inlining: https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename... (I modified it slightly because GCC was puzzlingly not reducing my array indexing to pointer arithmetic...)

ozgrakkurt

Xxh3 was miles better than anything else I tested for probabilistic filters, it is high quality and very reliable.

what is a hash fn that is more reliable and faster than xxh3?

null

[deleted]

ozgrakkurt

Key size is only one parameter, you have 5 other parameters, it doesn’t mean much to say a hash fn is good for small keys

aidenn0

At some point, hashes are fast enough. I'm not going to be switching from xxhash to shave 40 picoseconds off of hashing small keys, and for large keys it's already a few orders of magnitude faster than my NVMe drive.

jandrewrogers

Quality matters. XXH3 fails something like 15% of the tests in SMHasher3. There are faster hash functions with much better quality profiles.

leumassuehtam

Since you've mentioned, which hash functions are better while being faster than XXH3?

jandrewrogers

The list of non-cryptographic hash functions that aren’t currently known to be broken for general purpose hashing is pretty short. There are three dimensions to performance: latency, throughput, and microarchitecture. Algorithm designs explicitly tradeoff older microarchitecture performance for newer microarchitecture performance, depending on if the objective of portability is compatibility or consistency.

For latency-optimized hashing, rapidhash is currently the fastest algorithm I know of with consistently good hash quality. Hash quality is outside the cryptographic range, which is expected for a latency-optimized design, but close enough that you can at least make a comparison. Should be pretty portable too. It is what I recommend for general-purpose small-key hashing.

For throughput-optimized hashing, rotohash[0] is currently the fastest algorithm I know of, with single-core throughput saturating CPU cache bandwidth. Hash quality is at the low-end of the cryptographic range, about the same as MD5. While it is built on portable 32-bit primitives, it clearly targets modern microarchitectures — the Zen 5 performance metrics, though not listed, are crazy. This is brand new, part of a research project to efficiently and robustly checksum I/O at current hardware line rates (100s of GB/s).

Any algorithm can be faster if you completely disregard quality. In fairness, the quality bar has risen quickly as we’ve become better at designing these algorithms and finding the defects. State-of-the-art algorithms a decade ago are all considered obviously and hopelessly broken today. Designing a competitive hash function now requires an extremely high degree of skill and expertise. It was much easier to come into it as a hobbyist and produce an acceptable result ten years ago.

[0] https://github.com/jandrewrogers/RotoHash

shpx

There aren't any hashes passing all tests that are faster than XXH3 for both short (1-32 byte) inputs and overall throughput from my reading of https://gitlab.com/fwojcik/smhasher3/-/tree/main/results#pas... . MeowHash, t1ha0.aesB, falkhash2 and falkhash1 have higher throughput and fail less tests, but they're slower on short inputs. wyhash is about the same on short inputs and fails less tests, but has half the throughput.

steeve

No, just because you’re hashing every ms doesn’t mean we all do.

neonsunset

If whatever you are doing is heavy on hashmap lookups (and you are ok with not rewriting into something more bespoke+complicated) - the faster hash function and the cheaper baseline cost of calling it - the better (i.e. XXH3 can have disadvantages, with its popular impl. for dispatching to the necessary routine).

This looks like an interesting potential alternative to GxHash. GxHash is nice but sadly I found AES intrinsics to have somewhat high latency on AMD's Zen 2/3 cores, making it a loss on short strings (but until I measured it on our server hw, M4 Max sure had me fooled, it has way better latency despite retiring more AES operations!).

xkcd1963

Question, what do you do if there is a collision? I saw the github table also mentioned collisions

lights0123

They're extremely common in hashtables. You follow standard open addressing or separate chaining procedures: https://en.m.wikipedia.org/wiki/Hash_collision

meindnoch

Open any data structures textbook and look for "hash map" in the table of contents.

saagarjha

Fall back on secondary hashes or do probing

rurban

It's not the fastest on smhasher anymore. umash is better and faster, and gxhash is twice as fast. It's is however very hashmap friendly because of its small size. It's a wyhash variant

nicoshev11

Hey Reini, the rapidhash version on the SMHasher repo is outdated. The latest version was published last Tuesday, I'll submit it to SMHasher shortly. This new version is much faster than the previous one, and still passes all tests. It should beat umash on your benchmarks.

Sesse__

Am I right in that RAPIDHASH_UNROLLED is now mandatory and the function is thus much larger now?

curiouscoding

I'd love to see some benchmarks/comparison on variable length strings. For strings with random length between 10 and 30, gxhash was significantly faster than xxhash for me. I would assume because it processes chunks of up to 32 chars at a time, avoiding branch misses.

Generally my feeling is that at these speeds, designing for branch-predictability for short strings might be more important than absolute throughput.

eqvinox

Why are the numbers in the "performance" section almost all for AArch64? Where are the AMD64 numbers?

RainyDayTmrw

Is this safe to use in public facing web servers? (Can adversaries trigger hash-DoS[1]?)

[1]: https://en.wikipedia.org/wiki/Collision_attack#Hash_flooding

zitoune

"The fastest hash function passing all tests in SMHasher."

GxHash is twice as fast and passes all SMHasher tests...

realo

At a few ns per hash, it means the hash is done before photons from your screen reach your eyes.

Impressive!

CyberDildonics

There is a big difference between throughput and latency. If you do 8 at once with SIMD it doesn't mean you can get a single one done in 1/8th the time.

null

[deleted]

mertleee

What's the actual application for this?

wpollock

Is rapidhash cache-friendly?

wmf

Hash functions don't have any data reuse so none of them are cache-friendly.

Retr0id

You can have cache-unfriendly hash functions, though.

For example, a common optimisation for CRC implementation involves lookup tables. If your lookup table exceeds L1 then you're going to have a bad time. If your lookup table fits exactly in L1, your benchmarks might look great, while real-world performance suffers (because your other code wants to be making use of L1, too).

I'd imagine rapidhash avoids lookup tables but the question of cache-friendliness is still pertinent.

benwills

I think they may be asking about the CPU cache.

Dylan16807

You have to go out of your way to make a hash that doesn't fit into L1, so again they're all basically the same.

You'll probably end up fitting entirely inside the reorder buffer plus a sequential stream from memory, with the actual caches almost irrelevant.

bandrami

I saw the first four words and then was disappointed by the direction it took...

null

[deleted]

coolcase

Or first 2 words

HN

New high-quality hash measures 71GB/s on M4

New high-quality hash measures 71GB/s on M4