Index 1.6B Keys with Automata and Rust (2015)
6 comments · August 10, 2025

stuhood
An interesting exercise would be to compare this with the (confusingly similarly named) `fsst` string compression strategy: https://github.com/cwida/fsst
VivaTechnics
Impressive! This approach could be applied to designing a NoSQL database. The flow would probably look something like this, right?
- The client queries for "alice123".
- The Query Engine checks the FST Index for an exact or prefix match.
- The FST Index returns a pointer to the location in Data Storage.
- Data Storage retrieves and returns the full document to the Query Engine.
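A minimal sketch of that flow using the article's `fst` crate. The key "alice123", the byte offsets, and the in-memory "Data Storage" step are assumptions purely for illustration:

```rust
use fst::automaton::{Automaton, Str};
use fst::{IntoStreamer, Map, MapBuilder, Streamer};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build the FST Index. Keys must be inserted in lexicographic order;
    // each value is a hypothetical byte offset into Data Storage.
    let mut builder = MapBuilder::memory();
    builder.insert("alice123", 0)?;
    builder.insert("bob456", 512)?;
    let index = Map::new(builder.into_inner()?)?;

    // Query Engine: exact-match lookup returns a pointer into Data Storage.
    if let Some(offset) = index.get("alice123") {
        // Data Storage would seek to `offset` and return the full document.
        println!("alice123 -> offset {}", offset);
    }

    // Or a prefix match: stream every key starting with "alice".
    let matcher = Str::new("alice").starts_with();
    let mut stream = index.search(matcher).into_stream();
    while let Some((key, offset)) = stream.next() {
        println!("{} -> offset {}", String::from_utf8_lossy(key), offset);
    }
    Ok(())
}
```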
yazaddaruvala
What you’ve described is the foundation of Lucene, and as such the foundation of Elasticsearch.
FSTs are “expensive” to re-optimize, so re-optimization is typically done “without writes”. The database would therefore need some workaround for that low write throughput.
To save you the time thinking about it: The only extra parts you’re missing are what Lucene calls segments and merge operations. Those decisions obviously have some tradeoffs (in Lucene’s case the tradeoff is CRUD).
There are easily another 100 ways to be creative with these tradeoffs depending on your specific needs. However, I wouldn’t be surprised if the vast majority of databases’ indexing implementations are roughly similar.
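Since an FST is immutable once built, the segment-and-merge workaround described above amounts to flushing writes as small immutable FSTs and periodically merging them. A hedged sketch of one merge step using the `fst` crate's union operation; the "newer segment wins" conflict policy and the function name are assumptions for illustration, not Lucene's actual code:

```rust
use fst::{Map, MapBuilder, Streamer};

// Merge two immutable FST segments into one. When a key appears in both,
// keep the value from `newer` (assumed policy: last write wins).
fn merge_segments(
    older: &Map<Vec<u8>>,
    newer: &Map<Vec<u8>>,
) -> Result<Map<Vec<u8>>, fst::Error> {
    let mut builder = MapBuilder::memory();
    // The union stream yields keys in sorted order, with one IndexedValue
    // per segment that contains the key; `newer` has the higher index here.
    let mut union = older.op().add(newer).union();
    while let Some((key, values)) = union.next() {
        let winner = values.iter().max_by_key(|v| v.index).unwrap().value;
        builder.insert(key, winner)?;
    }
    Map::new(builder.into_inner()?)
}
```

A real system also needs tombstones for deletes and a policy for when to trigger merges, which is where the CRUD tradeoff mentioned above comes in.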
sa-code
Lucene's WFST is an insanely good and underappreciated in-process key-value store, assuming you're okay with a 1-hour lag on your data.
Keyvi is also interesting in this regard.
yorwba
That wouldn't be a good idea in most cases, for the reasons laid out in the "Not a general-purpose data structure" section: https://burntsushi.net/transducers/#not-a-general-purpose-da...
A darn good write-up! Its clarity is refreshing. Very well done. Thanks for posting.