Elastic Binary Trees (2011)
11 comments
·February 5, 2025numba888
mmiao
But we need extra computation to move forward/backward, which is not suitable for the problem
null
numba888
one xor, yes, but on most systems smaller memory footprint is better. there are many more nodes than iterators. so savings can be significant.
another possible optimization is when 'key' fits in pointer or two. which is usually the case.
wonder if r1 or 03 can do it, with the right instructions of course.
numba888
> wonder if r1 or 03 can do it, with the right instructions of course.
for o3-mini-high it's too difficult to write even simple ebt. not many implementations on the web to learn from(?) reasoning isn't strong enough for this. besides what it produced is very inefficient.
o11c
For the problem this was designed for, what this is missing is an argument for why this is better than a heap with a last-resort ordering trivially added. (If you're not using this for a scheduler of course a heap doesn't make sense.)
Note that C padding can be unfortunate here; for some types (but not when `sizeof(key) == sizeof(pointer)`) it may make sense to pack `bit` (in the generic struct; this does not need a whole `int` even) and `key` (in the type-specific struct) into the same word. I think `bit` needs `bitsizeof(key) + bitsizeof(max_dups)` values, so you only need 7 bits (log2(64+64)) at most (hmm, that's almost small enough to try cramming into the low bits of the pointers again ...).
I forget if pointer-bit stuffing is actually defined in C, or if you should be using `uintptr_t`. Debugging is probably saner if you do regardless.
It appears that this is intended to be used intrusively (in which case the value also needs to think about padding), but this completely fails to use the appropriate vocabulary. In fact, I think that might be the only difference between this and a typical bit-oriented radix trie? If so, Wikipedia is completely correct to delete this, since in general any node-based container can be made intrusive quite trivially.
Using absolute indexes (against a known origin) is probably saner than relative-pointers if you have something simple enough, since it lacks the problem of moving/copying.
Despite those considerations, this is definitely worth reading if you're interested in data structures (unlike some articles that get posted occasionally, which can be blogspam or outright incorrect).
rdtsc
Everyone knows probably and the post alludes to it, but the author is the creator and maintainer of HAProxy.
The implementation is there https://github.com/haproxy/haproxy/blob/master/include/impor...
thomasmg
> EB trees heavily rely on bit-addressable data (typically integers)
That seems like a disavantage, but actually most data is bit-addressable in one way or the other; even strings, if you look at the UTF-8 representation.
What I'm a bit worried is the number of levels: it seems to be 1 level per bit in the key. And "a tree node is typically 20-bytes long on a 32-bit system". So that is 20 bytes per bit in the key. I wonder if the memory usage could be reduced in a simple way. For example using larger "pages" that contain arrays.
dang
Related. Others?
Ebtrees: Elastic Binary Trees (2011) - https://news.ycombinator.com/item?id=20007752 - May 2019 (8 comments)
Looks like author never heard about single pointer double linked list. That could save him 2 pointers per each node.
The idea: for simplicity taking list. single pointer saved in the node is xor of next and previous nodes. When moving forward we xor current ptr with prev. Moving backward xor with next to get prev. That it. How do we get prev/next for xor? The only way to get in the middle of the list is using iterator. Which can hold one extra pointer. For the first/last nodes extra pointer is NULL.
The same for EB tree, as navigation can be done only using iterator it can hold extra pointer.
These are my $0.02, you are free to implement and use this algorithm. No fees, no references required.