Shift-to-Middle Array: A Faster Alternative to Std::Deque?

ksherlock

A couple notes looking at the c++ implementation

- this is going to have problems with non-trivial types. (Think about destructors or move constructors like std::unique_ptr). If you don't want to deal with them, at least add a static_assert(std::is_trivially_copyable<T>::value == true);

- front() doesn't return a reference and it doesn't even return the front

- adding iterators (begin()/end()) will let it play nice with range-based for( : ) loops, <algorithm>, etc.
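
A rough sketch of what those three fixes could look like (member names like data/head/tail are assumed here, not taken from the repo):

    #include <cstddef>
    #include <type_traits>

    template <typename T>
    class ShiftToMiddleArraySketch {
        // The container shuffles raw storage around, so restrict it to
        // trivially copyable types until proper move/destroy handling exists.
        static_assert(std::is_trivially_copyable<T>::value,
                      "ShiftToMiddleArray currently only supports trivially copyable types");

        T* data = nullptr;
        std::size_t head = 0, tail = 0;   // live elements occupy [head, tail)

    public:
        // front() should return a reference to the actual first element
        T&       front()       { return data[head]; }
        const T& front() const { return data[head]; }

        // Storage is contiguous, so plain pointers work as iterators and make
        // range-based for loops and <algorithm> usable.
        T*       begin()       { return data + head; }
        T*       end()         { return data + tail; }
        const T* begin() const { return data + head; }
        const T* end()   const { return data + tail; }
    };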

usefulcat

Note that this implementation (I looked at the c++ code) will repeatedly double the amount of allocated memory, even if the usage of the queue is such that it never contains more than one item at a time. It's not much different from a memory leak.

    void insert_head(const T& value) {
        if (head == 0) resize(); // <= resize() will double the allocated memory
        data[--head] = value;
    }
It looks like there is a fix for this behavior in resize() (to avoid repeated reallocation when the queue size is small relative to capacity), but it is currently commented out.
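
For illustration, one shape that fix could take (a sketch only; data/head/tail/capacity are assumed member names, and memmove is fine only because the elements are trivially copyable): re-centre in place when there is plenty of slack, and only double when genuinely out of room.

    #include <cstring>   // std::memmove, std::memcpy

    void resize() {
        const int size = tail - head;
        if (size * 2 <= capacity) {
            // Plenty of slack already: slide the live elements back to the
            // middle of the existing buffer instead of allocating a bigger one.
            const int new_head = (capacity - size) / 2;
            std::memmove(data + new_head, data + head, size * sizeof(T));
            head = new_head;
            tail = new_head + size;
            return;
        }
        // Genuinely out of room: double the capacity and re-centre.
        const int new_capacity = capacity * 2;
        const int new_head = (new_capacity - size) / 2;
        T* new_data = new T[new_capacity];
        std::memcpy(new_data + new_head, data + head, size * sizeof(T));
        delete[] data;
        data = new_data;
        head = new_head;
        tail = new_head + size;
        capacity = new_capacity;
    }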

severino

A little off-topic, but is it usual in C++ to have a header (.h) and a source (.cpp) where the attributes and most of the methods are identical in both files, but with some more methods in the header file?

actionfromafar

Well the code should not be duplicated, only method signatures, but yes.

It’s very common.

Edit: after a while you don't even think about it (and of course, there are reasons for it) but sometimes I pause and think. It didn't have to be this way. Some C++ libraries are what's called "header only" which makes them very easy to integrate into your own code. Downside is that it may take longer to compile your code. (And here lies a clue to why things are that way. The header tells the compiler how your other code may interface with the code, without having to know exactly what goes on in the code.) There have been attempts¹ to do away with the split between header and code, which would make C++ a bit more like C# or Java for instance in that respect. In newer versions of C++ there is the "module"² concept which I don't know much about but which can achieve something similar.

1: https://sourceforge.net/projects/lazycplusplus/

2: https://en.cppreference.com/w/cpp/language/modules

bogwog

Modules 100% solve this problem, but broad compiler support is still (frustratingly) lacking.

Some day, a class like in the OP will be implementable in a single file. The compiler will compile that once, and users can 'import' it infinitely without worrying about the usual header inclusion pitfalls, and without incurring any compile time overhead. The amount of electricity saved from the reduced compilation work will save us from the current climate disaster, and we'll become a maximally productive society unburdened by slow C++ compilers.

severino

Thanks, what you explain in your comment is the idea I had, too, although I've little experience in C++. But I was confused after taking a look at this project's source and seeing all the duplicated code between ShiftToMiddleArray.h and ShiftToMiddleArray.cpp, and not only signatures. I wasn't sure if that was done for some purpose.

AttilaT

I recently developed a new data structure called the Shift-To-Middle Array, designed as an alternative to std::deque, std::vector, and linked lists. My goal was to optimize insertion and deletion at both ends, while also improving cache locality and performance compared to traditional implementations.

What is the Shift-To-Middle Array?

Unlike std::deque, which uses a fragmented block-based structure, the Shift-To-Middle Array maintains a contiguous memory layout. Instead of shifting elements inefficiently (like std::vector), it dynamically redistributes free space toward the middle, reducing unnecessary data movement.

Key Features:

- Fast insertions & deletions at both ends (amortized O(1))

- Efficient cache utilization (better than linked lists)

- Supports fast random access (O(1))

- No pointer chasing (unlike linked lists)

- Parallelization & SIMD optimizations possible
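
To make the layout concrete, here's a minimal sketch of the idea (simplified, not the exact code from the repository):

    #include <algorithm>

    // Simplified sketch: live elements occupy [head, tail) inside one contiguous
    // buffer; both ends grow toward the edges, and a resize re-centres them.
    template <typename T>
    struct ShiftToMiddleSketch {
        T* data;
        int head, tail, capacity;   // invariant: 0 <= head <= tail <= capacity

        explicit ShiftToMiddleSketch(int cap = 16)
            : data(new T[cap]), head(cap / 2), tail(cap / 2), capacity(cap) {}
        ~ShiftToMiddleSketch() { delete[] data; }

        void push_front(const T& v) { if (head == 0)        grow(); data[--head] = v; }
        void push_back (const T& v) { if (tail == capacity) grow(); data[tail++] = v; }
        void pop_front() { ++head; }
        void pop_back()  { --tail; }
        T&  operator[](int i) { return data[head + i]; }   // O(1) random access
        int size() const { return tail - head; }

        void grow() {   // double the allocation and put the free space back in the middle
            const int n = size(), new_cap = capacity * 2, new_head = (new_cap - n) / 2;
            T* nd = new T[new_cap];
            std::copy(data + head, data + tail, nd + new_head);
            delete[] data;
            data = nd; head = new_head; tail = new_head + n; capacity = new_cap;
        }
    };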

Performance Benchmarks

I benchmarked Shift-To-Middle Array vs. std::deque vs. ExpandingRingBuffer vs. std::queue across different workloads. Some highlights:

Push-heavy workload → Shift-To-Middle Array showed improved insertion performance over std::deque.

Pop-heavy workload → Showed improvements in memory access and removal operations.

Random insert/remove workloads → Demonstrated better cache efficiency compared to linked lists.

(Full benchmarks and source code available below.)

When Should You Use It?

High-performance queue-like structures

Game engines (handling real-time events efficiently)

Networking applications (handling packet buffers)

Dynamic sequences (e.g., computational geometry, physics sims)

Would love to hear thoughts and feedback from the community! Have you encountered similar performance bottlenecks with std::deque or other dynamic structures?

scottlamb

> I recently developed a new data structure called the Shift-To-Middle Array, designed as an alternative to std::deque, std::vector, and linked lists.

I notice you do not include ring buffers in this headline alternatives list. To me ring buffers seem the most natural comparison. I'd expect them to perform strictly better (no element movement, no reliance on amortization). But parsing APIs often can't handle the discontinuity where they wrap around, so I think this or something like it has value.

I do see you included this `ExpandingRingBuffer` in your benchmarks, and wrote the following:

> ExpandingRingBuffer performs well for small to medium container sizes but becomes less efficient for larger sizes, where Shift-To-Middle Array and std::deque maintain better performance.

Why do you think ExpandingRingBuffer's performance suffers? Is this about frequent expansion? Otherwise, as mentioned above, I'd expect a well-implemented ring buffer to be hard to beat.

pclmulqdq

The C++ std::deque usually uses blocks that are big enough not to worry about memory concerns. The linked list has bad performance because you have tiny blocks of memory, but blocks of 256 objects are usually big enough that they are contiguous for all practical purposes (paging, allocation, and caching). Libc++, the library associated with clang and the LLVM project, uses a block of 4096 objects in its deque.

This is an alternative that has been shown in the literature many times before, and it works well for certain access patterns, but is a major waste of resources for others. Yours in particular is great when you are pushing/popping both sides equally. The C++ standard deque is made for unknown and possibly unequal directions of push/pop (while still having ~O(1) random access), with a roughly 50/50 ratio of pushes to pops.

evjohnst

This is pretty cool, thanks for sharing!

It looks like your benchmarks only cover average operation time. Have you thought about benchmarking p99 latency as well? I would expect insertions that cause reallocations to be slow enough that it might be an issue for some use cases.

MattPalmer1086

At the very least, computing the standard deviation as well as the mean would be useful to get some idea of the variation.

The median, rather than the mean, would also generally be a better guide to performance in practice. Even better, give the mean, median and standard deviation.

dehrmann

> Unlike std::deque, which uses a fragmented block-based structure

I always assumed deque implementations were ring buffers that double in size once full so that prepend/append operations are amortized O(1).

tialaramex

No, that's what Rust's VecDeque is. The C++ std::deque is something you wouldn't invent today, but alas C++ is wedded to how things were done in the 1990s, sometimes earlier.

std::deque does have practical uses, but they're rare and many implementations aren't well suited even to those uses. Unlike VecDeque most people should just ignore it.

gpderetta

> the C++ std::deque is something which you wouldn't invent today,

Why is that?

My major issue with std::deque is that the standard implementation doesn't provide a) a way to define the block size, or b) a way to access the individual blocks for optimizations. The deque in boost.container provides the former. I don't know of any widely available implementation that provides the latter (segmented iterators were first proposed around the original C++ standardization, but it seems they were never picked up).

TeMPOraL

IDK, I thought the whole point of a deque was to string vectors into a linked list, so you get benefits of both, the most important ones being 1) cheap random access, and 2) insertion doesn't move stuff around in memory. I.e. that the deque's "vectors connected by pointers" is not an implementation detail, but the very nature of the beast.

Maybe I just took the boxes-and-arrows diagrams from C++ books too seriously.

widdershins

Yes, I agree, insertion not moving things is a very useful feature of deques. It allows you to keep items with deleted copy and move constructors in a container that has good cache locality.

amluto

deque does not move elements once added, so it can’t be implemented like that.

rowanG077

Is that an actual guarantee or just the current implementation? I'd expect that at the very least some functions that operate on std::deque must require moves. erase_if immediately comes to mind.

acmj

std::deque typically uses chunked arrays. It is more complex but tends to be faster than a ring buffer based implementation.

mythmon_

That's what I learned in my university data structures course. I don't know anything about C++'s std::deque though.

MattPalmer1086

Really interesting, I love new ideas for data structures.

One little note about benchmarking though. It's hard to get good benchmark data, particularly in Java due to the JIT compiler. At the very least, you should perform a large number of warmups (e.g. 10000 calls to the code) before actually benchmarking to ensure all code is fully compiled by the JIT. That's only one of the gotchas though. Even better, use a dedicated Java benchmarking system like jmh.

gpderetta

There is no JIT in a typical C++ implementation. The first run might be an outlier due to warming up multiple caches, but that can be mitigated with a large enough run. Runtime should be relatively stable for synthetic benchmarks.

MattPalmer1086

There is a Java implementation as well with benchmarks. But yes, C++ has no JIT, although doing some warm up is also advised for benchmarking in general.

threeducks

> I recently developed a new data structure

Congratulations, you have discovered the array deque!

https://en.wikipedia.org/wiki/Double-ended_queue#Implementat...

I'd like to point out a performance optimization. On resize, you create a new array with double the size and copy all the old elements to the middle of the new array with more space at the beginning AND at the end: https://github.com/attilatorda/Shift-To-Middle_Array/blob/05...

However, in practice, it is often the case that most operations append elements at either the front OR back, but rarely at both ends equally. For example, imagine a FIFO queue, where elements are popped from the front and pushed from the back. Therefore, it is more efficient to only reserve space at the back if the last operation was push_back or at the front if the last operation was push_front.

One could also carry along statistical information about the number of push_back and push_front operations and balance the space allocated at the front and back accordingly.
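
A rough sketch of that biased re-centring (names like push_back_count are assumed for illustration; this is not the repository's code):

    // push_front_count / push_back_count are assumed counters bumped by the
    // corresponding push operations. std::copy comes from <algorithm>.
    void grow_biased() {
        const int n = tail - head;
        const int new_capacity = capacity * 2;
        const int free_space = new_capacity - n;

        // Fraction of pushes that hit the back end (+1/+2 avoids division by zero).
        const double back_ratio =
            (push_back_count + 1.0) / (push_back_count + push_front_count + 2.0);

        // Give the back end its proportional share of the slack; the rest goes in front.
        const int new_head = static_cast<int>(free_space * (1.0 - back_ratio));

        T* new_data = new T[new_capacity];
        std::copy(data + head, data + tail, new_data + new_head);
        delete[] data;
        data = new_data;
        head = new_head;
        tail = new_head + n;
        capacity = new_capacity;
    }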

In addition to ksherlock's points, there are also the following issues:

- Vector implementations usually use size_t instead of int. Your implementation will fail for arrays larger than 2147483647, while size_t usually goes up to 18446744073709551615.

- On memory allocation failure, you should throw std::bad_alloc instead of calling std::exit.

- The front() function is missing a return statement. Turn on compiler warnings: -Wall -Wextra. For testing, -g -fsanitize=address,undefined is also helpful.

- You are mixing delete[] with malloc'ed memory in shrink_to_fit.

- You should implement the copy constructor, move constructor, copy assignment and move assignment functions (a sketch follows at the end of this comment). See rule of five: https://en.cppreference.com/w/cpp/language/rule_of_three#Rul...

- Switching features on or off is usually done with macros instead of comments.

- 2 is not the best growth factor because it makes it harder for the memory allocator to reuse memory. More modern implementations use smaller growth factors: https://en.wikipedia.org/wiki/Dynamic_array#Growth_factor

- A more reasonable initial capacity would be 0 instead of 16, as is common for all major std::vector implementations. An initial capacity of 16 wastes a lot of space when a large number of ShiftToMiddleArrays are allocated.

- The compiler will most likely ignore your inline instructions and decide on its own whether inlining is done or not, so you might as well remove them.

- Why are there two versions of the data structure? (ShiftToMiddleArray.cpp and ShiftToMiddleArray.h)

- Some functions are just wrappers for other functions (pop_front = remove_head, pop_back = remove_tail). Why?

LLMs can point out most of those issues, so you should use them to discover potential issues. Of course, make sure to double-check with more reliable sources once you know that a certain class of problems exists.
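
For the rule-of-five point above, a sketch of what the missing special members could look like (member names assumed; copy-and-swap makes one assignment operator cover both copy and move assignment):

    // Inside the class definition. std::copy is from <algorithm>, std::swap from <utility>.
    ShiftToMiddleArray(const ShiftToMiddleArray& other)
        : data(new T[other.capacity]),
          head(other.head), tail(other.tail), capacity(other.capacity) {
        std::copy(other.data + other.head, other.data + other.tail, data + head);
    }

    ShiftToMiddleArray(ShiftToMiddleArray&& other) noexcept
        : data(other.data), head(other.head), tail(other.tail), capacity(other.capacity) {
        other.data = nullptr;
        other.head = other.tail = other.capacity = 0;
    }

    // Pass by value: the argument is either copy- or move-constructed, so this
    // single operator serves as both copy assignment and move assignment.
    ShiftToMiddleArray& operator=(ShiftToMiddleArray other) noexcept {
        std::swap(data, other.data);
        std::swap(head, other.head);
        std::swap(tail, other.tail);
        std::swap(capacity, other.capacity);
        return *this;
    }

    ~ShiftToMiddleArray() { delete[] data; }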

fearthetelomere

>- 2 is not the best growth factor because it makes it harder for the memory allocator to reuse memory. More modern implementations use smaller growth factors: https://en.wikipedia.org/wiki/Dynamic_array#Growth_factor

I've often wondered about this, so I'm curious to learn more. I agree in principle we should be more clever to ensure we have better memory use. However, half the implementations in the list you've linked use a growth factor of 2, so I'm confused about your point.

If it's not the best, what is? Do you know why these implementations opt for 2 if it's not the best choice?

SkiFire13

The better-memory-usage argument is not as one-sided as it may appear. The issues related to using a growth factor of 2 appear when you have a single very big array, but for many smaller ones it's not really an issue. Meanwhile, allocators generally tend to like power-of-2 sizes, and non-2 growth factors produce non-power-of-2 sizes. Some data structures are also easier to implement when the size is a power of 2, because you can efficiently wrap around using bitwise operations rather than an expensive remainder.

threeducks

> Do you know why these implementations opt for 2 if it's not the best choice?

Probably a mix of simplicity, not knowing or not caring. Most software is not optimal.

> If it's not the best, what is?

In theory, the best value is a bit less than the golden ratio, so 1.5 is quite good.

https://archive.li/Z2R8w#selection-119.7-135.119

In practice, unknown factors can influence the result, so it is best to benchmark your code and try a bunch of values until you find the fastest configuration.

qwe----3

- The compiler will most-likely ignore your inline instructions and decide on its own whether inlining is done or not, so might as well remove them.

There are compiler-specific attributes that can really force this. Of course, it's worth doing benchmarks and looking at the generated assembly to see if this is necessary.

pklausler

This is like the opposite of a gap buffer.

yencabulator

A degap buffer.

queue : deque :: gap : degap

(Use a ringbuffer, instead.)

ntonozzi

Sounds very cool! How do you implement efficient random deletes?

oasisaimlessly

std::vector doesn't support efficient random deletes, so this (a generalization of std::vector along one specific axis) won't either.

Efficient random deletes and contiguous hole-free storage are completely at odds with each other.

eru

The current implementation doesn't.

You could add that functionality via 'tombstones': when you delete an element, you replace it with a 'tombstone' marker in the structure.

Whenever you clean up the structure (eg for a resize), you skip the tombstones when you copy the old contents over.
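
A sketch of what that could look like, using std::optional as the tombstone marker (purely illustrative, not from the project):

    #include <optional>
    #include <utility>
    #include <vector>

    // Deleted slots become empty optionals ("tombstones"); a compaction pass,
    // e.g. during a resize, copies only the live elements over.
    template <typename T>
    void compact(std::vector<std::optional<T>>& buf) {
        std::vector<std::optional<T>> out;
        out.reserve(buf.size());
        for (auto& slot : buf)
            if (slot.has_value())
                out.push_back(std::move(slot));   // skip tombstones
        buf = std::move(out);
    }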

dzaima

That'd break arbitrary reads being contiguous O(1) though.

bobmcnamara

There's a similar data structure sometimes seen in video games that does this, but the name escapes me.

theamk

I don't see why I would use it over `std::deque`. It has all the same complexity properties, but it is better tested, included in the stdlib, and supports complex objects. It even has OK cache locality, given that it allocates data in large-ish blocks.

You should really include this in your summary table, because it'd have all the same values compared to the shift-to-middle array.

And your benchmarks confirm this: figure 3 does not show the raw data, but it looks like std::queue may be 8-10% slower on smaller data sizes, and 1-2% slower on larger data sizes. Such small and inconsistent differences do not indicate different O() complexity, and are likely very dependent on the specific benchmark design.

Related: std::deque implementation details for various compilers: https://devblogs.microsoft.com/oldnewthing/20230810-00/?p=10...

jzwinck

You say deque uses large-ish blocks, but you provide documentation that it uses 512-byte blocks on GCC, and MSVC is even worse. So if you're on Windows the blocks are so small the container degenerates into something like a std::list, and on non-Windows it only works well if your objects are a few bytes each.

theamk

GCC's 512-byte block is actually not that bad.

For cache locality, you want a block size that is bigger than the cache line size - and those are in the 64 to 128 byte range. And as for overhead, it seems like 1 pointer per block, or 1.5% for a deque of pointers/integers - not a very big value. So yeah, while linked lists are bad, GCC's deque is pretty OK.

For MSVC, I agree, their std::deque is pretty bad. But the title of the post isn't "a faster deque for MSVC"; it makes no distinction between OSes.

pizlonator

JavaScriptCore uses this technique for JavaScript arrays that are used as deques, like if you unshift/shift.

Here's the core data structure: https://github.com/WebKit/WebKit/blob/main/Source/JavaScript...

The logic that makes it work is in JSArray.cpp and other files in that directory.

Shift-to-middle is surprisingly performant and also surprisingly hard to get right.

rwbt

AFAIK, Apple's CoreFoundation CFArray also works similarly[0]. NSMutableArray works a little differently (using a circular buffer). From the always excellent Ciechanowski[1].

[0] - https://github.com/opensource-apple/CF/blob/master/CFArray.c [1] - https://ciechanow.ski/exposing-nsmutablearray/

pcwalton

Interesting alternative idea I thought of just now: a data structure that works like VecDeque (a circular buffer) but uses mmap to map two views onto the same pages right after one another. That would ensure that the entire array can be accessed in a consecutive fashion, no matter where it gets split, without any copying. The downside is that reallocation would be really slow, involving multiple syscalls, and the minimum size of the array would be 4kB or more, depending on the page size.
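
Roughly, on Linux it could look like this (memfd_create and MAP_FIXED are real APIs, but this recipe is only a sketch, with error handling omitted):

    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstddef>

    // Maps the same physical pages twice, back to back, so buf[i] and
    // buf[i + size] always alias; size must be a multiple of the page size.
    void* make_mirrored_buffer(std::size_t size) {
        int fd = memfd_create("ring", 0);
        ftruncate(fd, size);

        // Reserve a contiguous 2*size region, then overlay both halves with the fd.
        char* base = static_cast<char*>(
            mmap(nullptr, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
        mmap(base,        size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
        mmap(base + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);

        close(fd);
        return base;
    }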

duped

I've seen this trick used around. Where it really shines is when you want to prepare/commit a range of the ring buffer while interfacing with something that wants a contiguous chunk as an arg; using the mmap hack lets you pass any pointer into the ring buffer without needing to split it to handle the wraparound case.

There are a few blog posts out there about it, eg https://lo.calho.st/posts/black-magic-buffer/. One data structure that works around the limitations is the bip buffer: https://www.codeproject.com/Articles/3479/The-Bip-Buffer-The.... In that article the author talks about the mmap trick.

scottlamb

I don't think the bip buffer solves a real problem. Let's say I'm using it as a read buffer.

> The upshot of all of this is that on average, the buffer always has the maximal amount of free space available to be used, while not requiring any data copying or reallocation to free up space at the end of the buffer. ... Another possibility which was brought up in the bulletin board (and the person who brought it up shall remain nameless, if just because they... erm... are nameless) was that of just splitting the calls across wraps. Well, this is one way of working around the wrapping problem, but it has the unfortunate side-effect that as your buffer fills, the amount of free space which you pass out to any calls always decreases to 1 byte at the minimum - even if you've got another 128kb of free space at the beginning of your buffer, at the end of it, you're still going to have to deal with ever shrinking block sizes.

So it maximizes the contiguous free bytes. I feel like the author just never knew about readv? Passing a couple iovecs completely solves this problem in a much better way.

What seems far more valuable is for the used space to be contiguous, as parsing APIs often expect this. Bip buffers don't offer that, right?

Now let's say I'm using it as a write buffer. I've never had the problem of needing it to be contiguous on either side. On the input side, I could imagine some application API that really wants to write into a contiguous buffer, but it hasn't been my experience. On the output side, there's writev.
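
For example (a sketch; the ring-buffer fields here are hypothetical), one readv call can fill free space that wraps around the end of the buffer:

    #include <sys/uio.h>
    #include <algorithm>
    #include <cstddef>

    // Fill the free space of a plain ring buffer with a single readv(), even
    // when that free space wraps around the end of the allocation.
    ssize_t fill_ring(int fd, char* buf, std::size_t cap,
                      std::size_t tail, std::size_t free_len) {
        const std::size_t first = std::min(free_len, cap - tail);  // space up to the end
        iovec iov[2] = {
            { buf + tail, first },             // contiguous part at the tail
            { buf,        free_len - first }   // wrapped part at the start
        };
        return readv(fd, iov, (free_len > first) ? 2 : 1);
    }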

pcwalton

Oh, I see, it's actually on Wikipedia [1]. I figured I wasn't the first one to invent the idea :)

[1]: https://en.wikipedia.org/wiki/Circular_buffer#Optimization

o11c

For C++, that's only valid for some subset of types, which currently can't be expressed with type traits. "Address-free" has a close enough definition in the context of atomics.

Trivially moveable types are probably sufficient (at least, I can't construct a case where being trivially copyable is needed), but not necessary; there are many things a special member function can do without caring about the address.

In practice, the main problem is that you can't use private mappings (which are the default and for good reason); you have to use shared mapping, which are very finicky to set up and cause infelicities with `fork`. [This does make me wonder how reflinks/`copy_file_range` interact with `mmap` and the page cache.]

Really, you should just fix all your APIs to take an `iovec` array.

gpderetta

On Linux you can use remap_file_pages[1] to set up your magic ring buffer. Yes, these days it is emulated in the kernel, but here we only care about setting up the expected duplicated mapping of anon pages instead of minimizing the allocations of VMAs.

Also Linux has MADV_DONTFORK, would that allay your concerns? (There is a potential race between mmap and the madvise call, but if you are forking in a multithreaded program you have worse concerns.)

[1] https://man7.org/linux/man-pages/man2/remap_file_pages.2.htm...

HeliumHydride

C++26 is getting a trait called "trivial relocatability", which allows you to communicate that a type can be memcpy'd around without fear.

eru

> In practice, the main problem is that you can't use private mappings (which are the default and for good reason); you have to use shared mapping, which are very finicky to set up and cause infelicities with `fork`. [This does make me wonder how reflinks/`copy_file_range` interact with `mmap` and the page cache.]

Yet another reason fork was never a good design choice.

fc417fc802

Is fork really the problem in this scenario?

yxhuvud

So essentially pushing the work to the TLB? Well, if there are people building moving GCs that manage to retain stable pointers that way, why not use it for a circular buffer as well.

> The downside is that reallocation would be really slow, involving multiple syscalls,

Scaling ring buffers up and down in size is not very performant anyhow, as a bunch of the elements in it tend to need to be copied.

gpderetta

This is normally called a magic ring buffer, and it is a relatively well known pattern. Very useful for queues when you need to support variable size elements and need to interoperate with code that can't handle split objects.

leiroigh

The main problem with that is that it doesn't play nice with most languages. Consider

  int foo(int* ptr) {
    int x = ptr[1<<16];
    *ptr += 1;
    return x + ptr[1<<16];
  }

Compilers/languages/specs tend to decide that `ptr` and `ptr + (1<<16)` cannot alias, and this can be compiled into e.g.

  foo(int*):
        mov     eax, dword ptr [rdi + 262144]
        inc     dword ptr [rdi]
        add     eax, eax
        ret
which gives undesired results if `ptr` and `ptr + (1<<16)` happen to be mapped to the same physical address. This is also pretty shit to debug/test -- some day, somebody will enable LTO for an easy performance win on release builds, and bad code with a security vuln gets shipped.

scottlamb

I don't think that's a fundamental problem. In say Rust (with its famously strict aliasing requirements), you obviously need some level of unsafe. You certainly want to ensure you don't hand out `&mut [T]` references that alias each other or any `&[T]` references according to either virtual or physical addresses, but that seems totally possible. I would represent the ring buffer with a raw pointer and length. Then for callers I'd construct `&[T]` and `&mut [T]` regions as needed that are never more than the full (unmirrored) length and thus never include the same byte twice. There are several existing Rust crates for the mirrored buffer that (though I haven't looked into their implementations recently to verify) presumably do this: slice-deque, vmcircbuf, magic-ring-buffer, vmap.

I do think though there are some downsides to this approach that may or may not be deal-breakers:

* Platform dependence. Each of the crates I mention has a fair bit of platform-specific `unsafe` code that only supports userspace on a few fixed OSs. They fundamentally can't work on microcontrollers with no MMU; I don't think WASM has this kind of flexibility either.

* Either setting up each buffer is a bit expensive (several system calls + faulting each page) or you have to do some free-listing on your own to mitigate. You can't just rely on the standard memory allocator to do it for you. Coincidentally just like last week I was saying freelisting is super easy for video frames where you have a nice bound on number of things in the list and a fixed size, but if you're freelisting these at the library level or something you might need to be more general.

* Buffer size constraints. Needs to be a multiple of the page size; some applications might want smaller buffers.

* Relatedly, extra TLB pressure, which is significant in many applications' performance. Not just because you have the same region mapped twice. Also that the buffer size constraints mentioned above make it likely you won't use huge pages, so on e.g. x86-64 you might use 4 KiB pages rather than 2 MiB (additional factor of 512x) or 1 GiB (additional factor of 262144x) as the memory allocator would help you do if they could be stuffed into the same huge page as other allocations.

dzaima

Rust doesn't help here; you necessarily must do all stores in potentially-mirrored memory as volatile (and possibly loads too), else you can have arbitrary spooky-action-at-a-distance issues, as, regardless of &[T] vs &mut [T] or whatever language-level aliasing features, if the compiler can see that two addresses are different (which they "definitely" are if the compiler, for one reason or another, knows that they're exactly 4096 bytes apart) it can arbitrarily reorder them, messing your ring buffer up. (and yes it can do so moving ops out of language-level lifetimes as the compiler knows that that should be safe)

vmcircbuf just exposes the mutable mirrored reference, resulting in [1] in release builds. Obvious issue, but, as my example never uses multiple references with overlapping lifetimes of any form, the issue would not be fixed by any form of more proper reference exposing; it's just simply the general issue of referencing to the same data in multiple ways.

vmap afaict only exposes push-back and pop-front for mutation, so unfortunately I think the distance to cross to achieve spooky action in practice is too far (need to do a whole lap around the buffer to write to the same byte twice; and critical methods aren't inlined so nothing to get the optimizer to mess with), but it still should technically be UB.

slice_deque has many open issues about unsoundness. magic-ring-buffer doesn't build on modern rust.

[1]: https://dzaima.github.io/paste/#0TVDBTsQgFLz3K56XbptsWlo1MWz...

dzaima

Volatile stores would fix that issue. But it does mean that it'd be unsafe to lend out mutable references to objects. (maybe you'd need to do volatile loads too, depending on model of volatility)

dzaima

The ExpandingRingBuffer.h is rather bad for a representation of a ring buffer - it uses modulo for the index calculation, which is pretty damn bad and should really at the very least be masking by a power-of-two.

(it might very-slightly-but-not-really be excusable if the reason was memory utilization.. ..but the resize does "size_t new_capacity = capacity * 2;" so it does doubling anyway. Also, see reply noting that you don't even need power-of-two sizes for fast wrapping, which I managed to completely forget about)
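
Concretely (a sketch, assuming a power-of-two capacity):

    #include <cstddef>

    // With capacity == 1 << k, wrap-around is a single AND instead of a divide.
    std::size_t physical_index(std::size_t head, std::size_t logical,
                               std::size_t capacity_pow2) {
        return (head + logical) & (capacity_pow2 - 1);
    }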

kragen

Most ring buffers that aren't powers of 2 in size can still get by with i == max ? 0 : i+1 and i ? i-1 : max. On most hardware these will be almost as fast as just i+1 and i-1, while division will be much slower, even on recent hardware.

dzaima

..yep, true, completely slipped my mind. So modulo is just trivially always the bad choice.

Even for arbitrary indexing "tmp=head+index; buffer[(tmp >= capacity) ? tmp - capacity : tmp]" or so is gonna be better. (assuming the compiler compiles it to something branchless. Or, probably even if it doesn't - division might end up slower than the worst-case of 50% misprediction! And for sequential indexing it's even gonna be predictable.)

TylerGlaiel

This implementation grows indefinitely if you repeatedly push to the head and remove from the tail, even if the max number of elements in the array is small.

IshKebab

Does it definitely do that? You could easily avoid it by making the "resize" really a move if you don't actually need more space.

I feel like they're over-selling it anyway by comparing to `std::deque` (which is not hard to beat). The only advantage this has over a standard ring buffer (like Rust's VecDeque) is that the data is completely contiguous, but you'll pay a small performance cost for that (regular memmove's when used as a queue), and I'm not sure how useful it is anyway.

bogdan-lab

The main benefit a list gives you compared to a vector is stable memory. This is often an important property. But lists are slow at element access. This problem is solved by deque.

Therefore, I argue that an alternative to deque has to have the stable-memory property. Otherwise, you can just use a vector.

This implementation is trying to do so, btw, but for some reason it operates on raw memory under the hood instead of just holding a vector and rotating it here and there. Such an approach is unnecessarily complicated and error-prone.

jtrueb

Since I often use a sliding window, I made a Rust implementation and was surprised by how performant the VecDeque implementation was.

The MidVec was only faster than VecDeque when doing batch inserts and removals with my implementation.

https://gist.github.com/trueb2/9c0a23aa012f56d4c3d50afe8acf6...

kragen

Interesting! This is the kind of thing I like.

I'm having a hard time understanding the description. If I understand right, it's kind of like an inside-out gap buffer, or a hybrid of a gap buffer and a ring buffer? Is the free space in the array always contiguous? If not, is the non-free space? How is it different from ExpandingRingBuffer?

Arnavion

It starts by adding the first element to the middle of the allocation and then the head and tail grow outwards as more elements are added at either end.

Once the head reaches the front of the allocation or the tail reaches the rear of the allocation, it triggers a resize. The resize creates a new allocation with double the size and copies the original elements to the middle of this new allocation.

kragen

That can't be correct, because then just adding elements at the front and removing them at the rear while maintaining a constant queue size such as 5 elements would trigger an infinite number of resizes. Maybe it only resizes under some circumstances, otherwise copying the live elements back to the middle?

It still seems like that involves copying that a straightforward ring buffer avoids.

akoboldfrying

I think the "resizes" occurring in this scenario would be to same-size buffers, meaning that the very same buffer could actually be reused. Provided that's in fact what is happening, then yes, this scenario would cause infinite resizes, all of which would be avoided by a ring buffer. But those resizes would still happen rarely enough to meet the amortised complexity claim, namely, 1 resize (copying of n elements) per n elements inserted.

(I'm not at all certain that this is how it actually does work -- the README is light on details. But this is how it might work.)

Arnavion

I read the code before I made my comment. It does exactly what I described.