Writing a simple pool allocator in C
148 comments
· January 5, 2025 · jll29
cb321
Kernighan & Ritchie's "The C Programming Language" also has an example implementation of malloc() & free() { for any-size requests, as @kragen notes in a sibling } in like 3 pages (though, sure, it is far from the most efficient implementation). Since it's the last section of the main text, I'd guess that most readers don't get that far (even though it's a short book).
IgorPartola
Even if you don’t program in C, I think K&R should be required reading for anyone who writes software for a living. You can finish it in an afternoon, or two days if you actually write and run the code outlined in the book.
My bucket list includes an item to write a novel efficient implementation of malloc/free. I have taken several stabs at this in the past but have never been happy with it. I have no real need of a custom allocator but it is something that I think would be very cool to have under my belt. This was inspired by K&R.
kragen
I think you mean "literate programming".
Hanson's chapter 6 describes an arena allocator, not a "pool allocator" in the sense that 8dcc means, which is to say a constant-time fixed-size-block allocator. It seems that you had the same reading comprehension problem that I did.
The difference is that Hanson's arena allocator can make allocations of any size (his Arena_alloc takes an nbytes argument) and cannot deallocate individual allocations, only the entire arena (his Arena_free takes only an arena pointer as a parameter). A fixed-size-block allocator like the one we are discussing here can only make allocations of a fixed size, but can recycle them individually, not only all at once.
(If Hanson's name sounds familiar, it might be because you've read the lcc book.)
It may be worth noting that Donald Knuth also invented the font used in 8dcc's article.
actionfromafar
Honestly, "literary programming" better explains how Knuth writes code.
kragen
Yes, but that name lacks the benefit of implicitly describing other approaches to programming as "illiterate".
8dcc
Thank you so much! I will definitely check it out. I also planned on writing an article about arena allocators soon.
johnisgood
My minor nitpick regarding the code is the choice of variable names, but other than that, it is indeed easy to read and it is how I implement data structures and whatnot.
I like that there is a more or less exhaustive blog post about it which happens to look nice as well.
kazinator
If you write your own allocator in C, do yourself a favor and use the valgrind API inside it. Its use can be conditional so you can "compile it out". Or should I say it has to be, so you can build the code in an environment where there's no valgrind API.
tavianator
I've replaced Valgrind with sanitizers for most of my workflow. Using the sanitizer APIs in your custom allocator is also easy: https://github.com/tavianator/bfs/blob/main/src/sanity.h https://github.com/tavianator/bfs/blob/31dffd6441ba80ea998ce...
gopalv
+1, this isn't even that hard: it's a single header include plus annotating allocations/frees, with some redzones around the actual allocation to know when the user code is writing into the pool's pointer areas.
https://developers.redhat.com/articles/2022/03/23/use-valgri...
swah
That's with Claude, right?
gopalv
> with Claude
Is the comment itself AI-generated, or the code reference?
Here's some of my old code
https://github.com/php/pecl-caching-apc/blob/0dd2a69bab58822...
dsp_person
Oof, love how scrolling on this page hijacks the back button.
lsllc
The Address Sanitizer (ASAN) in both clang and gcc is excellent:
https://github.com/google/sanitizers/wiki/addresssanitizer
I believe to integrate a custom allocator with ASAN, you need to look at the memory "poisoning" mechanisms:
https://github.com/google/sanitizers/wiki/AddressSanitizerMa...
pjmlp
It is also available in MSVC nowadays.
https://learn.microsoft.com/en-us/cpp/sanitizers/asan?view=m...
The .NET team also makes use of it for the runtime parts:
https://devblogs.microsoft.com/cppblog/msvc-address-sanitize...
8dcc
I have never heard about this (using valgrind with a "non-standard allocator"), but it looks really interesting, especially for other projects I am working on. I will definitely look into it.
kazinator
I integrated the TXR Lisp garbage collector with valgrind. That came in handy on a number of occasions.
It's not just that you sometimes want special behavior from the garbage collector to assist with debugging; there is also this: if you want to use Valgrind to debug anything that does not have to do with the garbage collector, you want all the false positives it generates out of your way.
Some of the scanning that a conservative garbage collector has to do looks like access errors to Valgrind. By using the Valgrind API, you can grant yourself access to that memory, and then lock it again when done.
That aspect doesn't necessarily apply to a pool allocator being described in this article; that should not be triggering any false positives.
8dcc
(In case you missed my other post, I ended up adding valgrind support in the article and in the 'libpool' project.)
> I integrated the TXR Lisp garbage collector with valgrind.
That's funny, because when I said "for other projects I am working on", I specifically meant a Lisp interpreter which uses a memory pool and garbage collection (I am still working on implementing this last part, anyway). I am very happy about the fact that I can continue using valgrind in this project.
mananaysiempre
You can also integrate with AddressSanitizer to some extent; look into [1] ASAN_{UN,}POISON_MEMORY_REGION from <sanitizer/asan_interface.h> or the underlying intrinsics __asan_{un,}poison_memory_region.
[1] https://github.com/google/sanitizers/wiki/AddressSanitizerMa...
kragen
Thanks for the tip!
cb321
One aspect people don't tend to mention about doing your own allocation arenas (and TFA does not, because it has "general purpose" as a scope) is that because a "pointer" is just the name (really a number) of an allocated region, this also affords flexibility in the width of those numbers, aka the "pointer size" that decides your "address" space.
So, for example, if you are willing to limit yourself to 64 KiNodes of equal sized objects, a pool like https://github.com/c-blake/bst/blob/main/POOL.c can use an unsigned short 2-byte pointer. This small address space specialization can give a full 4x space overhead optimization over modern 64-bit pointers.
You could even use 8x less with 1-byte pointers if 256 nodes is enough or use bit fields or equivalents for even more fine-grained address space size selection. Admittedly, linked structures are usually less interesting for very small address space sizes, but this is sometimes useful. People may complain about arbitrary limits that relate to this kind of optimization, but the linked part could be "small by design" without compromising overall scale { such as for structure within a B-tree node or other multi-level indexing setups }.
In any event, I think it is useful to highlight pointer size arbitrariness to junior engineers. It is often learned once & immediately forgotten behind an onslaught of general purpose-ness (& pointer type systems/syntax). Relocation ideas also relate. E.g., if all the pointers are implicitly indices into a (small?) array, and all the code needs a "base address" of that array to get a VMem address, then you can relocate the whole bundle of objects pretty easily changing only the base address.
kps
‘Handles are the better pointers’¹ is a related writeup that has been discussed here before².
¹ https://floooh.github.io/2018/06/17/handles-vs-pointers.html
IgorPartola
I wish I had caught the original discussions. The moral of the story is that raw pointers carry too little metadata and offer too few places to do bounds checking and type checking to be safe to handle (pun intended). Managing things as "objects" is more robust.
Do notice that it means runtime bounds checking. It also mentions the problem of having a handle reference an object that's no longer there or that has changed, and that the probability of this happening rises with time. This pattern works great in single-thread/single-process code but starts decaying hard when you have preemptive concurrency. There you really cannot safely do much of anything without some sort of locking or copy-on-write semantics (which is also the case with a lot of other systems; I'm just pointing out that this isn't solved magically by handles, though the additional metadata can help detect the use-after-free and handle-now-refers-to-a-different-object problems).
So the age-old objection that bounds checking slows you down but is necessary to detect problems at runtime applies here. You won't get branch-free code this way compared to straight pointer references, but on the other hand you get a lot closer to the concepts of owners and borrowers in straight C, which is cool.
cb321
Those are good insights and I've probably written way too many words here already, but I wanted to amplify that your "age old" applies to the whole related "BUNDLE OF PRIMORDIAL QUESTIONS:", i.e. "What is a pointer (or name), anyway? and what do we want from it? and how do we want it? at compile/run time?" It's honestly a sub-part of Martin Fowler's famous Two Hard Things in CS joke, although I think that is more commonly (and shallowly) interpreted to be about simple identifier bike-shedding these days. There is probably some good blog post "What's in a Reference?" already written somewhere.
Different answers have different trade-offs in different settings and are variously (un)popular (e.g., x86 memory segments, LBA disk addressing, static/dynamic name scoping in PLs, "virtual" memory itself, ...). Besides the semantic/run-time/safety aspects of "referencing" you mention there are the syntactic ones I mentioned over in https://news.ycombinator.com/item?id=42644009 . Because it is so hard to get one-size-fits-all answers, I think a lot of user-defined layering makes sense, but because of all the hazards you mention, systems support for safety & perf is often really ambitious which makes the user-definition harder (for some/all programmers in the collaboration).
As another attempted witticism in this direction: "Some Assembly Required" is the general context for almost all engineering, but "how much 'some' is offputting" and 'Wait - assembly by WHO?" are big open questions that crystallize around various axes in various human communities. Lately, my one-liner is "Delegation affords so much, but trust sure is tricky!".
8dcc
Yes, this is true. I didn't use that in the code because I thought it would make it less readable, though. I might mention it somewhere, thank you.
cb321
In many prog.lang's, it absolutely makes things less readable. This is not an accident, but because the PL itself specifically tries to syntactically support pointer types (i.e. to literally aid readability). Most PLs do so only for general purpose VMem pointers with user-defined pointers being afterthoughts (at best). You probably need (at least!) operator overloading to get the same level of readability as "PL-native" pointers { though "readability" is a bit subjective and powerful PL features can be divisive; Scare "quotes" for interpretation hazard :-) }.
Anyway, you probably knew all that & I don't mean to suggest otherwise, but it made sense to write since HN loves PL talk.
8dcc
Hello, I am the author. Thank you all for the instructive comments, I made some changes to the article since I first posted it:
- Added a link to this HN post.
- Renamed some variables and types in the code for readability.
- Mention (in a footnote) that we could allocate the 'Pool' structure and the chunk arrays with a single call, or even return the 'Pool' on the stack. Suggested by 'liontwist' and 'unwind'.
- Mention (in a footnote) that we don't always need to store the full address when building the linked list of free chunks. Suggested by 'cb321'.
- Added valgrind support (also to my 'libpool' project). Suggested by 'kazinator'.
cb321
FWIW, while it is true that a smaller pointer lets you have smaller minimum fixed size objects in your pool, as your new footnote suggests, the space concern I was mostly alluding to in [1] is all the references in your client code { such as the BST code I linked to where each binary search tree (BST) node has 2 pointers (or 3 if they have parent pointers to give BST nodes "doubly linked list-like autonomy") }.
This can really add up. For example, like 20 years ago, I did a `print sizeof` in gdb for some g++ STL map node with a data payload of like 4 bytes, and it was some obscene size, roughly an order of magnitude bigger than necessary (like 80 bytes instead of, say, 8 with 2-byte pointers). While main memory is cheap, CPU cache memory is not, and for "linky" structures it often pays to be more cache-friendly. It's pretty easy to imagine a 10X smaller structure fitting into an L2 CPU cache where the larger one would not even fit in L3, netting you a big latency boost (though the more root-ward levels of the tree likely always stay in caches in a hot loop).
EDIT: Anyway, you don't have to believe me. Others did an x32 ABI for x86_64 mostly to have "merely 2X" smaller pointers: https://en.wikipedia.org/wiki/X32_ABI
tandr
As a last code snippet readability suggestion - maybe extract "malloc" and "free" calls to your own simple wrappers (like my_malloc and my_free), and do "valgrinding" inside of them? You can also pass additional params to these functions to specify type of allocation if needed, or make separate functions.
8dcc
I am not sure what you mean, exactly. There are some other valgrind macros that need to be called for pointers other than the ones returned by 'malloc'. More importantly, the memory regions need to be marked as 'NOACCESS' after we are done using them from the pool allocator, or valgrind will complain; that's why I set them back to 'DEFINED' before writing to them in other functions.
This is my first time using valgrind, however, so I might just be confused about what you mean, and there is a simpler way of doing all this.
veltas
Also the font isn't incredibly easy to read on Windows. I'm assuming Georgia was selected? Which has been on Windows for years but renders so badly it looks like the font color isn't black, even though I think it is black? This isn't really your fault but I would have stuck to Times New Roman for Windows' sake.
8fingerlouie
A long time ago, in a galaxy far far away, I wrote something similar for an embedded platform. I was implementing a WAP browser at the time (yes, WAP, https://en.wikipedia.org/wiki/Wireless_Application_Protocol), and we needed dynamic memory allocation in an environment where everything was static due to realtime constraints.
So I ended up with this: https://github.com/jinie/memmgr
keyle
The parent article is quite interesting and slightly different, being C++ [1]. There is also a post about writing a memory allocator [2].
[1]: http://dmitrysoshnikov.com/compilers/writing-a-pool-allocato...
[2]: http://dmitrysoshnikov.com/compilers/writing-a-memory-alloca...
exDM69
Here's another interesting O(1) memory allocator, but with arbitrary-sized allocations and low fragmentation. The downside is relatively high memory overhead (a few dozen bytes per allocation).
This kind of allocator is often used to suballocate GPU memory in game and graphics applications.
I'm using a variant of this algorithm with added support for shrinkable and aligned allocations and flexible bin sizing.
You can also extend this idea to two dimensions to create texture atlases, which is possible in O(1) for power of two sized allocations.
Original: https://github.com/sebbbi/OffsetAllocator Rust port: https://crates.io/crates/offset-allocator
alecco
(related) LLFree
Paper: LLFree: Scalable and Optionally-Persistent Page-Frame Allocation https://www.usenix.org/system/files/atc23-wrenger.pdf
Video presentation: https://www.youtube.com/watch?v=yvd3D5VOHc8
Implementation and benchmarks are well documented at the repos:
Rust repo https://github.com/luhsra/llfree-rs C repo https://github.com/luhsra/llfree-c
teo_zero
What I find weird is that the chunk's size is fixed, while the caller must specify how many chunks a pool contains at pool creation. I would do exactly the dual: the chunk's size should be defined at pool creation, so that you can create multiple pools, each dedicated to one specific kind of object. (You need a third field in struct Pool, though.) Instead, the number of chunks per pool should not be something the user needs to care about: if they knew in advance how many objects are required, they would simply allocate an array big enough! The library implementation should define a sensible number and stick to it.
Similarly, I don't like exposing pool_expand(): too much burden on the user. Expansion should be automatically triggered by pool_alloc() whenever the current pool is exhausted.
This would also allow a much simpler pool_new() that just initializes its pointers to NULL, leaving it to the first invocation of pool_alloc() to actually do the first allocation. This would avoid the duplication of code between pool_new() and pool_expand().
Another benefit of fixing the number of chunks is a possible simplification of ArrayStart: this could include a proper array instead of a pointer, avoiding a malloc(). Something like this:
    struct ArrayStart {
        struct ArrayStart *next;
        Chunk arr[CHUNKS_PER_ARRAY];
    };
8dcc
> I would do exactly the dual: the chunk's size should be defined at pool creation, so that you can create multiple pools, each dedicated to one specific kind of object.
This is supported in my 'libpool' project. I thought I mentioned it in the article, but now I am not so sure.
https://github.com/8dcc/libpool/blob/main/src/libpool.h#L51
> Similarly, I don't like exposing pool_expand(): too much burden on the user. Expansion should be automatically triggered by pool_alloc() whenever the current pool is exhausted.
I feel like this shouldn't be too hard to do; I actually did write a function for this in another project I am working on.
https://github.com/8dcc/sl/blob/9ddd84d75ffc3b0ba1373bc13bc6...
> This would also allow a much simpler pool_new() that just initializes its pointers to NULL, leaving it to the first invocation of pool_alloc() to actually do the first allocation.
I didn't think of this, but I'd rather keep the two functions separate, especially for readability in the article.
voctor
There is a small series of posts on the BitSquid blog about memory allocation which is worth reading! http://bitsquid.blogspot.com/2015/08/allocation-adventures-3...
sylware
It all depends on the load profile using the allocator. You never know; that said, you cannot beat, in theory, a semantically specialized allocator for a specific load... in other words, the right way(TM). This means applications should bundle their own allocators for the various load types they have. The "generic" allocator is a sort of heresy for the lazy, the short-termists, or those in a hurry. Don't worry, I still use such a generic allocator sometimes, but often I mmap the memory myself and deal with it directly.
astrange
The system allocator is the one that comes with all the system's memory debugging tools and security (or lack of it), so you'd better not mind losing those if you want to ditch it.
drysine
Also "Arena allocator tips and tricks (nullprogram.com)" [0]
This code is very beautifully written, thanks for sharing.
However, you should consult the book "C: Interfaces and Implementations" by Dave Hanson, which has a library called "Arena" that is very similar to your "Pool"s, and it shows a few more tricks to make the code safer and easier to read (Chapters 5+6, 6 in particular).
D. Hanson's book (written in the 'literary programming' style invented by Donald Knuth) also demonstrates having debugging and production implementations for the same C API to catch memory errors, and his book is from 1997.
(He used to be an SE professor at Princeton before getting hired by Microsoft, I believe.)