Writing a simple pool allocator in C
91 comments
·January 5, 2025jll29
cb321
Kernighan & Ritchie's "The C Programming Language" also has an example implementation of malloc() & free() { for any-size requests as @kragen notes in sibling } in it in like 3 pages (though, sure, it is far from the most efficient implementation). Since it's the last section of the main text, I'd guess that most readers don't get that far (even though it's a short book).
kragen
I think you mean "literate programming".
The arena allocator in Hanson's chapter 6 is an arena allocator, not a "pool allocator" in the sense that 8dcc means, which is to say, a constant-time fixed-size-block allocator. It seems that you had the same reading comprehension problem that I did.
The difference is that Hanson's arena allocator can make allocations of any size (his Arena_alloc takes an nbytes argument) and cannot deallocate individual allocations, only the entire arena (his Arena_free takes only an arena pointer as a parameter). A fixed-size-block allocator like the one we are discussing here can only make allocations of a fixed size, but can recycle them individually, not only all at once.
(If Hanson's name sounds familiar, it might be because you've read the lcc book.)
It may be worth noting that Donald Knuth also invented the font used in 8dcc's article.
actionfromafar
Honestly, "literary programming" does better explain how Knuth writes code.
kragen
Yes, but that name lacks the benefit of implicitly describing other approaches to programming as "illiterate".
johnisgood
My minor nitpick regarding the code is the choice of variable names, but other than that, it is indeed easy to read and it is how I implement data structures and whatnot.
I like that there is a more or less exhaustive blog post about it which happens to look nice as well.
8dcc
Thank you so much! I will definitely check it out. I also planned on writing an article about arena allocators soon.
kazinator
If you write your own allocator in C, do yourself a favor and use the valgrind API inside it. Its use can be conditional so you can "compile it out". Or should I say it has to be, so you can build the code in an environment where there's no valgrind API.
gopalv
+1 - this isn't even that hard, it is include a single header + annotate allocations/frees, with some redzones around the actual allocation to know when the user-code is writing into the pool pointer areas.
https://developers.redhat.com/articles/2022/03/23/use-valgri...
null
tavianator
I've replaced Valgrind with sanitizers for most of my workflow. Using the sanitizer APIs in your custom allocator is also easy: https://github.com/tavianator/bfs/blob/main/src/sanity.h https://github.com/tavianator/bfs/blob/31dffd6441ba80ea998ce...
8dcc
I have never heard about this (using valgrind with a "non-standard allocator"), but it looks really interesting, specially for other projects I am working on. I will definitely look into it.
mananaysiempre
You can also integrate with AddressSanitizer to some extent, look into[1] ASAN_{UN,}POISON_MEMORY_REGION from <sanitizer/asan_interface.h> or the underlying intrinsics __asan_{un,}poison_memory_region.
[1] https://github.com/google/sanitizers/wiki/AddressSanitizerMa...
kragen
Thanks for the tip!
cb321
One aspect people don't tend to mention (and TFA does not because it has "general purpose" as a scope) about doing your own allocation arenas is that because a "pointer" is just the name (really number) of an allocated region, this also affords flexibility in the width of those numbers aka "pointer size" that decides your "address" space.
So, for example, if you are willing to limit yourself to 64 KiNodes of equal sized objects, a pool like https://github.com/c-blake/bst/blob/main/POOL.c can use an unsigned short 2-byte pointer. This small address space specialization can give a full 4x space overhead optimization over modern 64-bit pointers.
You could even use 8x less with 1-byte pointers if 256 nodes is enough or use bit fields or equivalents for even more fine-grained address space size selection. Admittedly, linked structures are usually less interesting for very small address space sizes, but this is sometimes useful. People may complain about arbitrary limits that relate to this kind of optimization, but the linked part could be "small by design" without compromising overall scale { such as for structure within a B-tree node or other multi-level indexing setups }.
In any event, I think it is useful to highlight pointer size arbitrariness to junior engineers. It is often learned once & immediately forgotten behind an onslaught of general purpose-ness (& pointer type systems/syntax). Relocation ideas also relate. E.g., if all the pointers are implicitly indices into a (small?) array, and all the code needs a "base address" of that array to get a VMem address, then you can relocate the whole bundle of objects pretty easily changing only the base address.
8dcc
Yes, this is true. I didn't use that in the code because I thought it would make it less readable, though. I might mention it somewhere, thank you.
cb321
In many prog.lang's, it absolutely makes things less readable. This is not an accident, but because the PL itself specifically tries to syntactically support pointer types (i.e. to literally aid readability). Most PLs do so only for general purpose VMem pointers with user-defined pointers being afterthoughts (at best). You probably need (at least!) operator overloading to get the same level of readability as "PL-native" pointers { though "readability" is a bit subjective and powerful PL features can be divisive; Scare "quotes" for interpretation hazard :-) }.
Anyway, you probably knew all that & I don't mean to suggest otherwise, but it made sense to write since HN loves PL talk.
8fingerlouie
A long time ago, in a galaxy far far away, i wrote something similar for an embedded platform. I was implementing a WAP Browser at the time (yes, WAP, https://en.wikipedia.org/wiki/Wireless_Application_Protocol), and we needed dynamic memory allocation in an environment where everything was static due to realtime constraints.
So i ended up with this : https://github.com/jinie/memmgr
8dcc
Hello, I am the author. Thank you all for the instructive comments, I made some changes to the article since I first posted it:
- Added a link to this HN post.
- Renamed some variables and types in the code for readability.
- Mention (in a footnote) that we could allocate the 'Pool' structure and the chunk arrays with a single call, or even return the 'Pool' on the stack. Suggested by 'liontwist' and 'unwind'.
- Mention (in a footnote) that we don't always need to store the full address when building the linked list of free chunks. Suggested by 'cb321'.
- Added valgrind support (also to my 'libpool' project). Suggested by 'kazinator'.
cb321
FWIW, while it is true that a smaller pointer lets you have smaller minimum fixed size objects in your pool, as your new footnote suggests, the space concern I was mostly alluding to in [1] is all the references in your client code { such as the BST code I linked to where each binary search tree (BST) node has 2 pointers (or 3 if they have parent pointers to give BST nodes "doubly linked list-like autonomy") }.
This can really add up. For example, like 20 years ago, I did a `print sizeof` in gdb for some g++ STL map node with a data payload of like 4 bytes and it was some obscene size approximately >= 1 order of magnitude bigger than necessary (like 80 bytes instead of, say 8 with 2Byte ptrs). While main memory is cheap, CPU cache memory is not and for "linky" structures, it often pays to be more cache-friendly. It's pretty easy to imagine a 10X smaller thing fitting into an L2 CPU cache where the larger thing would not even fit in L3, netting you a big latency boost (though the more root-ward levels of the tree likely always stay in caches in a hot loop).
EDIT: Anyway, you don't have to believe me. Others did an x32 ABI for x86_64 mostly to have "merely 2X" smaller pointers: https://en.wikipedia.org/wiki/X32_ABI
veltas
Also the font isn't incredibly easy to read on Windows. I'm assuming Georgia was selected? Which has been on Windows for years but renders so badly it looks like the font color isn't black, even though I think it is black? This isn't really your fault but I would have stuck to Times New Roman for Windows' sake.
keyle
The parent article is quite interesting and slightly different being cpp [1]. There is also a post about writing a memory allocator [2].
[1]: http://dmitrysoshnikov.com/compilers/writing-a-pool-allocato...
[2]: http://dmitrysoshnikov.com/compilers/writing-a-memory-alloca...
exDM69
Here's another interesting O(1) memory allocator but with arbitrary sized allocations and low fragmentation. Negative side is relatively high memory overhead (a few dozen bytes per allocation).
This kind of allocator is often used to suballocate GPU memory in game and graphics applications.
I'm using a variant of this algorithm with added support for shrinkable and aligned allocations and flexible bin sizing.
You can also extend this idea to two dimensions to create texture atlases, which is possible in O(1) for power of two sized allocations.
Original: https://github.com/sebbbi/OffsetAllocator Rust port: https://crates.io/crates/offset-allocator
voctor
There is a small series of posts on the BitSquid blog about memory allocation which is worth reading! http://bitsquid.blogspot.com/2015/08/allocation-adventures-3...
drysine
Also "Arena allocator tips and tricks (nullprogram.com)" [0]
2rsf
This bring old memories back, I implemented something similar on a motorola 68360 ages ago. Since the size of the buffer I used was not huge I skipped the pointers and simply enumerated each chunk, it was remarkably efficient and stable.
codr7
I usually default to slab allocators if I have to write my own:
9029
Another (admittedly less portable) way to solve the resizing problem is to reserve a large virtual memory space using linux mmap with the PROT_NONE flag and then commit pages as needed using mprotect with PROT_READ|PROT_WRITE flags. I believe windows' VirtualAlloc/VirtualProtect also allows this
This code is very beautifully written, thanks for sharing.
However, you should consult the book "C: Interfaces and Implementations" by Dave Hanson, which has a library called "Arena" that is very similar to your "Pool"s, and it shows a few more tricks to make the code safer and easier to read (Chapters 5+6, 6 in particular).
D. Hanson's book (written in the 'literary programming' style invented by Donal Knuth) also demonstrates having debugging and production implementations for the same C API to catch memory errors, and his book is from 1997:
(He used to be a SE professor I Princeton before getting hired by Microsoft, I believe.)