
Fil's Unbelievable Garbage Collector

31 comments · September 5, 2025

crawshaw

It is great that Fil-C exists. This is the sort of technique that is very effective for real programs, but that developers are convinced does not work. Existence proofs cut through long circular arguments.

johncolanduoni

What do the benchmarks look like? My main concern with this approach would be that the performance envelope would eliminate it for the use-cases where C/C++ are still popular. If throughput/latency/footprint are too similar to using Go or what have you, there end up being far fewer situations in which you would reach for it.

pizlonator

Some programs run as fast as normally. That's admittedly not super common, but it happens.

Some programs have a ~4x slowdown. That's also not super common, but it happens.

Most programs are somewhere in the middle.

> for the use-cases where C/C++ are still popular

This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive. It's written in C or C++ because:

- That's what it was originally written in and nobody bothered to write a better version in any other language.

- The code depends on a C/C++ library and there doesn't exist a high quality binding for that library in any other language, which forces the dev to write code in C/C++.

- C/C++ provides the best level of abstraction (memory and syscalls) for the use case.

Great examples are things like shells and text editors, where the syscalls you want to use are exposed at the highest level of fidelity in libc and if you wrote your code in any other language you'd be constrained by that language's library's limited (and perpetually outdated) view of those syscalls.

monkeyelite

You are making a lot of assumptions about my code.

Quitschquat

I love garbage collector design and impl. It’s one of those “go to” things to do when learning a new language.

Never heard of this one, looking forward to diving in this weekend.

charleslmunger

This is really cool! I noticed

> The fast path of a pollcheck is just a load-and-branch.

A neat technique I've seen used to avoid these branches is documented at https://android-developers.googleblog.com/2023/11/the-secret... under "Implicit suspend checks".
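
For readers who haven't seen one, here is a minimal sketch of what a load-and-branch pollcheck fast path can look like. This is illustrative only, not Fil-C's actual code; the names and the per-thread flag are assumptions for the sketch.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical per-thread state: the collector sets should_stop when it
       wants this thread to participate in a handshake. */
    typedef struct {
        _Atomic bool should_stop;
        /* ... the rest of the thread's runtime state ... */
    } thread_state;

    /* Slow path: park, run the handshake callback, resume. Declared here,
       defined elsewhere in the runtime. */
    void pollcheck_slow(thread_state *t);

    /* Fast path: one relaxed load plus a branch that is almost never taken. */
    static inline void pollcheck(thread_state *t) {
        if (atomic_load_explicit(&t->should_stop, memory_order_relaxed))
            pollcheck_slow(t);
    }

As I understand the linked post, the "implicit" variant trades the branch for a load from a page the runtime can protect, so a suspend request shows up as a fault handled by a signal handler rather than a taken branch.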

pizlonator

Yeah that’s a common optimization for poll checks. I think most production JVMs do that.

I’m very far from doing those kinds of low level optimizations because I have a large pile of very high level and very basic optimizations still left to do!

gleenn

> The only "pause" threads experience is the callback executed in response to the soft handshake, which does work bounded by that thread's stack height.

So this is probably not great for functional/deeply-recursive code I guess?

pizlonator

Meh.

The stack scan is really fast. There's not a lot of logic in there. If you max out the stack height limit (megabytes of stack?) then maybe that means milliseconds of work to scan that stack. That's still not bad.
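
As a purely illustrative back-of-envelope (the scan rate below is a guess, not a measurement from Fil-C):

    #include <stdio.h>

    int main(void) {
        double stack_bytes   = 8.0 * 1024 * 1024; /* 8 MiB, a common default stack limit */
        double slots         = stack_bytes / 8;   /* 64-bit slots to examine */
        double slots_per_sec = 1e9;               /* assumed scan rate; not measured */
        printf("~%.1f ms to scan a maxed-out stack\n",
               1000.0 * slots / slots_per_sec);
        return 0;
    }

Which lands in the low-single-digit-millisecond range, and only for a thread that has actually maxed out its stack.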

adastra22

That's a very long time. Milliseconds of work is an entire frame update-render cycle in a modern game.

munificent

Games don't tend to have very deep call stacks. And if a game that cared about performance also wanted to use GC, it would probably try to run the GC at the end of a frame, when there is little on the stack.

pizlonator

Would your modern game have a stack that is so deep that it brushes up against the stack height limit?

Probably not. Your game would be inches of stack away from crashing.

jandrewrogers

I think it is really cool that someone is going hard at this part of the design space from an engineering geek standpoint even though I can’t use it myself.

pcfwik

Given the goal is to work with existing C programs (which already have free(...) calls "carefully" placed), and you're already keeping separate bounds info for every pointer, I wonder why you chose to go with a full GC rather than lock-and-key style temporal checking[1]? The latter would make memory usage more predictable and avoid the performance overhead and scheduling headaches of a GC.

Perhaps storing the key would take too much space, or checking it would take too much time, or storing it would cause race condition issues in a multithreaded setting?

[1] https://acg.cis.upenn.edu/papers/ismm10_cets.pdf
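
For readers who haven't seen the CETS paper cited above, a minimal sketch of the lock-and-key idea, with the metadata folded into a fat pointer for brevity (as I recall, CETS itself keeps the metadata disjoint from the pointer and handles threading). The names and layout here are made up for illustration.

    #include <stdint.h>
    #include <stdlib.h>

    /* Each allocation gets a lock word holding a key; every pointer derived
       from it carries a copy of that key plus the lock's address. Freeing
       the allocation changes the lock, so stale pointers stop matching. */
    typedef struct {
        void     *ptr;   /* the raw pointer */
        uint64_t *lock;  /* address of the allocation's lock word */
        uint64_t  key;   /* key recorded when the pointer was created */
    } fat_ptr;

    static uint64_t next_key = 1; /* not thread safe; fine for a sketch */

    fat_ptr checked_malloc(size_t n) {
        uint64_t *lock = malloc(sizeof *lock);
        *lock = next_key++;
        return (fat_ptr){ .ptr = malloc(n), .lock = lock, .key = *lock };
    }

    void checked_free(fat_ptr p) {
        if (*p.lock != p.key) abort(); /* stale pointer or double free */
        *p.lock = 0;                   /* invalidate every outstanding copy */
        free(p.ptr);                   /* the lock word itself is leaked here */
    }

    void *checked_deref(fat_ptr p) {
        if (*p.lock != p.key) abort(); /* use-after-free caught at access time */
        return p.ptr;
    }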

pizlonator

I think the lock and key approaches don’t have Fil-C’s niftiest property: the capability model is totally thread safe and doesn’t require fancy atomics or locking in common cases

pcfwik

makes sense, thanks --- cool project!

pcfwik

Also find it interesting that you're allowing out-of-bounds pointer arithmetic as long as no dereference happens, which is a class of UB compilers have been known to exploit ( https://stackoverflow.com/questions/23683029/is-gccs-option-... ). Do you disable such optimizations inside LLVM, or does Fil-C avoid this entirely by breaking pointers into pointer base + integer offset (in which case I wonder if you're missing out on any optimizations that work specifically on pointers)?
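
On the last part of the question, a minimal sketch of what a base-plus-offset representation can look like: arithmetic is free to wander out of bounds, and the check happens only at dereference. This is not Fil-C's actual capability layout; the names and fields are assumptions for the sketch.

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct {
        char  *base;    /* start of the allocation */
        size_t size;    /* allocation size in bytes */
        size_t offset;  /* may go out of bounds without tripping anything */
    } wide_ptr;

    /* Pointer arithmetic: no check, so out-of-bounds intermediate values
       are allowed. */
    wide_ptr wp_add(wide_ptr p, ptrdiff_t delta) {
        p.offset += (size_t)delta;
        return p;
    }

    /* The check happens only on access. */
    char *wp_deref(wide_ptr p) {
        if (p.offset >= p.size)
            abort(); /* a Fil-C-style system would raise a panic here */
        return p.base + p.offset;
    }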

pizlonator

For starters, llvm is a lot less willing to exploit that UB

It’s also weird that GCC gets away with this at all, as many C programs on Linux that compile with GCC make deliberate use of out-of-bounds pointers.
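
For a concrete flavor of the kind of deliberate pattern that gets broken (illustrative, not taken from any particular codebase): the wraparound check below is UB by the letter of the standard once buf + len leaves the object, so an optimizer is allowed to assume the comparison is always false and delete it.

    #include <stddef.h>

    /* Intended as a wraparound check before using buf + len. Because forming
       a pointer past the end of the object is UB, an optimizer may fold this
       whole function to "return 0;". */
    int would_overflow(const char *buf, size_t len) {
        return buf + len < buf;
    }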

But yeah, if you look at my patch to llvm, you’ll find that:

- I run a highly curated opt pipeline before instrumentation happens.

- FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations.

- I made some surgical changes to clang CodeGen and some llvm passes to fix some obvious issues from UB.

But also, let’s consider what would happen if I hadn’t done any of that except for dropping UB flags in FilPizlonator. In that case, a pass before pizlonation would have done some optimization. At worst, that optimization would introduce a logic error or induce a Fil-C panic. FilPizlonator strongly limits UB to its “memory safe subset” by construction.

I call this the GIMSO property (garbage in, memory safety out).

kartoffelsaft

Not knowing the exact language used by the C standard, I suspect the reason GCC doesn't cause these issues with most programs is that the wording of "array object" refers specifically to arrays with compile-time-known sizes, e.g. `int arr[4]`. Most programs that do out-of-bounds pointer arithmetic are doing so with pointers from malloc/mmap/similar, which might have similar semantics to arrays but are not arrays.

AlotOfReading

> FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations.

Does this work reliably, or did your patches have to fix bugs here? There are LLVM bugs with floating point where the backend doesn't properly respect passed attributes during codegen, which violates the behavior of user-level flags. I imagine the same thing exists for UB.

o11c

Note that the "safepointing" logic is exactly the same thing that's needed in refcounting to atomically replace a field.

This article glosses over what I consider the hardest part - the enter/exit functionality around native functions that may block (but which must touch the allocator).

pizlonator

> Note that the "safepointing" logic is exactly the same thing that's needed in refcounting to atomically replace a field.

No it's not, not even close.

> This article glosses over what I consider the hardest part - the enter/exit functionality around native functions that may block (but which must touch the allocator).

Yeah, that part is hard, and maybe I'll describe it in another post.

Look for `filc_enter` and `filc_exit` in https://github.com/pizlonator/fil-c/blob/deluge/libpas/src/l...
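
The general shape of that dance, heavily simplified: the thread tells the runtime it is about to block, so the collector can treat it as if it were parked at a safepoint, and re-enters afterwards. The signatures below are hypothetical, for this sketch only; see the real `filc_enter`/`filc_exit` in the linked file.

    #include <unistd.h>

    /* Hypothetical zero-argument prototypes; the real functions live in
       libpas and may take thread/runtime arguments. */
    void filc_enter(void);
    void filc_exit(void);

    ssize_t gc_safe_read(int fd, void *buf, size_t len) {
        filc_exit();                         /* about to block: let the GC
                                                proceed without waiting on
                                                this thread */
        ssize_t result = read(fd, buf, len); /* the potentially blocking call */
        filc_enter();                        /* back under GC supervision; may
                                                wait here if a handshake is in
                                                progress */
        return result;
    }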

system2

> Fil-C uses a parallel concurrent on-the-fly grey-stack Dijkstra accurate non-moving garbage collector called FUGC

Half of my hair turned white while trying to understand this.

kerkeslager

I'm not sure I understand all of what they're doing there, but I did read the referenced Doligez-Leroy-Gonthier paper a while back, and I'm glad someone is doing something with it in a non-academic (maybe?) context. That paper looked promising when I read it, but I basically had no faith it would ever make it out of academia because the algorithm is so complex. It took me a really long time to feel I understood it, and it's one of those things I'm not confident I could implement even when I did understand it (I certainly don't understand it now).

pizlonator

I don’t think I’m the only one doing something in the vicinity of that paper. I think close relatives of DLG shipped in some prod JVMs.

kerkeslager

Interesting. I've read a lot about some complex JVMs, but I guess maybe they didn't cite their sources and I didn't make the connection on my own.

reactordev

Oh this is so cool
