Imposing memory security in C [video]
100 comments
February 27, 2025
pizlonator
To me, "memory safety" really means:
- There are clearly stated guarantees, like "a pointer cannot access outside the bounds of its allocation" and "if you free an object while there are still pointers to it then those pointers cannot be dereferenced". These guarantees should be something you can reason about formally, and they should be falsifiable. Not sure this presentation really has that. It's not clear what they prevent, and what they don't prevent.
- There is no way to break out of the clearly stated guarantees. Totally unclear that whatever guarantees they have are actually guarded against in all cases. For example, what if a tmp_alloc'd object pointer escapes into another tmp_alloc'd object with a different lifetime? I get that they wouldn't write code that does that intentionally, but "memory safety" to me means that if you did write that code, you'd either get a compile error or a runtime error.
It's possible to ascribe clearly stated guarantees to C and to make it impossible to break out of them (Fil-C and CHERI both achieve that).
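Concretely, the kind of escape I mean looks like this sketch (a hypothetical arena API for illustration; I'm not assuming the talk's actual tmp_alloc signature):

    #include <stddef.h>

    // Hypothetical arena API, for illustration only.
    typedef struct Arena Arena;
    void *arena_alloc(Arena *a, size_t n);
    void arena_reset(Arena *a);

    struct Node { struct Node *next; };

    void escape(Arena *short_lived, Arena *long_lived) {
        struct Node *outer = arena_alloc(long_lived, sizeof *outer);
        struct Node *inner = arena_alloc(short_lived, sizeof *inner);
        outer->next = inner;       // pointer escapes into a longer-lived object
        arena_reset(short_lived);  // inner's memory is recycled here
        outer->next->next = NULL;  // use after free; nothing catches it
    }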
johnnyjeans
> There is no way to break out of the clearly stated guarantees.
I disagree on this, and having escape hatches is critically important. Are we really going to call Rust or Haskell memory unsafe because they offer ways to break their safety guarantees?
pizlonator
I think that Rust's clearly stated guarantee holds if you never say "unsafe".
That's still a clear statement, because it's trivial to tell if you used "unsafe" or not.
johnnyjeans
Maybe I'm just misinterpreting what you mean. When you say "no way" and "all cases", I take your meaning literally. Pointers that bypass the borrow checker, disabled runtime bounds checks, and unsafe blocks are exactly that: escape hatches to break Rust's safety, in the same way type-casting is an escape hatch to break C's (anemic) type safety, and unsafePerformIO in Haskell is an escape hatch to break every bone in your body.
the__alchemist
Trivial is not a word I would use here! Rust's `unsafe` gets fuzzy as you traverse an operation's dependencies! There are many applications where marking a function as `unsafe` is subjective.
jeffrallen
But the Rust ecosystem is littered with unsafe, so good luck getting the actual benefits of Rust. :(
kstrauser
Thinking aloud, and this is probably a bad idea for reasons I haven’t thought of.
What if pointers were a combination of values, like a 32 bit “zone” plus a 32 bit “offset” (where 32/32 is probably really 28/36 or something that allows >4GB allocations, but let’s figure that out later). Then each malloc() could increment the zone number, or pick an unused one randomly, so that there’s enormous space between consecutive allocs and an address wouldn’t be reissued quickly. A dangling pointer would then point at an address that isn’t mapped at all until possibly 2^32 malloc()s later. It wouldn’t help with long-lived dangling pointers, but would catch accessing a pointer right after it was freed.
I guess, more generally, why are addresses reused before they absolutely must be?
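A minimal sketch of the idea, assuming POSIX mmap (a real never-reuse allocator would also have to keep freed ranges reserved so the kernel never hands them out again):

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    #define ZONE_SHIFT 32
    static uint64_t next_zone = 1;

    // Each allocation gets a fresh "zone" of virtual address space.
    void *zone_alloc(size_t n) {
        void *hint = (void *)(next_zone++ << ZONE_SHIFT);  // advisory only
        void *p = mmap(hint, n, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    // Unmap on free: a dangling pointer now faults instead of silently
    // aliasing a newer allocation (until the kernel reuses the range).
    void zone_free(void *p, size_t n) {
        munmap(p, n);
    }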
zyedidia
It sounds like what you're describing is one-time allocation, and I think it's a good idea. There is some work on making practical allocators that work this way [1]. For long-running programs, the allocator will run out of virtual address space and then you need something to resolve that -- either you do some form of garbage collection or you compromise on safety and just start reusing memory. This also doesn't address spatial safety.
[1]: https://www.usenix.org/system/files/sec21summer_wickman.pdf
naasking
> For long-running programs, the allocator will run out of virtual address space and then you need something to resolve that -- either you do some form of garbage collection or you compromise on safety and just start reusing memory
Or you destroy the current process after you marshal the data that should survive into a newly forked process. Side benefit: this means you get live upgrade support for free, because what is a live upgrade but migrating state to a new process with updated code?
kstrauser
Oh, nifty! I guarantee you anyone else discussing this has put more than my 5 minutes' worth of thought into it.
Yeah, if you allow reuse then it wouldn't be a guarantee. I think it'd be closer to the effects of ASLR, where it's still possible to accidentally break things, just vastly less likely.
pizlonator
That’s a way of achieving safety that has so many costs:
- physical fragmentation (you won’t be able to put two live objects into the same page)
- virtual fragmentation (there’s kernel memory cost to having huge reservations)
- 32 bit size limit
Fil-C achieves safety without any of those compromises.
kstrauser
For sure. I'm under no illusion that it wouldn't be costly. What I'm trying to suss out is whether libc could hypothetically change to give better safety to existing compiled binaries.
layer8
This sounds similar to the 386 segmented memory model: https://en.wikipedia.org/wiki/X86_memory_segmentation#80386_...
However, it was limited to 8192 simultaneous “allocations” (segments) per process (or per whatever unit the OS associates the local descriptor tables with).
throwawaymaths
you can do this easily with virtual memory, and IIRC Zig's general purpose allocator does so under some circumstances (don't remember if it's default or if it needs a flag).
blacksqr
There are and have been many techniques and projects for making C more memory-safe. The crucial question it always comes down to is what performance hit do you take using them?
That's why C has been on top for so long. Seat-of-the-pants hand-crafted C has always been the fastest high-level language.
WalterBright
C's memory safety could be drastically improved with the addition of bounds-checked arrays (which is an extension, and does not change existing code):
https://www.digitalmars.com/articles/C-biggest-mistake.html
25 years of experience with D has shown this to be a huge improvement.
D also has references as an alternative to pointers. References cannot have arithmetic done on them. Hence, by replacing pointers with references, and with array bounds checking, the incidence of memory corruption is hugely reduced.
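Sketched as a plain struct in today's C (the proposed extension would just make the compiler emit the check for you):

    #include <assert.h>
    #include <stddef.h>

    typedef struct { int *ptr; size_t length; } IntSlice;

    // The check the compiler would generate for a[i] on a bounds-checked array.
    int slice_get(IntSlice s, size_t i) {
        assert(i < s.length);
        return s.ptr[i];
    }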
pizlonator
> C's memory safety could be drastically improved with the addition of bounds-checked arrays (which is an extension, and does not change existing code):
If you solved that problem then you'd still have a dumpster fire of memory safety issues from bad casts, use after free, etc.
WalterBright
I found that C programs rarely evolve beyond their initial design. The trouble is, it's hard to refactor C programs. For example,
struct S { int a; };
struct S s; s.a = 3;
struct S *p; p->a = 3;
I.e., `.` is for direct access, `->` for indirect access. Let's say you want to change passing S by value to passing S by pointer. Now you have to update every use, instead of just the declaration. This is how it would work in D:
struct S { int a; }
S s; s.a = 3;
S* p; p.a = 3;
ref S q; q.a = 3;
And so refactoring becomes much easier, and so happens more often.
WalterBright
> C has always been the fastest high-level language.
C has another big speed problem. Strings are 0 terminated, rather than length terminated. This means constant scanning of strings to find their length. Even worse, the scanning of the string reloads the cache with the string contents, which is pretty bad for performance.
Of course, you could use `struct String { char *p; size_t length; };` but since every library you want to connect to uses 0 terminated strings, you're out on your island all alone, so pragmatically it does not work.
Another speed-destroying problem with C strings is that you cannot take a substring without allocating a new string and copying the data. (Unless the substring is right-justified.) This is not fast in any universe.
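With a length-denoted type, a substring is just pointer arithmetic (a sketch; bounds checks elided):

    #include <stddef.h>

    typedef struct { const char *ptr; size_t len; } Str;

    // O(1): no allocation, no copy, works for any interior slice.
    Str substr(Str s, size_t start, size_t n) {
        return (Str){ s.ptr + start, n };
    }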
D uses length-denoted strings as a basic data type, and string-processing code in D is much faster than C's. Substrings are quick and easy. You can still interface with C because D string literals implicitly convert to C string literals, as the literals are 0 terminated. So this works in D:
printf("hello world!\n");
(People sometimes rag on me for still using printf, but printf is the most optimized and debugged library function in the world, so I take advantage!)
pizlonator
> There are and have been many techniques and projects for making C more memory-safe.
Sort of. None of them got all the way to safety, or they never got all the way to compatibility with C.
Fil-C is novel in that it achieves both safety and compatibility.
> The crucial question it always comes down to is what performance hit do you take using them?
Is that really the crucial question?
I don't think you would have even gotten to asking that question with most attempts to make C memory safe, because they involved experimental academic compilers that could only compile a subset of the language and only worked for a tiny corpus of benchmarks.
Lots of C/C++ code is not written with a perf mindset. Most of the UNIX utilities are like that, for example.
> That's why C has been on top for so long. Seat-of-the-pants hand-crafted C has always been the fastest high-level language.
I don't think that's the reason. C rose to where it is today even when it was much slower than assembly. C was slower than FORTRAN for a long time (maybe still is?) but people preferred C over FORTRAN anyway.
C's biggest superpower is how easy it makes it to talk to system ABI (syscalls, dynamic linking, etc).
blacksqr
>> There are and have been many techniques and projects for making C more memory-safe.
> Sort of.
Yes. That's why I used the qualifier "more." Our statements are not in conflict.
> Fil-C is novel in that it achieves both safety and compatibility.
How does it affect performance?
>> The crucial question it always comes down to is what performance hit do you take using them?
> Is that really the crucial question?
Yes, because it's the factor that industry leaders use to decide on which language to use. For example, Apple switching from Pascal to C way back in the Stone Age. The fact that it's the crucial question doesn't mean that lots of people don't consider other factors for their own reasons.
> I don't think you would have even gotten to asking that question with most attempts to make C memory safe.
Yes, most. But for example, Microsoft's Checked C comes with a performance penalty of almost 10% for a partial solution. Not academic. Very commercial.
> C rose to where it is today even when it was much slower than assembly
Yes, that's why I said "high-level language." I don't consider assembly high-level, do you?
> people preferred C over FORTRAN anyway
People preferred C in the 1970s/80s because at the time you could allocate memory dynamically in C but not in FORTRAN. FORTRAN fixed that in the 1990s, but by then there were too few FORTRAN programmers to compete. Since then C has serially defeated all newcomers. Maybe Go or Rust are poised to take it on. When a major operating system switches from C, we'll know.
PaulDavisThe1st
> There are clearly stated guarantees, like "a pointer cannot access outside the bounds of its allocation"
But that's not a pointer in anything like the sense of a C pointer.
You'd need to reword that (as I know you've been doing with Fil-C) to be something more like: no reference to a (variable|allocation|object) may ever be used to access memory that is not a part of the object.
Pointers are not that, and the work you've done in Fil-C to make them closer to that makes them also be "not pointers" in a classic sense.
I'm OK with that, it just needs to be more clear.
pizlonator
Semantics.
You can call Fil-C’s pointers whatever you like. You can call them capabilities if that works better for you.
The point of my post is to enumerate the set of things you’d need to do to pointers to make them safe. If that then means we’ve created something that you wouldn’t call a pointer, then, like, whatever.
tredre3
> Semantics.
If your goal is just to redefine the word then by all means, continue.
But semantics are very important if your goal is to drive adoption of your ideas. You can't misuse a term and then get pissy when people don't understand you.
PaulDavisThe1st
And my point is that you cannot make C pointers safe. You can make something else that is safe, and you're clearly hard at work on that, which is great.
beardyw
I remember chasing down a memory leak in my first commercial C code. Took me a long while to discover that if you allocate zero bytes you still have to free it! After that I took nothing for granted.
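In other words (per C11, malloc(0) may return either NULL or a unique pointer; since free(NULL) is a no-op, unconditionally freeing is always correct):

    #include <stdlib.h>

    void demo(void) {
        void *p = malloc(0);  // implementation-defined: NULL or a real, freeable pointer
        free(p);              // free(NULL) is a no-op, so always freeing is safe
    }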
weinzierl
It's not even guaranteed that it doesn't allocate, so a malloc(0) could cause an out of memory.
ignoramous
> malloc(0) could cause an out of memory
tbh, 640K RAM ought to be enough for anybody.
weinzierl
For the last drop to make the cup run over it doesn't matter how big the cup is.
cryptonector
Nah. I use C a lot, but none of this is enough to make C safe. You really need the language and the tools to enforce discipline. Oh, and things like the cleanup attribute are not standard C either, so this is not portable code.
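For those unfamiliar, the cleanup attribute in question is a GCC/Clang extension, which is exactly the portability problem:

    #include <stdlib.h>

    // Called with a pointer to the annotated variable when it leaves scope.
    static void free_ptr(void **p) { free(*p); }

    void demo(void) {
        __attribute__((cleanup(free_ptr))) void *buf = malloc(64);
        // ... use buf ...
    }   // buf is freed here automatically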
imtringued
I didn't see anything new either.
What I would expect from C developers is this:
* Run all CI with UBSAN. Create versions of popular distributions that build every single package with UBSAN just to catch the bugs.
* Use design-by-contract patterns for pre- and postconditions (library-induced UB is popular in C++)
* Use model checking software like CBMC to statically guarantee the absence of UB and validity of the contracts
* Build a fuzzer for every method that cannot be formally verified (see the sketch below)
This is the bare minimum needed to keep C/C++ safe. The same applies to unsafe Rust by the way.
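For the fuzzing point, a minimal sketch using clang's libFuzzer (build with -fsanitize=fuzzer,undefined so UBSAN runs under the fuzzer; parse() here stands in for whatever function you're testing):

    #include <stddef.h>
    #include <stdint.h>

    int parse(const uint8_t *data, size_t size);  // the function under test (assumed)

    // libFuzzer's entry point; it calls this with generated inputs.
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        parse(data, size);  // UBSAN reports any UB the fuzzer reaches
        return 0;
    }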
cryptonector
I'd expect something like a super-C that provides a counted-byte string (still NUL-terminated for interoperability) type and support functions, `defer` or similar, `with`-like macros, etc., `mutable`/`immutable`. Such a thing could be like C++ was in the beginning: a front-end that translates to C99 or whatever standard is your lowest common denominator. You'd still have to do manual memory management, so you'd still have use-after-free issues, but they'd be a lot less common. Similarly you'd still have races, but a lot fewer.
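The string type I have in mind is something like this sketch (counted for O(1) length, but still NUL-terminated so existing APIs keep working; error handling elided):

    #include <stdlib.h>
    #include <string.h>

    typedef struct { size_t len; char *buf; /* invariant: buf[len] == '\0' */ } CStr;

    CStr cstr_from(const char *s) {
        CStr r = { strlen(s), NULL };
        r.buf = malloc(r.len + 1);
        memcpy(r.buf, s, r.len + 1);  // copies the terminating NUL too
        return r;
    }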
throwawaymaths
usually portability in C includes the provision that you can drop in whatever #includes you want?
cryptonector
No, it's really not that simple at all.
throwawaymaths
Probably depends on the macro, but ok.
debatem1
I don't think anyone ever doubted that a C program could be memory safe. The problem is knowing without exhaustive work whether yours is one of them.
These aren't bad practices, but I don't think they satisfy that desire either.
SV_BubbleTime
I am in no way at all better than that guy. Not even sort of. I appreciate his talk.
However, if I were to make a presentation based on my superior C practices, it would have to be implementation and example heavy.
All of his rules sound great, except for when you have to break them or you don’t know how to do the things he’s talking about in your code, because you need to get something done today.
It reads a little like “I’ve learned a lot of lessons over my career, you should learn my lessons. You’re welcome.”
rrrix1
The talk was obviously extremely time-limited, as demonstrated when they basically skipped the last handful of slides and then it abruptly ended. I think for the time allocated, it was just right, and they did include a couple of examples where it made sense.