Memory safety is table stakes
75 comments
· June 26, 2025
0xbadcafebee
pron
Algol 68 and Pascal weren't memory-safe, and as for Lisp, Smalltalk, and ML, their style of memory safety - based on GC - took over the world pretty much the second it became practical enough for widespread use.
It is true that some decisions people make aren't rational, and it may even be true that most decisions most people make aren't entirely rational, but the claim that the whole software market, which is under selective pressures, manages to make irrationally wrong decisions in a consistently biased way is quite extraordinary and highly unlikely. What is more likely is that the decisions are largely rational, just don't correspond to your preferences. It's like the VHS vs. Betamax story. Fans of the latter thought that the preference for the former was irrational because of the inferior picture quality, but VHS was superior in another respect - recording time - that mattered more to more people.
I was programming military applications in Ada in the nineties (also not memory-safe, BTW) and I can tell you we had very good reasons to switch to C++ at the time, even from a software correctness perspective (I'm not saying C++ still retains those particular advantages today).
If you think so many people who compete with each other make a decision you think is obviously irrational, it's likely that you're missing some information.
daymanstep
What were the reasons for switching from Ada to C++, if I may ask?
pron
The compiler was much faster, the tooling better, and it was easier to find knowledgeable programmers (we were spending quite a bit of time sifting through thick Ada reference manuals). Whatever correctness benefits Ada provided at the language level were more than made up for by C++'s productivity boosts (at the time) that allowed writing and running more tests and fixing bugs more quickly, resulting in code that was no less correct and easier to maintain and evolve to boot.
burnt-resistor
Pascal has subrange integer types. I'm wondering if any other language besides family relatives Ada or Delphi has this, apart from dependent type systems like Idris[0] or an explicit Haskell type like Data.Range.[1]
0. https://stackoverflow.com/questions/28426191/how-to-specify-...
1. https://hackage.haskell.org/package/range-0.3.0.2/docs/Data-...
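For illustration, a minimal Rust sketch of how a Pascal-style subrange can be approximated with const generics. The `Subrange` and `Month` names are hypothetical, and unlike Pascal the range is checked at run time when a value is constructed, not by the compiler.

```rust
/// Hypothetical subrange type: an integer constrained to LOW..=HIGH.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Subrange<const LOW: i32, const HIGH: i32>(i32);

impl<const LOW: i32, const HIGH: i32> Subrange<LOW, HIGH> {
    /// Returns None when `value` falls outside LOW..=HIGH.
    fn new(value: i32) -> Option<Self> {
        if (LOW..=HIGH).contains(&value) {
            Some(Self(value))
        } else {
            None
        }
    }

    /// Returns the wrapped integer.
    fn get(self) -> i32 {
        self.0
    }
}

/// Roughly what Pascal would spell as `type Month = 1..12;`.
type Month = Subrange<1, 12>;

fn main() {
    // In-range values are accepted, out-of-range values are rejected.
    assert_eq!(Month::new(7).map(Month::get), Some(7));
    assert_eq!(Month::new(13), None);
    println!("{:?}", Month::new(7));
}
```

The constructor is the only way to build a value, so any `Month` that exists is known to be in range.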
ksec
I am the only one on HN that brings up Ada because I think it deserves some credit. But then it seems there is a lot of hate towards Pascal-style syntax.
nick_
If Rust is the language that finally overwhelms the resistance to memory safe languages, that's good.
I think it's also important not to centre Rust alone. In the larger picture, Rust has a combo of A) good timing, and B) the best evangelism. It stands on decades of memory safe language & runtime development, as well as the efforts of their many advocates.
jandrewrogers
This statement seems imprecise. We've had memory-safe languages for decades and they are the primary programming languages used today e.g. Java and Python. There is no meaningful resistance to them.
If you look at what unsafe languages are used for, it mostly falls into two camps (ignoring embedded). You have legacy code e.g. browsers, UNIX utilities, etc which are too expensive to rewrite except on an opportunistic basis even though they could be in principle. You have new high-performance data infrastructure e.g. database kernels, performance-engineered algorithms, etc where there are still significant performance and architectural advantages to using languages like C++ that are not negotiable, again for economic reasons.
Most of the "resistance" is economic reality impinging on wishful thinking. We still don't have a practical off-ramp for a lot of memory-unsafe code. To the extent a lot of evangelism targets these cases it isn't helpful. It is like telling people living in the American suburbs that they should sell their cars and take the bus instead.
odyssey7
I don’t buy the economic argument favoring memory-unsafe languages. There are fast memory-safe options. Legacy codebases can eventually become more expensive to maintain than to rewrite. What is the economic cost of an Achilles’ heel when critical systems are destroyed?
There are critical systems today that are essentially Prince Rupert’s drops. Mightily impressive, but with catastrophic weaknesses in the details.
tptacek
I think it's important to keep the scope of the debate well-defined, because memory-safe languages completely stomped out memory-unsafe languages more than 20 years ago; almost all new code is written in languages that are unshowily memory safe (like Java and Python).
We're really talking about resistance to memory safety in the last redoubts of unsafety: browsers and operating systems.
olarm
> We're really talking about resistance to memory safety in the last redoubts of unsafety: browsers and operating systems.
And control systems. C++ (along with PLCs, of course) dominates in my experience from developing maritime software, and there doesn't appear to be much inclination towards change.
zahlman
To be fair, there's a pretty clear difference between achieving memory safety with a garbage collector and run-time type information, versus achieving it through static analysis.
fiddlerwoaroof
Static analysis is worse and limits the programs you can write in annoying ways?
tuveson
> browsers and operating systems
And the VMs for the two languages that you mentioned above (edit: though to be fair to your comment, I suppose those were initially written 20+ years ago).
chubot
There’s also Google, Yandex, Baidu, and Bing, which are incredible amounts of C++ code
And probably lots of robotics, defense, and other industries
Granted, those aren’t consumer problems, but I would push back on the “last redoubts”.
We should absolutely move toward memory safe languages, but I also think there are still things to be tried and learned
npalli
> We're really talking about resistance to memory safety in the last redoubts of unsafety: browsers and operating systems.
... and other performance-critical areas like financial applications (HFT), high-performance computing (incl. AI/ML), embedded, IoT, gaming/engines, databases, compilers, etc. Browsers and OSes are highly visible, but there is a gigantic amount of new C++ code written every day in spite of the availability of memory-safe languages.
tptacek
People keep coming up with all these examples of things still written in C/C++. Sure. So are most AAA games. But so far nothing has been identified (maybe excepting databases, though vulnerabilities there are still rare) that is a meaningful component of insecurity, which is what "memory safety" addresses.
Ar-Curunir
and cryptographic code.
noelwelsh
Rust also didn't give up, whereas earlier languages like Cyclone did. This is a problem with the different incentives in research; once you've shown it works there is no funding for further development.
jekwoooooe
Go is fast and memory safe. It has some data race protections built in but doesn’t go as far as Rust. This has its benefits, like not having to deal with borrow checker insanity (or Rust syntax, for that matter).
Unlike Python or Java, it’s both compiled and fast.
taping-memory
I'm reading the article and so far it's great.
I'm just wondering in the explanation of listing 2 you say:
> a discriminant value indicating the enum’s active variant (4 bytes)
As far as I can find, there's no guarantee for that, the only thing I can find is that it might be interpreted as an `isize` value but the compiler is permitted to use smaller values: https://doc.rust-lang.org/reference/items/enumerations.html#...
Is there any reason to say it should be 4 bytes?
It doesn't change any of the conclusions, I'm just curious
OptionOfT
Using repr(C) makes it 4 bytes.
But then again, modeling a C enum to a Rust enum is bad design. You want to use const in Rust and match against those.
But it is a bad example in general, because the author passes on a pointer of a string slice to FFI without first converting it to a CString, so it isn't null terminated.
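For illustration, a small Rust sketch that prints the size of a fieldless enum under three representations. The enum names are hypothetical and are not taken from the article. The default layout is left to the compiler, so the sketch asserts only an upper bound for it. The `repr(C)` check assumes a target whose C ABI uses `int` for enums, which holds on common desktop platforms.

```rust
use std::mem::size_of;
use std::os::raw::c_int;

// Default representation: the compiler is free to pick the discriminant size.
#[allow(dead_code)]
enum DefaultRepr {
    Success,
    Pending,
    Failure,
}

// repr(C): laid out the way the target's C ABI lays out an enum.
#[allow(dead_code)]
#[repr(C)]
enum CRepr {
    Success,
    Pending,
    Failure,
}

// repr(u32): the discriminant is exactly a u32.
#[allow(dead_code)]
#[repr(u32)]
enum U32Repr {
    Success,
    Pending,
    Failure,
}

fn main() {
    println!("default:   {} byte(s)", size_of::<DefaultRepr>());
    println!("repr(C):   {} byte(s)", size_of::<CRepr>());
    println!("repr(u32): {} byte(s)", size_of::<U32Repr>());

    // Guaranteed by the explicit primitive representation.
    assert_eq!(size_of::<U32Repr>(), 4);
    // Assumption: the target's C enums are int-sized.
    assert_eq!(size_of::<CRepr>(), size_of::<c_int>());
    // The default layout may be smaller; only an upper bound is checked.
    assert!(size_of::<DefaultRepr>() <= size_of::<isize>());
}
```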
taping-memory
> Using repr(C) makes it 4 bytes.
That makes sense, they just don't use repr(C) for the PrintResult so I didn't consider that.
> But then again, modeling a C enum to a Rust enum is bad design. You want to use const in Rust and match against those.
That makes sense, but a way to generate code that safely converts to an enum, as proposed in the article, would be good, since the enum is more idiomatic.
> But it is a bad example in general, because the author passes on a pointer of a string slice to FFI without first converting it to a CString, so it isn't null terminated.
The signature for async_print in C is `async_res_t async_print(const uint8_t *, size_t)` and they are passing a pointer to a &[u8] created from a byte string literal, so I think it's correct.
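For illustration, a minimal Rust sketch of the pointer-plus-length convention discussed here. `print_bytes_stub` is a hypothetical stand-in written in Rust so the example runs without a C toolchain. It is not the article's `async_print`.

```rust
use std::ffi::CString;

/// Hypothetical stand-in for a C function that takes (pointer, length).
extern "C" fn print_bytes_stub(data: *const u8, len: usize) -> usize {
    // SAFETY: the caller promises `data` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    println!("{}", String::from_utf8_lossy(bytes));
    bytes.len()
}

fn main() {
    // Pointer + explicit length: no NUL terminator is needed.
    let message: &[u8] = b"hello";
    let written = print_bytes_stub(message.as_ptr(), message.len());
    assert_eq!(written, 5);

    // A C API that takes only `const char *` would instead need a
    // NUL-terminated buffer, which CString provides.
    let c_message = CString::new("hello").expect("no interior NUL bytes");
    assert_eq!(c_message.as_bytes_with_nul().len(), 6);
}
```

Because the length travels with the pointer, the callee never has to scan for a terminator, which is why a byte string literal can be passed directly.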
b0a04gl
c/c++ you're in unsafe mode by default, unless you build guardrails yourself. rust built different: unsafe is loud, compiler flags it, tooling keeps count, you can gate it in ci. bugs don’t slip in quiet.. burden of proof shifts
djha-skin
Nope: ease of use is table stakes. Rust is not easy to use. It will never become mainstream because of this. For all its faults, C is comparatively simple.
xTachyon
(Copied from Reddit)
What they're saying is kind of true, but the example is very bad. bindgen already doesn't generate Rust enums for C enums exactly for this reason. It instead generates consts with each variant's value, and the enum type is just an alias to its basic type (i32 or something else).
This forces you to do a match on an integer, where you have to treat the _ case (with unreachable!() probably).
I can't tell if this is the whole paper, but it seems low effort at best.
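For illustration, a minimal Rust sketch of the constants-plus-match pattern described above, with the catch-all arm returning an error instead of calling `unreachable!()`. All names are hypothetical and this is not actual bindgen output.

```rust
use std::convert::TryFrom;

// Hypothetical alias and constants, in the style of constified C enum bindings.
#[allow(non_camel_case_types)]
type async_status_t = u32;
const ASYNC_STATUS_DONE: async_status_t = 0;
const ASYNC_STATUS_PENDING: async_status_t = 1;
const ASYNC_STATUS_FAILED: async_status_t = 2;

/// Idiomatic Rust enum that is only built from validated values.
#[derive(Debug, PartialEq, Eq)]
enum AsyncStatus {
    Done,
    Pending,
    Failed,
}

impl TryFrom<async_status_t> for AsyncStatus {
    // The unrecognized raw value is handed back to the caller.
    type Error = async_status_t;

    fn try_from(raw: async_status_t) -> Result<Self, Self::Error> {
        match raw {
            ASYNC_STATUS_DONE => Ok(AsyncStatus::Done),
            ASYNC_STATUS_PENDING => Ok(AsyncStatus::Pending),
            ASYNC_STATUS_FAILED => Ok(AsyncStatus::Failed),
            // Any other integer coming from the C side is rejected here.
            other => Err(other),
        }
    }
}

fn main() {
    assert_eq!(
        AsyncStatus::try_from(ASYNC_STATUS_PENDING),
        Ok(AsyncStatus::Pending)
    );
    let unknown: async_status_t = 7;
    assert_eq!(AsyncStatus::try_from(unknown), Err(7));
}
```

An invalid integer never becomes an `AsyncStatus`, so code holding the enum can match on it exhaustively.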
Ar-Curunir
You can just read the paper instead of making negative comments: https://patpannuto.com/pubs/schuermann2025omniglot.pdf
They are in particular careful to never state that bindgen emits the wrong code. Maybe they could have said that bindgen in fact does handle this case correctly. But Omniglot seems to be doing a lot more than bindgen, and
IshKebab
Well... he does have a point. Don't demonstrate your great tool with an issue that the existing solution doesn't actually have.
ARob109
Learning Rust ATM and using bindgen on a C header. Just looked and it generates Rust enums from C enums. I'm not sure what the default behavior of bindgen is, but it seems there is an option for constifying enums:
--constified-enum <REGEX> Mark any enum whose name matches REGEX as a series of constants
--constified-enum-module <REGEX> Mark any enum whose name matches REGEX as a module of constants
IMO, saying bindgen avoids the issue presented in the article is not accurate.
edit: formatting
gavinray
Where'd you find this paper link, out of curiosity?
The referenced footnote, [9], leads to: https://www.usenix.org/conference/osdi25/presentation/schuer...
timewizard
> if it compiles, then it’s correct … or at least, will not contain use-after-free or other memory safety errors
In a language with the `unsafe` construct and effectively no automated tooling to audit the uses of it, you have no guarantee of any significance. You've just slightly changed where the security boundary _might_ lie.
> There is a great amount of software already written in other languages.
Yeah. And development of those languages is ongoing. C++ has improved the memory safety picture quite a bit over the past decade and shows no signs of slowing down. There is no "one size fits all" solution here.
Finally, if memory safety were truly "table stakes" then we would have been using the dozens of memory safe languages that already existed. It should be blindingly obvious that /performance/ is table stakes.
zaphar
Languages with unsafe don't just change where the security boundary lies. They shrink the size of the area that the boundary surrounds.
C++ has artificially limited how much it can improve the memory safety picture because of their quite valid dedication to backwards compatibility. This is a totally valid choice on their part but it does mean that C++ is largely out of the running for the kinds of table stakes memory safety stuff the article talks about.
There are dozens of memory safe languages that already exist: Java, Go, Python, C#, Rust, ... And a whole host of other ones I'm not going to bother listing here.
torstenvl
All of the languages you listed are proprietary languages. Most of them have a single implementation. They could disappear tomorrow. While that's unlikely, it's a possibility that some will go the way of ColdFusion, and more will fade away like Pascal.
zahlman
> Most of them have a single implementation.
None of them have a single implementation. It only took a few minutes to find all the following:
* https://en.wikipedia.org/wiki/Free_Java_implementations
* Go has gofrontend and GopherJS aside from the reference implementation
* Python has a whole slew of alternate implementations listed on the main Python web site: https://www.python.org/download/alternatives/
* C# has Mono, which actually implements the entire .NET framework
* Rust has gccrs (developed under the Rust-GCC project) and mrustc
johnfernow
The Java language specification is open and there are multiple implementations. OpenJDK is the official open-source reference implementation, and many of the alternative implementations pull from upstream, but OpenJ9 is a different JVM implementation (though does currently use OpenJDK's class libraries to form a complete JDK.)
Before Microsoft opened up C#, Mono was a completely independent alternative implementation.
Python has CPython (reference open source implementation), but also PyPy, MicroPython and several others.
umanwizard
What does “proprietary” mean to you?
AlotOfReading
> In a language with the `unsafe` construct and effectively no automated tooling to audit the uses of it.
You can forbid using unsafe code with the lints built into rustc: https://doc.rust-lang.org/stable/nightly-rustc/rustc_lint/bu...
Cargo allows you to apply rustc lints to the entire project, albeit not dependencies (currently). If you want dependencies you need something like cargo-geiger instead. If you find unsafe that way, you can report it to the rust safety dance people, who work with the community to eliminate unsafe in crates.
All of this is worlds ahead of the situation in C++.
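For illustration, a minimal sketch of the `unsafe_code` lint in use. The `checked` module and its function are hypothetical. The attribute is applied to a module here so the sketch stays in one file; crate-wide it is usually written as `#![forbid(unsafe_code)]` at the top of `main.rs` or `lib.rs`.

```rust
// `forbid` cannot be relaxed again by an inner `#[allow(unsafe_code)]`.
#[forbid(unsafe_code)]
mod checked {
    /// Sums a slice using only safe code.
    pub fn sum(values: &[u32]) -> u32 {
        values.iter().sum()
    }

    // Uncommenting the function below makes compilation fail, because
    // `unsafe` blocks are rejected anywhere inside this module:
    //
    // pub fn first_unchecked(values: &[u32]) -> u32 {
    //     unsafe { *values.get_unchecked(0) }
    // }
}

fn main() {
    assert_eq!(checked::sum(&[1, 2, 3]), 6);
    println!("sum = {}", checked::sum(&[1, 2, 3]));
}
```

The lint covers only the code it is applied to, which matches the point above that dependencies need a separate tool.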
vlovich123
OP is wrong that there's no tooling. All the C++ tooling that I'm aware of (e.g. ASAN/UBSAN/MSAN/TSAN) is still available on Rust. Additionally, it has MIRI which can check certain code constructs for defined behavior at the MIR level which, unlike sanitizers, validates that all code is sound according to language rules regardless of what would be run by generated assembly; this validation includes unsafe code which still has to follow the language rules. C/C++ doesn't have anything like that for undefined behavior by the way.
However, if I can apply the same nitpicking attitude here that you're applying to their argument: it isn't that easy to keep unsafe out of a complex codebase. unsafe is pretty baked into the language, because there are convenient constructs that the Rust compiler can't ever prove safe (e.g. doubly-linked list), constructs it can't prove safe today (e.g. various accessors like split), and basic operations that require it (e.g. allocating memory). Pretending that you can really forbid unsafe code wholesale in your dependency chain is not practical, and this is ignoring soundness bugs within the compiler itself. That doesn't detract from the inherent advantage of safe by default.
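For illustration, a minimal sketch of a safe function built on a small `unsafe` block, assuming the "split" accessors mentioned above refer to something like the standard library's `slice::split_at_mut`. `split_halves` is a hypothetical name. The borrow checker cannot see that the two halves are disjoint, so the raw-pointer step is needed even though callers only ever see a safe signature.

```rust
/// Splits a mutable slice into two non-overlapping mutable halves.
fn split_halves(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    // Checked up front so the unsafe block below cannot go out of bounds.
    assert!(mid <= len, "mid out of bounds");
    let ptr = values.as_mut_ptr();
    // SAFETY: `mid <= len`, so both ranges lie inside the original slice,
    // and they do not overlap: [0, mid) and [mid, len).
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    let (left, right) = split_halves(&mut data, 2);
    // Both halves can be mutated at the same time.
    left[0] = 10;
    right[0] = 30;
    assert_eq!(data, [10, 2, 30, 4, 5]);
}
```

The unsafe code is confined to one audited block, and every caller remains in safe Rust.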
AlotOfReading
I do safety critical code. I would consider banning allocation (e.g. just using Core) or avoiding certain data structures a completely feasible strategy to avoid unsafe if I wanted to exclude it from my safety model. It's what I'm already doing in C++. The difference is that in C++, I can never prove the absence of undefined behavior from any part of the codebase, even if I review every single line. Even if I could, that proof might be invalidated by a single change anywhere.
It's not easy in Rust, but it's possible.
xvedejas
Safe rust is a safe language. Yes, it is built upon unsafe rust. But I still consider Python to be a memory safe language despite it being built on C. I can still trust that my Python code doesn't contain such memory errors. Safe Rust is the same in terms of guarantees. That's all that anyone is claiming.
burnt-resistor
The main problem now is that there isn't a platform that has the tooling or infrastructure to prove, including through formal methods, that they are correct and free from bugs in the spirit of the seL4 project.
UltraSane
It is a lot like how you have to trust the core proving kernel in a theorem prover but if you do then you can trust every proof created using it.
burnt-resistor
https://github.com/CertiCoq/certicoq can prove (most of) itself.
imglorp
That's an extreme take now and maybe uncharitable. The safe parts of Rust are simply no comparison to the whole C/C++ world: the tooling is eliminating vast swaths of "easy" errors. Unsafe parts might be comparable if they're calling the same libraries.
Industry is seeing quantifiable improvements, eg: https://thehackernews.com/2024/09/googles-shift-to-rust-prog...
noisem4ker
> It should be blindingly obvious that /performance/ is table stakes.
I think a big part of it is just inertia.
dwattttt
It's been a very slow learning process trying to undo the "performance at every cost" mantra.
Lisp, Algol 68, Pascal, Smalltalk, ML, all had both memory safety and type safety. Nobody uses them today. Why? Because software isn't developed by rational beings choosing the best tool for the job. It's developed by humans who are influenced by their cultural norms and environment. You can give someone a perfect programming language that produces bug-free programs, and they'll reject it because it uses curly-braces or some shit. Write all the papers you want; as long as the inmates are running the asylum, there is no safety.