Ask HN: A retrofitted C dialect?

80 comments

·February 22, 2025

Hi I'm Anqur, a senior software engineer with different backgrounds where development in C was often an important part of my work. E.g.

1) Game: A Chinese/Vietnam game with C/C++ for making server/client, Lua for scripting [1]. 2) Embedded systems: Switch/router with network stack all written in C [2]. 3) (Networked) file system: Ceph FS client, which is a kernel module. [3]

(I left some unnecessary details in links, but are true projects I used to work on.)

Recently, there's a hot topic about Rust and C in kernel and a message [4] just draws my attention, where it talks about the "Rust" experiment in kernel development:

> I'd like to understand what the goal of this Rust "experiment" is: If we want to fix existing issues with memory safety we need to do that for existing code and find ways to retrofit it.

So for many years, I keep thinking about having a new C dialect for retrofitting the problems, but of C itself.

Sometimes big systems and software (e.g. OS, browsers, databases) could be made entirely in different languages like C++, Rust, D, Zig, etc. But typically, like I slightly mentioned above, making a good filesystem client requires one to write kernel modules (i.e. to provide a VFS implementation. I do know FUSE, but I believe it's better if one could use VFS directly), it's not always feasible to switch languages.

And I still love C, for its unique "bare-bone" experience:

1) Just talk to the platform, almost all the platforms speak C. Nothing like Rust's PAL (platform-agnostic layer) is needed. 2) Just talk to other languages, C is the lingua franca (except Go needs no libc by default). Not to mention if I want WebAssembly to talk to Rust, `extern "C"` is need in Rust code. 3) Just a libc, widely available, write my own data structures carefully. Since usually one is writing some critical components of a bigger system in C, it's just okay there are not many choices of existing libraries to use. 4) I don't need an over-generalized generics functionality, use of generics is quite limited.

So unlike a few `unsafe` in a safe Rust, I want something like a few "safe" in an ambient "unsafe" C dialect. But I'm not saying "unsafe" is good or bad, I'm saying that "don't talk about unsafe vs safe", it's C itself, you wouldn't say anything is "safe" or "unsafe" in C.

Actually I'm also an expert on implementing advanced type systems, some of my works include:

1) A row-polymorphic JavaScript dialect [5]. 2) A tiny theorem prover with Lean 4 syntax in less than 1K LOC [6]. 3) A Rust dialect with reuse analysis [7].

Language features like generics, compile-time eval, trait/typeclass, bidirectional typechecking are trivial for me, I successfully implemented them above.

For the retrofitted C, these features initially come to my mind:

1) Code generation directly to C, no LLVM IR, no machine code. 2) Module, like C++20 module, to eliminate use of headers. 3) Compile-time eval, type-level computation, like `malloc(int)` is actually a thing. 4) Tactics-like metaprogramming to generate definitions, acting like type-safe macros. 5) Quantitative types [8] to track the use of resources (pointers, FDs). The typechecker tells the user how to insert `free` in all possible positions, don't do anything like RAII. 6) Limited lifetime checking, but some people tells me lifetime is not needed in such a language.

Any further insights? Shall I kickstart such project? Please I need your ideas very much.

[1]: https://vi.wikipedia.org/wiki/V%C3%B5_L%C3%A2m_Truy%E1%BB%81...

[2]: https://e.huawei.com/en/products/optical-access/ma5800

[3]: https://docs.ceph.com/en/reef/cephfs/

[4]: https://lore.kernel.org/rust-for-linux/Z7SwcnUzjZYfuJ4-@infr...

[5]: https://github.com/rowscript/rowscript

[6]: https://github.com/anqurvanillapy/TinyLean

[7]: https://github.com/SchrodingerZhu/reussir-lang

[8]: https://bentnib.org/quantitative-type-theory.html

Visit

arnsholt

In 2014 John Regehr and colleagues suggested what he called Friendly C[0], in an attempt to salvage C from UB. About bit more than a year later, he concluded that the project wasn't really feasible because people couldn't agree on the details of what Friendly C should be.[1]

In the second post, there's an interesting comment towards the end:

> Luckily there’s an easy away forward, which is to skip the step where we try to get consensus. Rather, an influential group such as the Android team could create a friendly C dialect and use it to build the C code (or at least the security-sensitive C code) in their project. My guess is that if they did a good job choosing the dialect, others would start to use it, and at some point it becomes important enough that the broader compiler community can start to help figure out how to better optimize Friendly C without breaking its guarantees, and maybe eventually the thing even gets standardized. There’s precedent for organizations providing friendly semantics; Microsoft, for example, provides stronger-than-specified semantics for volatile variables by default on platforms other than ARM.

I would argue that this has happened, but not quite in the way he expected. Google (and others) has chosen a way forward, but rather than somehow fixing C they have chosen Rust. And from what I see happening in the tech space, I think that trend is going to continue: love it or hate it, the future is most likely going to be Rust encroaching on C, with C increasinly being relegated to the "legacy" status like COBOL and Fortran. In the words of Ambassador Kosh: "The avalanche has already started. It is too late for the pebbles to vote."

0: https://blog.regehr.org/archives/1180 1: https://blog.regehr.org/archives/1287

Macha

I think the problem with "friendly C", "safe C++" proposals is they come from a place of "I want to continue using what I know in C/C++ but get some of the safety benefits. I'm willing to trade some of the safety benefits for familiarity". The problem is the friendly C/safe C++ that people picture from that is on a spectrum. On one end you have people that really just want to keep writing C++98 or C99 and see this as basically a way to keep the network effects of C/C++ by having other people write C who wouldn't. The other extreme are people who are willingly to significantly rework their codebases to this hypothetical safe C.

The people on one end of this spectrum actually wouldn't accept any of the changes to meaningfully move the needle, while the people on the other end have already moved or are moving to Rust.

Then in the middle you have a large group of people but not one that agrees on which points of compatibility they will give up for which points of safety. If someone just said "Ok, here's the standard variant, deal with it", they might adopt it... but they wouldn't be the ones invested enough to make it and the people who would make it have already moved to other languages.

musicale

Apologies for mentioning the same idea in multiple comments, but it seems relevant.

In my opinion, a good path forward is to add safety features to mainline C toolchains. An example is -fbounds-safety for clang/llvm

https://clang.llvm.org/docs/BoundsSafetyAdoptionGuide.html

pjmlp

History has already proven this with Objective-C and C++, also with TypeScript, while those languages provide stronger safety guarantees over plain old C and JavaScript, there are always those that will keep using the old tricks on the new system.

Only removing copy-paste compatibility fixes that.

awesome_dude

> Luckily there’s an easy away forward, which is to skip the step where we try to get consensus.

This is true, the Benovolant Dictator model, versus the Rule by committee model problesm.

Committees are notorius for having problems coming to a consensus, because everyone wants to pull in a different direction, often at odds with everyone else.

Benevolent dictators get things done, but it's not necessarily what people want.

And, we live in hope that they stay benevolent.

musicale

The path forward I see is improved safety features for popular C compilers and toolchains.

Ultimately it may be necessary to convince standards committees, but it seems like adding features and flags into mainline clang/llvm (and/or gcc, visual studio, ...) paves the road toward improved memory safety for C at large.

For example, if clang supported it out of the box, I could imagine OpenBSD (or macOS or other OSes that use clang) eventually compiling all (or nearly all) of its C code, including kernel, library, and userland code, in safe mode - and maybe even making it the default. In fact we already see certain safety flags and compiler/runtime features being adopted.

AlotOfReading

The problem with "safe pockets in ambient unsafety" is that C and C++ intentionally disallow this model. It doesn't matter what you do to enforce safety within the safe block, the definition of Undefined Behavior means that code elsewhere in your program can violate any guarantees you attempt to enforce. The only ways around this are with a language that doesn't transpile to C and doesn't have undefined behavior like Rust, or a compiler that will translate C safely like zig attempts to do. Note that zig still falls short here with unchecked illegal behavior and rustc has struggled with assumptions about C's undefined behavior propagating into LLVM's backend.

jjnoakes

Safe pockets in ambient unsafety does have benefits though. For example, some code has a higher likelihood of containing undefined behavior (code that manipulates pointers and offsets directly, parsing code, code that deals with complex lifetimes and interconnected graphs, etc), so converting just that code to safe code would have a high ROI.

And once you get to the point where a large chunk of code is in safe pockets, any bugs that smell of undefined behavior only require you to look at the code outside of the safe pockets, which hopefully decreases over time.

There are also studies that show that newly written code tends to have more undefined behavior due to its age, so writing new code in safe pockets has a lot of benefit there too.

gwbas1c

There are plenty of attempts at "safe C-like" languages that you can learn from:

C++ has smart pointers. I personally haven't worked with them, but you can probably get very close to "safe C" by mostly working in C++ with smart pointers. Perhaps there is a way to annotate the code (with a .editorconfig) to warn/error when using a straight pointer, except within a #pragma?

> Just talk to the platform, almost all the platforms speak C. Nothing like Rust's PAL (platform-agnostic layer) is needed. 2) Just talk to other languages, C is the lingua franca

C# / .Net tried to do that. Unfortunately, the memory model needed to enable garbage collection makes it far too opinionated to work in cases where straight C shines. (IE, it's not practical to write a kernel in C# / .Net.) The memory model is also so opinionated about how garbage collection should work that C# in WASM can't use the proposed generalized garbage collector for WASM.

Vala is a language that's inspired by C#, but transpiles to C. It uses the gobject system under the hood. (I guess gobjects are used in some linux GUIs, but I have little experience with it.) Gobjects, and thus Vala, are also opinionated about how automatic memory management should work, (In this case, they use reference counting.), but from what I remember it might be easier to drop into C in a Vala project.

Objective C is a decent object-oriented language, and IMO, nicer than C++. It allows you to call C directly without needing to write bindings; and you can even write straight C functions mixed in with Objective C. But, like C# and Vala, Objective C's memory model is also opinionated about how memory management should work. You might even be able to mix Swift and Objective C, and merely use Objective C as a way to turn C code into objects.

---

The thing is, if you were to try to retrofit a "safe C" inside of C, you have to be opinionated about how memory management should work. The value of C is that it has no opinions about how your memory management should work; this allows C to interoperate with other languages that allow access to pointers.

pjmlp

> C# / .Net tried to do that. Unfortunately, the memory model needed to enable garbage collection makes it far too opinionated to work in cases where straight C shines. (IE, it's not practical to write a kernel in C# / .Net.

It was pratical enough for Singularity and Midori.

Those projects failed due to lack of leadership support, not technical issues.

Additionally, Android and ChromeOS are what Longhorn userspace could have looked like if leadership support was there, instead of rebooting the whole approach with C++ and COM, that persists to this day in Windows desktop land, with WinRT doubling down on that approach, and failing as well, again due to leadership.

jcarrano

Gobjects are a nightmare. A poor reimplementation of C++ on top of C. You have to know what "unref" function to call and that type to cast. For all the drawbacks of C++, it would have been less bad than Gobject.

neonsunset

It's less so opinionated and more so that WASM GC spec is just bad and too rudimentary to be anywhere near enough for more sophisticated GC implementations found in JVM and .NET.

gwbas1c

It's been awhile since I skimmed the proposal. What I remember is that it was "just enough" to be compatible with Javascript; but didn't have the hooks that C# needs. (I don't remember any mentions about the JVM.)

I remember that the C# WASM team wanted callbacks for destructors and type metadata.

Personally, having spent > 20 years working in C#, destructors is a smell of a bigger problem; and really only useful for debugging resource leaks. I'd rather turn them off in the WASM apps that I'm working on.

Type metadata is another thing that I think could be handled within the C# runtime: Much like IntPtr is used to encapsulate native pointers, and it can be encapsulated in a struct for type safety when working with native code, there can be a struct type used for interacting with non-C# WASM managed objects that doesn't contain type metadata.

neonsunset

Here's the issue which gives an overview of the problems: https://github.com/WebAssembly/gc/issues/77

Further discussion can be found here: https://github.com/dotnet/runtime/issues/94420

Turning off destructors will not help even a little because the biggest pain points are support for byref pointers and insufficient degree of control over object memory layout.

bachmeier

You mentioned D, but are you familiar with D's BetterC?

https://dlang.org/spec/betterc.html

The goal with BetterC is to write D code that's part of a C program. There's no runtime, no garbage collector, or any of that. Of course you lose numerous D features, but that's kind of the point - get rid of the stuff that doesn't work as part of a C program.

pkkm

I'm a lot less experienced than you, but since you're collecting ideas, I'll give my opinion.

For me personally, the biggest improvements that could be made to C aren't about advanced type system stuff. They're things that are technically simple but backwards compatibility makes them difficult in practice. In order of importance:

1) Get rid of null-terminated strings; introduce native slice and buffer types. A slice would be basically struct { T *ptr, size_t count } and a buffer would be struct { T *ptr, size_t count, size_t capacity }, though with dedicated syntax to make them ergonomic - perhaps T ^slice and T @buffer. We'd also want buffer -> slice -> pointer decay, beginof/endof/countof/capacityof operators, and of course good handling of type qualifiers.

2) Get rid of errno in favor of consistent out-of-band error handling that would be used in the standard library and recommended for user code too. That would probably involve using the return value for a status code and writing the actual result via a pointer: int do_stuff(T *result, ...).

3) Get rid of the strict aliasing rule.

4) Get rid of various tiny sources of UB. For example, standardize realloc to be equivalent to free when called with a length of 0.

Metaprogramming-wise, my biggest wish would be for a way to enrich programs and libraries with custom compile-time checks, written in plain procedural code rather than some convoluted meta-language. These checks would be very useful for libraries that accept custom (non-printf) format strings, for example. An opt-in linear type system would be nice too.

Tool-wise, I wish there was something that could tell me definitively whether a particular run of my program executed any UB or not. The simpler types of UB, like null pointer dereferences and integer overflows, can be detected now, but I'd also like to know about any violations of aliasing and pointer provenance rules.

anqurvanillapy

Love all the ideas here.

I found it might be possible to tackle "strict aliasing" and "pointer provenance" with a type system and I would head down to it early. The approach might sound like Rust's `MaybeUninit` but I didn't think much about it.

I've already implemented procedural metaprogramming in a JS dialect of mine [1], it's also trivial to use it to implement compile-time format strings. I would improve the whole experience in this new C-like language.

Again, very very practical ideas here. Great thanks!

[1]: https://github.com/rowscript/rowscript/blob/16cb7e1/core/src...

Gibbon1

I highly agree that they need to add slice and buffer types to the standard headers. Especially true because the recently added counted_by attribute.

Definitely feel the strict aliasing rule should be opt in.

And there are a lot of small UB that can be eliminated here and there.

I'll add adding types as first class objects. Make typeof actually useful

Rochus

There are approaches with at least partly the same goals as you mentioned, e.g. Zig. Personally I have been working on my own C replacement for some time which meets many of your points (see https://github.com/micron-language/specification); but the syntax is derived from my Oberon+ language, not from C (even if I use C and C++ for decades, I don't think it's a good syntax); it has compile-time execution, inlines and generic modules (no need for macros or a preprocessor); the current version is minimal, but extensions like inheritance, type-bound procedures, Go-like interfaces or the finally clause (for a simple RAII or "deferred" replacement) are already prepared.

anqurvanillapy

> There are approaches e.g. Zig.

Yes! Zig has done a great job on many C-related stuff, e.g. they've already made it possible to cross-compile C/C++ projects with Zig toolchain years ago. But I'm still quite stupidly obsessed with source-level compatibility with C, don't know if it's good, but things like "Zig uses `0xAA` on debugging undefined memory, not C's traditional `0xCC` byte" make me feel Zig is not "bare-bone" enough to the C world.

> Micron and Oberon+ programming language.

They look absolutely cool to me! The syntax looks inspired from Lua (`end` marker) and OCaml (`of` keyword), CMIIW. The features are pretty nice too. I would look into the design of generic modules and inheritance more, since I'm not sure what a good extendability feature would look like for the C users.

Well BTW, I found there's only one following in your GitHub profile and it's Haoran Xu. Any story in here lol? He's just such a genius making a better LuaJIT, a baseline Python JIT and a better Python interepreter all happen in real life.

Rochus

> The syntax looks inspired from Lua (`end` marker) and OCaml (`of` keyword), CMIIW

Oberon+ and Micron are mostly derived from Wirth's Oberon and Pascal lineage. Lua inherited many syntax features from Modula-2 (yet another Wirth language), and also OCaml (accidentally?) shares some keywords with Pascal. If you are interested in even more Lua similarities, have a look at https://github.com/rochus-keller/Luon, which I published recently, but which compiles to LuaJIT and thus serves different use-cases than C.

> I would look into the design of generic modules

I found generic modules to be a good compromise with simplicity in mind; here is an article about some of the motivations and findings: https://oberon-lang.github.io/2021/07/17/considering-generic...

> Haoran Xu, making a better LuaJIT

You mean this project: https://github.com/luajit-remake/luajit-remake? This is a very interesting project and as it seems development continues after a break for a year.

woodrowbarlow

> source-level compatibility with C

not sure if this is exactly what you meant, but in Zig you can #include a C header and then "just" invoke the function. no special FFI syntax or typecasting (except rich enums and strings). it can produce compatible ASTs for C and Zig.

anqurvanillapy

Notable approaches to compatibility with C might be: 1) LLVM, like Rust and Zig did (Zig stopped using it in 2023), since LLVM IR is good for being compatible and optimizing. 2) Other backends like libgccjit, I mentioned this because rustc has a libgccjit backend and its author who is a libgccjit maintainer loves the simplicity of it, one could also think of it as a programmable GCC. 3) Code generation directly to C, that's what Koka does.

So I was talking about the direct C codegen approach, but there's still much of some mess like one needs to choose a C standard and knows how to verify the generated code.

viraptor

Here's a thing... There's been many of them and they all die because they don't provide enough benefit over the status quo. Cyclone https://en.wikipedia.org/wiki/Cyclone_(programming_language) is probably the most known one. There's Safe C https://www.safe-c.org/ A bit further from just "dialect" there's OOC https://ooc-lang.github.io/ and Vala https://vala.dev/

But the only thing that really took off was effort to change things at the very base level rather than patch issues: Rust, Zig, Go.

anqurvanillapy

> But the only thing that really took off was effort to change things at the very base level rather than patch issues.

Exactly, that's the most important takeaway I got from all the discussions here: I will be patching issues while people are flooding in for better general approaches.

I might shift much of my direction, but lucky that there are many "lessons" (like ooc) to learn.

ryao

Here is a sound static analyzer that can identify all memory safety bugs in C/C++ code, among other kinds of bugs:

https://www.absint.com/astree/index.htm

You can use it to produce code that is semi-formally verified to be safe, with no need for extensions. It is used in the aviation and nuclear industries. Given that it is used only by industries where reliability is so important that money is no object, I never bothered to ask them how much it costs. Few people outside of those industries knows that it exists. It is a shame that the open source alternatives only support subsets of what it supports. The computing industry is largely focused on unsound approaches that are easier to do, but do not catch all issues.

If you want extensions, here is a version of C that relies on hardware features to detect pointer dereferences to the wrong places through capabilities:

https://github.com/CTSRD-CHERI/cheri-c-programming

It requires special CHERI hardware, although the hardware does exist.

AlotOfReading

Astree is a pain in the butt. Even if it were free, I'd recommend it to very few people. It's not usable without someone (often a team) being responsible for it full time.

TrustInSoft is the higher quality option, polyspace is the more popular option, and IKOS is probably the best open source option. I've also had luck with tools from Galois Inc and the increasingly dated rv-match tool.

edwcross

Funny you mentioned TrustInSoft but not its open-source origin (which is still under development and evolving in a different direction), Frama-C. I cannot compare it to IKOS, but their usefulness depends a lot on the type of code and verification needs.

AlotOfReading

I want to like Frama-C, but I've never managed to actually use it successfully on real projects and the repeated experiences have soured me on it. Getting a recent version installed was a serious chore last time I tried, and no one else is willing to deal with ACSL to get the full value.

ryao

Tell me more.

hgs3

C is still evolving. Instead of creating a new C dialect, why not try improving C itself? You can prototype new features with Clang and submit a technical proposal to the C committee for review. Regarding "memory safety" specifically, many of the challenges folks face with RAM management are related to bounds checking so consider prototyping a slices concept [1].

[1] https://www.digitalmars.com/articles/C-biggest-mistake.html

colonial

The problem with this is that even seemingly basic, obviously desirable proposals can take years of labor and politicking to get through the committee. See JeanHeyd Meneide's valiant struggle to get an #embed preprocessor directive standardized [1] - it took five years, and I'm pretty sure the C++ equivalent (std::embed) is still in the oven.

When faced with that, it's only natural that people lean hard towards dialects and new languages. They move faster (Rust went from a standing start to 1.0 in ~five years) and offer far more freedom.

[1]: https://thephd.dev/finally-embed-in-c23

hgs3

C2Y, the next C revision, introduced "enable safe programming" into the C standard charter [1]. The C committee is eager for proposals like this.

Adding a new feature, like slices, as a Clang extension would be considerably faster than creating a new dialect or language, and it would be immediately usable by every C codebase building with Clang. Even if the feature is "slow" to be incorporated into the standard, it would still be accessible as a compiler extension in the interim.

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3223.pdf

1718627440

In my opinion SPLint (http://splint.org/) would be a nice approach. It is a way to specify ownership semantics, inout parameters etc., but also allows to specify arbitrary pre- and postconditions. It works by annotating whole functions, their parameters, types and variables. These are then checked by calling splint on the codebase, you can also opt out of several checks by flags or using the preprocessor.

  - nullability: /*@null@*/
  - in/out parameter (default in): /*@inout@*/, /*@out@*/
  - ownership: /*@only@*/, /*@temp@*/, /*@shared@*/, /*@refcounted@*/
  - also supports partial defined parameters
  - allows to be introduced gradually in the codebase

Example from the documentation:

  void * /*@alt char * @*/
  strcpy (/*@unique@*/ /*@out@*/ /*@returned@*/ char *s1, char *s2)
          /*@modifies *s1@*/
          /*@requires maxSet(s1) >= maxRead(s2) @*/
          /*@ensures maxRead(s1) == maxRead (s2) @*/;

My main problem was that it was annoying to add to a project, but that is only because you need to specify ownership semantic, not because of the syntax which is short and readable, and that the program is sometimes crashing and there doesn't seem to be active development.

fxtentacle

I believe what programmers actually want is clean dialect-free C with sidecar files.

It seems people pretty universally dislike type annotations and overly verbose comments, like Ruby's YARD or Java's Javadoc. Also, if your new language doesn't compile with a standard C compiler, kernel usage is probably DOA. That means you want to keep the source code pure C and store additional data in an additional file. That additional file would then contain stuff like pointer type annotations, object lifecycle and lifetime hints, compile-time eval hints, and stuff to make the macros type safe. Ideally, your tool can then use the C code and the sidecar file together to prove that the C code is bug-free and that pointers are handled correctly. That would make your language as safe as Rust to use.

The hardcore C kernel folks can then just look at the C code and be happy. And you and your users use a special IDE to modify the C code and the sidecar file simultaneously, which unlocks all the additional language features. But as soon as you hit save, the editor converts its internal representation back into plain C code. That means, technically, the sidecar file and your IDE are a fancy way of transpiling from whatever you come up with to pure C.

anqurvanillapy

I love this idea so much.

I got stuck at how to tackle "new language doesn't compile with a standard C compiler" for many times, but my solution is much worse than yours: Like LuaJIT, they left one unreadable "minilua" C file [1] to bootstrap some stuff, we could have a source-code version of the "new C" compiler, compile things twice. That sounds bad.

For languages with a very advanced type system that compiles to C, I could only think of Koka [2], which translates the "algebraic effect and handlers" code into pure C, achieving pure C generators, coroutines and async/await without the support of setjmp/setcontext. But the generated C code is unreadable, I would definitely think about how to handle the readability and debugging issue with sidecar files.

[1]: https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/host/minilua....

[2]: https://koka-lang.github.io/koka/doc/book.html

vmchale

Have a look at ATS, it is memory-safe and designed for kernel development. There's a kernel and arduino examples. Fluent C interop.

No tactics metaprogramming but it'll give you a start.

anqurvanillapy

Oh I heard about it and oops just hate myself for forgetting it.

And the first sentence [1] of its pointer type introduction, exactly says everything I said here...

> [...] greatest motivation behind ATS is to make it employed to construct safe and reliable programs running in OS kernel.

Also found this interesting Reddit thread [2]. Time to bring some old gems back with good ergonomics now, it won't be that hard. Hold my beer for a while.

[1]: https://ats-lang.sourceforge.net/DOCUMENT/INT2PROGINATS/HTML...

[2]: https://www.reddit.com/r/ProgrammingLanguages/comments/uacib...