Writing C for Curl

92 comments

·April 7, 2025

kpcyrd

> We count about 40% of our security vulnerabilities to date to have been the direct result of us using C instead of a memory-safe language alternative. This is however a much lower number than the 60-70% that are commonly repeated, originating from a few big companies and projects.

There has been discussion in an Arch Linux internal channel about the accuracy of these classifications. We noticed many advisories contain a "This bug is not considered a C mistake. It is not likely to have been avoided had we not been using C."-disclaimer, but it was unclear what the agenda was and how "C mistake" is defined.

It was brought up because this disclaimer was also present in the CVE-2025-0665 advisory[0], which is essentially a double-free but on file descriptor level. The impact is extremely low (it's more "libcurl causing unsoundness in your process rather than can-be-exploited-into-RCE"), but it's a direct result of how C manages resources. This kind of bug can also occur in Python, but you're unlikely to find this kind of bug in Rust.

Could this bug have occurred with a programming language that isn't C? Yes. Could this bug have been avoided by using a programming language that isn't C? Also yes.

[0]: https://curl.se/docs/CVE-2025-0665.html

uecker

The question is: Could such bugs be avoided in C using the right tools and strategies? And the answer is also often: yes.

This is why a large component of the argument for switching to other languages is usually that it is impossible to avoid such bugs in C even for experts. But I think this argument, while having some small amount of truth to it, is also partially deceptive. One cannot simply look at the number of CVEs and conclude this; one needs to compare apples to apples, and then I find the reality looks different. E.g. if a simple mitigation for the bug in C could not be used for some reason, but this reason would also prevent the use of another language in the first place, then using this as an argument is misleading.

pornel

> Could such bugs be avoided in C using the right tools and strategies

"right tools and strategies" is very open-ended, almost tautological: if you didn't catch the bug, then obviously you haven't used the right tools and the right strategies! In reality, the tools and strategies have flaws and limitations that turn such problems into "yes but actually no".

Static analysis of C code has fundamental limits, so there are bugs it can't find, and there are non-trivial bugs that it can't find without also finding false positives. False positives make developers needlessly tweak code that was correct, and lead to fatigue that makes them downplay and ignore the reports. The more reliable tools require catching problems at run-time, but problems like double-free often happen only in rare code paths that are hard to test for, and fuzzers can't reach all code either.

uecker

Static analysis of arbitrary legacy code is limited. But I do not find it difficult to structure my code in a way that I can reasonably exclude most errors. The discussion of false positives in C is interesting. In some sense, 99% of what the Rust compiler would complain about would be considered false positives in C. So if you want to have safety in C, you cannot approach this from this angle. But this relates to my point. If it is acceptable to structure the code in specific ways to make the Rust compiler happy, but you do not accept that you may have to write code in specific ways to avoid false positives in C, then you are already not comparing apples to apples.

kpcyrd

The blogpost claims they are already running "all the tools", can you please be more specific which one they are missing? Maybe the tool to avoid this kind of lifetime issue just happens to be rustc?

uecker

Rust is one tool which can be used to avoid lifetime issues. It is not the only tool. It also works perfectly only when you limit yourself exclusively to safe Rust and do not use C libraries, unsafe Rust, or APIs that work directly on integers (I assume in this example, Rust may have special safe wrappers, but in general the language also does not prevent this error). Resource management is something model checkers could verify in C. One could also design a safe API around it in C. Possibly GCC's analyzer could find such issues. In any case, the question is how much effort one wants to invest and what tradeoffs the solutions have. A small risk of missing such things may also be an entirely reasonable choice, even though Rust proponents irrationally claim otherwise. For example, curl uses C89, which is certainly not the best choice for safety. It is the best choice for portability to obscure platforms, but this requirement would also rule out Rust.
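
To make the "safe API around it in C" idea concrete, here is a minimal sketch, assuming a POSIX system. The names owned_fd, owned_fd_open and owned_fd_close are hypothetical and are not taken from curl or any other library. The handle is closed only through one function, which invalidates it afterwards, so a second close becomes a harmless no-op instead of closing some unrelated descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical wrapper type (an assumption for illustration): the
   descriptor is only ever closed through owned_fd_close(). */
struct owned_fd {
  int fd; /* -1 means "not open" */
};

/* Open a file and hand back an owning handle; fd is -1 on failure. */
static struct owned_fd owned_fd_open(const char *path, int flags)
{
  struct owned_fd h;
  assert(path != NULL);
  h.fd = open(path, flags);
  return h;
}

/* Returns 0 if a descriptor was closed, -1 if the handle was already
   empty or close() itself failed. The handle is invalidated either way,
   so a repeated call never reaches close() again. */
static int owned_fd_close(struct owned_fd *h)
{
  int rc;
  if(h->fd < 0)
    return -1;
  rc = close(h->fd);
  h->fd = -1;
  return rc;
}
```

This only protects callers that go through the handle: nothing in C stops someone from copying the struct or the raw integer and closing it elsewhere, so it lowers the risk rather than ruling the bug out.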

pjmlp

It would help if since lint was created in 1979, the large majority of C developers actually used the right tools and strategies.

In practice, people seem to care only when forced through MISRA-like processes, in contrast to how relevant secure programming has been seen in other programming language communities since the 1960s.

Secure programming was part of Burroughs and Multics design, so why is the answer from a systems language community designed a decade later, and after 40+ years since the Morris worm, "we don't use right tools and strategies over here"?

uecker

I think this is probably the reason why Rust is attractive to companies. They can say, "use Rust and never use unsafe without sign-off from some senior programmer" and then can be relatively sure there is no buffer overflow. In C you would say, you need to follow these guidelines with tools X, Y, Z and then you can be relatively sure there is no buffer overflow. The problem is that this is a striking argument only when the only thing you care about is memory safety issues. As soon as you care about other forms of security / correctness, you need "guidelines with tools X, Y, Z" anyway. And when you have a mixed code base, you also need this anyway.

im3w1l

For all its many faults, even C++ fstreams is not vulnerable to double freeing (and as a partial reply to @Galanwe, the way they avoid issues is runtime checking).

kllrnohj

In C++ you can also make the FD-equivalent of std::unique_ptr, like Android does with unique_fd: https://cs.android.com/android/platform/superproject/main/+/...

It doesn't guarantee the issue never happens, like Rust would, but it does make it dramatically less likely to occur.

Also I think in general people vastly under-appreciate how severe an issue EBADF actually is. Outside of extremely specific, single-thread-only scenarios, that error is essentially the kernel telling you that heap corruption occurred, but almost nobody treats it with that level of severity.
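
A rough sketch of what treating EBADF with that level of severity could look like in plain C, assuming a POSIX system. The helper name close_or_die is hypothetical, and aborting is just one possible policy; EINTR and other close() errors are deliberately left alone here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: close a descriptor and treat EBADF as a fatal
   bookkeeping bug instead of silently ignoring it. */
static void close_or_die(int fd)
{
  assert(fd >= 0); /* callers must not pass a "not open" sentinel */
  if(close(fd) != 0 && errno == EBADF) {
    /* Either this descriptor was already closed, or it never belonged
       to us. In a threaded program the number may already be reused. */
    fprintf(stderr, "close(%d): EBADF, descriptor bookkeeping is broken\n",
            fd);
    abort();
  }
}
```

On a valid descriptor the helper returns normally; only a close of a descriptor that is not open takes the abort path.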

pjmlp

Already in early 1990s, with C++ARM as the first standard, there were plenty of advantages using C++ instead of plain C.

RAII, streams instead of stdio patterns, compilers had collection classes for common types (string, array, ...) with bounds checking configuration, ...

Galanwe

> It was brought up because this disclaimer was also present in the CVE-2025-0665 advisory[0], which is essentially a double-free but on file descriptor level

I don't see how Rust would have prevented calling close() two times with the same eventfd.

Munksgaard

The same way Rust prevents calling close() two times on a file.

technion

The standard for rust is that close() gets called automatically when the file descriptor goes out of scope. I believe you could choose to do it manually but that's unusual coding.

Munksgaard

There isn't actually any close() function in std::fs::File: https://doc.rust-lang.org/std/fs/struct.File.html

coliveira

The problem is not as much C, but coding practices that make it seem like we're still in the 1970s. Codebases like curl use C at a very low level. But C has functions, has structures, has a lot of functionality to allow you to write at a higher level, instead of chasing pointers at each while statement. Code that handles pointers could be abstracted in the same way people will have to do in other languages.

pjmlp

Bell Labs 1970s, I advise learning about what already existed elsewhere in systems programming languages.

sgarland

Pointers are not a difficult concept.

zxilly

What about performance overheads? I think the reason a lot of people write C is that they have direct control over the generated assembly to maximise performance.

aaronmdjones

Even direct control over the generated assembly (C does not provide this; only writing assembler does, and only if you don't do things like LTO) is not sufficient.

Modern CPUs do all sorts of weird things. Assembly instructions can be executed out of order. Your conditional jump instruction can be speculatively executed before the condition's truth is known. Fetches from main memory can be reordered.

Even more wildly, copying the contents of one register to another is often a no-op. Yes, that's right; the following code:

    mov edx, eax

... does next to nothing on some modern CPUs. All it does sometimes is set an internal note to the effect of "later references to edx should read/write eax instead", until that note is cleared by some other operation.

You can write your assembler with the best of intentions as to how it should behave, only to discover that the CPU does things entirely differently. You still end up getting the same observable result out of it, but any timing and order of operation guarantees went out the window decades ago.

pjc50

> they have direct control over the generated assembly

They do not. As seen by all the "optimizer has done weird stuff" bugs.

sebstefan

The guidelines feel out of sync with the directions I've seen people push coding styles over the years

"Identifiers should be short" when I've mostly seen people decry how annoying it is to find yourself in a codebase where everything is abbreviated C-style (htons, strstr, printf, wchar_t, _wfopen, fgetws, wcslen)

There's a case for more verbosity and if you look at modern Curl code it reflects that as well, new identifiers aren't short

https://github.com/curl/curl/blob/master/lib/vquic/vquic.c

"Functions should be short" where I've mostly seen very negative feedback on codebases written following the trend of Uncle Bob's short functions. Complaints that hiding code in 10 levels of function calls isn't helpful, and that following rabbit holes is tedious even with modern editors

"Code should be narrow", "we enforce a strict 80 column maximum line length" I don't think I've seen that take lately. I remember seeing a few posts fly by about the number 80 specifically

You want to prevent dragging your eyes. For my IDE on default settings on a 1080p monitor, half of a 15" screen fits 100 characters

If you take away 20 columns to fit your text on less of the screen do you really get any benefits

What about the cascading effects on the code, like worse names, split lines, ...

In the end it's semi-interesting but we're all building sheds and these are mostly debates on what color the shed should be

tuetuopay

Everything is a balance. IMHO, the "identifiers should be short", "functions should be short" and such are knee-jerk reactions to overly long things that are common in some other languages (looking at you, Java). Like the practice of indicating the type, pointer, etc. Stuff like `pWcharInputBuffer` and such.

There is a balance between `*p` and `inputPointerToMiddleOfBufferThatFrobnicates`.

dahauns

>Everything is a balance.

Very true, or as I like to put it: everything is a tradeoff.

Over decades of programming, I'm fairly certain my preferences for things like function/identifier length could be plotted along a damped oscillation curve. :)

dfox

> https://github.com/curl/curl/blob/master/lib/vquic/vquic.c

When somebody says "short identifiers" in relation to C, this is exactly the style meant by that, not the cryptic style of C standard library.

creatonez

> "Code should be narrow", "we enforce a strict 80 column maximum line length" I don't think I've seen that take lately. I remember seeing a few posts fly by about the number 80 specifically

To be fair, this is more doable in C than most other languages. No namespacing, no generics, etc. means you're not using as many columns.

I'm still not convinced, though. It's a crunch. Would rather just set a 120 to 160 column limit and make identifiers as descriptive as they should be. And I'd use prefix namespacing all over the place anyways -- fuzzy autocomplete can make it convenient.

GuB-42

For short identifiers, I think you missed an important detail.

> Also related: (in particular local) identifiers and names should be short.

The general idea is that the more distant the identifier is, the more descriptive it should be. Because you don't have as much context, and it is also a hint: if you see a long, descriptive name, it is more likely to be global.

And descriptive doesn't mean long. You still need to try making your descriptive names as short as possible. For example "timeSinceMidnightInSeconds" can be shortened to "secondsSinceMidnight" without loss of information: seconds are a unit of time, no need to repeat it.

stzsch

Seems similar to the linux kernel coding style: https://www.kernel.org/doc/html/latest/process/coding-style....

Almondsetat

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.

I don't think good names will be solved soon

timhh

When I counted I got about 55% which is pretty close to the standard 2/3.

https://blog.timhutt.co.uk/curl-vulnerabilities-rust/

johnisgood

> Code should be easy to read. It should be clear. No hiding code under clever constructs, fancy macros or overloading.

I highly agree with this. I do not always want highly abstracted code, and some programming languages aiming to replace C are much more difficult to read, that said, Rust is supposed to replace C++, not C, right?

Thank you for the article!

Zambyte

I have been playing around a lot with Zig lately, and though it's still in beta, it really feels like it has the best chance at being a true C successor. While Rust feels like they started with C++ and worked on making it harder to write incorrectly, Zig feels like they started with C and worked on making it easier to write correctly.

They also have a few pillars that they call the "Zen" of Zig[0], of which three out of the first five are directly related to readability.

[0] https://ziglang.org/documentation/0.14.0/#Zen

Arch-TK

I used to think rust was like C++ but harder to write badly but I don't think it's anything like C++ now that I've spent a couple of years writing it.

Rust is its own thing, it has none of the extensive baggage of C++, and doesn't appear to be set to reach that level of baggage at any point in time soon if ever. It's a much cleaner, clearer, and easier to reason about programming language.

Zambyte

The lack of baggage is definitely a huge improvement, but I believe Zig also has this advantage over C. Rust and C++ also seem to encourage a similar style of programming (particularly newer C++), and so do Zig and C. The former encourages creating a library of types with inheritance, using syntactic sugar where applicable (ie operator overloading), and a functional style, whereas the latter limits the programmer to simple compound types without inheritance (manual, explicit dispatching), obvious syntax, and an imperative style.

Both seem to have their place in the ecosystem to me, but I'm really excited to see Zig mature.

voidUpdate

I really want to start trying to learn zig, but right now I feel like it's not quite finished enough. When it hits 1.0, I'll probably give it a more serious look

woodrowbarlow

Zig is friendly for soft-transitions because the compiler can compile C code. You can use Zig tooling for a C codebase, and then slowly add Zig code where it makes the most sense.

masklinn

> Rust is supposed to replace C++, not C, right?

Rust is intended as a systems language. To the extent that it’s “supposed” to replace anything, it’s both.

MrMcCall

The reality is that we spend FAR more time reading code than writing it. That is why readability is far more important than clever, line saving constructs.

The key to further minimizing the mental load of reacquainting yourself with older existing code is to decide on a set of code patterns and then be fastidious in using them.

And then, if you want to be able to easily write a parser for your own code (without every detail in the spec), it's even more important.

And now that I have read TFA, I see he wrote:

> We have tooling that verify basic code style compliance.

His experience and diligence have led him to the mountaintop, that being we must make ourselves mere cogs in a larger machine, self-limiting ourselves for the greater good of our future workload and production quality.

baumschubser

> The reality is that we spend FAR more time reading code than writing it. That is why readability is far more important than clever, line saving constructs.

In JS I sometimes chain two or three inline-arrow-functions specifically for readability. When you read code, you often search for the needle of "the real thing" in a haystack of data formatting, API response prepping, localization, exception handling etc.

Sometimes those shorthand constructs help me to skip the not-so-relevant parts instead of mentally climbing down and up every sort and rename function.

That being said, I would not want this sentiment formalized in code guidelines :) And JS is not C except both have curly braces.

MrMcCall

> That being said, I would not want this sentiment formalized in code guidelines :)

Surely. I'm all for code formatting standards as long as they're MY code formatting standards :-)

Ideally, I'd like the IDE to format the code to the user/programmer's style on open, but save the series of tokens to the code database in a formatting-agnostic fashion.

Then we could each have our own style but still have a consistent codebase.

And, I should add that my formatting conventions have gotten more extreme and persnickety over the years, and I now put spaces on both sides of my commas, because they're a separate token and are not a part of the expression on either side of it. I did this purely for readability, but I have NEVER seen anyone do that in all my decades on the internet reading code and working on large codebases. But I really like how spacing it out separates the expression information from the structural information.

It also helps me deal with my jettisoning code color formatting, as, as useful as I've found it in the past, I don't want to deal with having to import/set all that environmental stuff in new environments. So, I just use bland vi with no intelligence, pushing those UI bells and whistles out of it into my code formatting.

And, I fully endorse whatever it takes for you to deal with JS, as I have loathed it since it appeared on the scene, but that's just me being an old-school C guy.

johnisgood

> That is why readability is far more important than clever, line saving constructs.

Yes, I agree, that is why I am put off by some supposed C replacements that are trying to be clever with their abstractions or constructs.

pjc50

Could you give an example of "clever" (bad) vs "simple" (good)?

In my experience C has a lot of simple grammar, a commonly-held simple (wrong) execution model, and a lot more complexity lurking underneath where it can't be so easily seen.

(One of my formative learning books was https://en.wikipedia.org/wiki/C_Traps_and_Pitfalls , valid in the 90s and mostly still valid today)

MrMcCall

Simplicity is essential to achieving manageable complexity over time.

veltas

> So many people will now joke and say something about wide screens being available

And this is a silly point because I want to be able to put 2-3 files side-by-side, on that big monitor. Who are all these people asking for long code that means I don't get more than one file on screen at a time?

Arch-TK

It's not even just that. The reason newspapers have multiple columns rather than lengthy lines is because it's strictly easier to read shorter lines.

timhh

I don't think anyone disagrees with that, but 80 characters is clearly waaay too restrictive. I think 120 is much more reasonable.

johnisgood

I use either 2 spaces or tab for indentation for most languages, and I never go beyond 80 (actually, 79). It works well for XTerm, and most utilities that I use.

For git commits, I do not go beyond ~69 characters per line, so it looks neat when I am viewing the commit history.

120 characters may be fine if I only care about coding in VSCodium, for example, so sometimes I might go above the 80 column width when I am programming Go using VSCodium, but I try to not do that, because I still use "less" and whatnot, plus I have an old 17" monitor. I do not like wide monitors, I want to be able to look at the whole screen all at once, with wide monitors I would either have to be too far away, or move my head / neck / eyes too often.

So... my fonts are small, I limit to 80 column width, and I am quite happy with it. :P

To each their own, although I would have issues with Java code that not only requires me to have many files open, I would have to switch back and forth files, and I would have to horizontally scroll a lot.

I hope you realize what I am trying to say, if not, I will elaborate.

Arch-TK

Clearly to whom? I think it works fine in C with 8 spaces per indent level. It works fine in python and rust with 4 spaces per indent level. For some languages I think it's worth going down to 3 spaces per indent level. But I've not hit that many languages where it's worth going much past 80 characters.

dspillett

As the article states:

> The question could possibly be exactly where to draw the limit, and that’s a debate for every project to have.

It is subjective, and does not live in a vacuum because along with purely subjective preference regarding it on its own, it affects, and is affected by, other choices like naming and indentation conventions.

They like 80 in their project. Feel free to choose something else for your project.

dspillett

There are many, usually non-technical people though some devs & such too, who maximise everything then complain about how much space is wasted on the right of their fancy screen.

I have a 32" screen running at "standard" pixel pitch (matching the 24" 1080p screen I have in portrait next to it) which I sometimes use full-screen but usually have split 50/50, 33/66, 25/75, or 33/33/33, depending on what I'm doing. One of our testers doesn't understand, can't see the benefit I get from the flexibility ("why not just have two monitors?" has been asked several times). It seems to actively annoy her that such a wide screen exists. If she ever saw the ultra-wide my friend uses for gaming I think she'd have a seizure.

Admittedly when sat this monitor plus the other in portrait is in total a bit wide (so the other screen is usually relegated to just being mail/chat windows that I only interact with when something pings for my attention) and a touch too tall. It is much more comfortable when I use the desk raised so I stand, which is how I work >⅔ of the time.

kwon-young

Curl is one of the very few projects I managed to contribute to with a very simple PR.

At the time, I was a bit lost with their custom testing framework, but was very impressed by the ease of contributing to one of the most successful open-source projects out there.

I now understand why. It is because of their rules around testing and readability (and the friendly attitude of Daniel Stenberg) that a novice like me managed to do it.

kobzol

Great post!

I have some random guesses as to why the 40% vs 60-70% memory issues percentage:

- 180k is not that much code. The 60-70% number comes from Google and Microsoft, and they are dealing with way larger codebases. Of course, the size of the codebase in theory shouldn't affect the percentage, but I suspect in practice it does, as the larger the codebase is, the harder it is to enforce invariants and watch for all possible edge cases.

- A related aspect to that is that curl is primarily maintained by one person (you), or at most a handful of contributors. Of course many more people contribute to it, but there is a single maintainer who knows the whole codebase perfectly and can see behind all (or most) corners. For larger codebases with hundreds of people working on them, that is probably not the case.

- Curl is used by clients a lot (probably it's used more by clients than servers, for whatever definition of these words) over which you have no control and monitoring. That means that some UB or vulnerabilities that were triggered "in the wild", on the client side, might not ever be found. For Google/Microsoft, if we're talking about Chrome, Windows, web services etc., which are much more controlled and monitored by their companies, I suspect that they are able to detect a larger fraction of vulnerabilities and issues than we are able to detect in curl.

- You write great code, love what you're doing and take pride in a job done well (again, if we scale this to a large codebase with hundreds of developers, it's quite hard to achieve the same level of quality and dedication there).

(sent this as a comment directly on the post, but it seems like it wasn't approved)

janoelze

This is remarkably clear writing — you sense how it was formed by thousands upon thousands of hours spent communicating, really cool.

bitwize

> how do we write C in curl to make it safe and secure for billions of installations?

"That's the neat thing -- you don't."

Curl should do what fish did: bite the bullet and rewrite the damn thing in Rust.

guappa

That would mean no longer running on a lot of devices it currently runs on. Which would mean those devices would just use extremely out of date curl written in C.

Any other brilliant idea?

dcminter

> "Wider code is harder to read. Period. "

That's stated as if it were proven, and I can believe that it has enough basis in fact that one might choose to enforce it, but I don't believe it's universally true.

I do often see code subject to a line-length linting enforcement that I think would have been clearer not broken up across multiple lines.

Personally I prefer a linter with escape hatches so that you can declare "this line exempt from such and such a rule" if you have enough reason for it and are willing to take the fight to the pull request :D

acmj

Some part of this article is opinionated. Curl may be well written but this is more likely to be the result of the overall structure than the number of characters per line. Actually I don't know whether curl is well written. Popularity doesn't always equate to code quality. I have used curl APIs before. I don't like them.

MrMcCall

All his ideas are fantastic, and are obviously the result of long experience in a seasoned and highly successful project. He is sharing techniques that simply work for large, complex codebases. Ignore them at your peril!

Specifically, though, these sections are related, in my experience:

> Avoid "bad" functions

> Buffer functions

> Parsing functions

> Monitor memory function use

These related aspects are why I tend to wrap many library functions that I use (in any language environment) with my own wrapper function, even if it's to just localize their use into one single entry/use point. That allows me to have one way that I use the function, thereby giving my code a place to not only place all best practices for its use, but to allow me to update those best practices in one single place for the entire codebase. And it is especially helpful if I want to simply rewrite the code itself to, for example, never use scanf, which I determined was a necessary strategy many, many moons ago.

Now, when a single function needs to accommodate different use cases and doing such separate kinds of logic would incur too much logical or runtime cost, a separate wrapper can be added, but if the additional wrappers can utilize the cornerstone wrapper, that is the best, if feasible. Of course, all these wrappers should be located in the same chunk of code.

For C, especially, wrapper functions also allow me to have my own naming convention over top of the standard library's terse names (without using macros, because they're to be avoided). That makes it easier for me to remember its name, thereby further reducing cognitive load.
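
As an illustration of the single-entry-point wrapper idea, here is a minimal sketch in C. The name parse_decimal_long is hypothetical and the choice of strtol() as the replacement for scanf-style parsing is an assumption for the example, not something taken from curl's source. All the checks that are easy to forget at each call site live in one place.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical project-local wrapper: the one place where strtol() is
   called. Returns 0 and stores the value on success, -1 on any parse
   problem. *out is left untouched on failure. */
static int parse_decimal_long(const char *text, long *out)
{
  char *end;
  long value;

  assert(text != NULL && out != NULL);
  errno = 0;
  value = strtol(text, &end, 10);
  if(end == text)      /* no digits at all, e.g. "" or "abc" */
    return -1;
  if(*end != '\0')     /* trailing garbage such as "12abc" */
    return -1;
  if(errno == ERANGE)  /* value does not fit in a long */
    return -1;
  *out = value;
  return 0;
}
```

Note that strtol() itself still accepts leading whitespace and an optional sign; if the project wants to reject those too, the wrapper is the single place to add that rule.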
