The case of the critical section that let multiple threads enter a block of code

103 comments

·March 23, 2025

akoboldfrying

Loose typing strikes again.

I understand the temptation of "Let's just use an int return type, that way later on we can easily let them indicate different flavours of error if we want". But then you leave yourself open to this.

The full-BDSM approach is for every component Foo that works with callbacks like this to define its own FooResult type with no automatic type conversions to it. In C, enum suffices, though a determined idiot can just cast to override; using FooResult as defined below makes it harder to accidentally do the wrong thing:

    enum FooResultImpl { FOO_SUCCESS_IMPL, FOO_FAILURE_IMPL };

    struct FooResult {
        enum FooResultImpl USE_A_CONVERSION_FUNCTION_INSTEAD_OF_ACCESSING_THIS_DIRECTLY;
    } FOO_SUCCESS = { FOO_SUCCESS_IMPL }, FOO_FAILURE = { FOO_FAILURE_IMPL };

(Yes, it's still possible to get an erroneous value in there at initialisation time without having to type out "USE_A_CONV..." -- but this is C, you can only do so much. You can't even prevent a FooResult variable from containing uninitialised garbage...)

emn13

Is the advantage over an enum not kind of small? We're seeing bugs here because people tried to do the right thing but the tooling has absolutely no way of helping anybody to do that. Simply preventing accidental mistakes would prevent these. Adding complexity to make it harder (though never impossible) for consumers to misuse the API in a complex way seems like it's potentially going to far.

Then again, it's been years since I used this kind of C, so maybe my instincts are rusty here (no rust-pun intended!)

akoboldfrying

You're right, this may be overkill. OTOH, casting between integer types in C can (unfortunately) feel like clicking away confirmation dialog boxes -- something too readily done without full understanding ("Oh, it's always just 1 or 0, of course it will fit in the target type [so no need to think further]").

While it's annoying boilerplate for the Foo component to have to write, I don't think clients of Foo see much additional complexity. They can still write "return FOO_SUCCESS;", etc.

magicalhippo

I kinda like the way Boost did error_code[1], which got incorporated into C++11[2].

Essentially you got a generic error_code struct which has two members, an int to hold a given error value or zero if there's no error, and a reference to an error category which helps interpret the error value.

Effectively the error category is an interface, so in C terms it would be a reference to an error_category struct filled with function pointers.

There's then some machinery which allows you to compare specific error codes to generic error conditions like "file not found", abstracting away the specifics of the error code.

It's not problem free[3], but I've used this pattern in languages like Pascal and felt it worked well for me.

[1]: https://www.boost.org/doc/libs/1_82_0/libs/system/doc/html/s...

[2]: https://en.cppreference.com/w/cpp/header/system_error

[3]: https://akrzemi1.wordpress.com/2017/10/14/error-codes-some-c...

alfiedotwtf

Yes, a “Sum” type in Type Theory

magicalhippo

Well it's more like a dynamically defined sum type, no?

There's also the scaffolding around it, like the way error codes compare for equivalence[1] against error conditions in a symmetric way[2].

The result is you can easily add new a domain-specific error category, and error codes using your new category can be fed to existing code and they'll do something sensible without further modification. Your code with the new category could even be loaded at runtime[3].

Not something you can do with plain sum types, ie tagged unions in C. At least as far as I know, though I'm no expert.

[1]: https://www.boost.org/doc/libs/1_82_0/libs/system/doc/html/s...

[2]: https://www.boost.org/doc/libs/1_82_0/libs/system/doc/html/s...

[3]: though there might be dragons, see my previous reference

robmccoll

Looking at Microsoft's C code makes my eyes hurt. I don't know if it's the style (bracket placement, no new lines), naming conventions, typedeffing away pointers, or what, but it just doesn't read easily to me.

quotemstr

You get used to it after a while. The NT style has an austere beauty all its own too. Typedefing away pointers creates a uniform and regular syntax --- no reading declarations forwards then backwards then forwards again. The doc comment style encourages thoughtful architecture. Opaque handle types are common, which is good.

https://computernewb.com/~lily/files/Documents/NTDesignWorkb...

Nobody likes Hungarian though.

robmccoll

Typedeffing away pointers means I don't know when copying or passing something if it's by value or by reference.

nikanj

If it starts with LPsomething or Psomething, it's a pointer. Turns out the much-hated hungarian notation does provide value

jstimpfle

Pointer typedefs are typically useless because at the usage side you will have to declare structs on the stack and take their address anyway. So you need to be aware that a typename refers to a pointer type, and furthermore it's harder to see and really know because you have to resolve the typedef in your head instead of seeing it right there in syntax.

pavlov

The combination of all-caps type names and Hungarian notation for variables (“ppszOutStr”) makes it feel like you’re having a conversation with the vampire from the 2024 Nosferatu remake.

He’s kind of yelling slowly and kind of talking in some East European language, and yet you can kind of understand what he’s saying.

(And if I ever have to open an MFC codebase again, I’m going to be thinking of how Nosferatu springs up naked and rotting from his coffin)

tialaramex

Notably it's Systems Hungarian which makes no sense whatsoever. This notation is a way to mention the kind of a variable in languages which don't directly express that. It starts in BCPL which doesn't have types, so if boop is a boolean and blip is a counter we need to annotate the name of the variable as the compiler doesn't see any reason you shouldn't use boop as a counter and blip as a boolean, so we maybe call them bBoop and cBlip or whatever.

Now for the team writing say Excel, they have a typed language so they don't need to distinguish booleans from counters, but their language doesn't have or encourage distinct Row and Column types, those are both just integers, so "Apps Hungarian" uses the name to annotate variables with such information, clInsert is the column while rwInsert is the row, if I am reviewing code which is to inspect columns and it checks clFooC, clBar and rwBaz well why is it using a row number for a column? That warrants closer inspection.

Unfortunately, this practice was divorced from its rationale and infected teams at Microsoft who had a typed language and didn't have kind information beyond that, but felt the need to use this notation anyway, producing "Systems Hungarian" where we mark out pFoo (it's a pointer named foo), and lBar (it's a long integer named bar). This is very silly, but at this point it has infested an entire division.

canucker2016

Great description of the nuance between Systems Hungarian and Apps Hungarian.

One further nuance is that return types of APIs also affect the naming of the APIs in Apps Hungarian. Boolean return values are typically called f for flag. An API that returns a boolean would be called FApiFunction - the capital F (camel case is used) at the beginning of the API name indicates that the API returns a boolean. If you write code that stores the return value in a variable that doesn't begin with an 'f' then any code reader would see the mismatch.

FTA, one problem is that the API returns a ULONG where 0 is failure and non-zero is success.

But the developer thought that the API returned an NTSTATUS error code, where 0 (== STATUS_SUCCESS) is success and non-zero is failure. This is the opposite of the documented return value for the API in question.

In Apps Hungarian, the naming of the API would be obvious that it didn't return an NTSTATUS variable vs a ULONG. Systems Hungarian lost that tidbit somewhere. So developers must be more diligent about knowing the return values of various APIS (but assuming NTSTATUS usually works, expect in this case).

But Apps Hungarian rarely names variables with native type prefixes. Variables are more than just the native underlying type (ULONG, int). Variables have a purpose. The return type of strlen or Win32 equivalent is an int or for more modern-designed APIs, an unsigned int (since string lengths can't be negative). In Systems Hungarian, you'd name the variable ulen. But in Apps Hungarian, strlen returns an int representing the count of characters. Apps Hungarian uses 'ch' for character and 'c' to denote a count. So the variable used to store the return value from strlen would be cch. (And in Apps Hungarian, you'd actually alias strlen to CchStrlen so you'd know just looking at the API that it returns a 'cch', count of characters, and you'd need a CCH type to store the return value).

AtlasBarfed

Syntax coloring and click nav in IDEs made Hungarian totally obsolete, and ms also took it to an extreme.

I used to have different hungarian prefix conventions for local, global, argument and instance vars.

I actually still use them sometimes

RicardoLuis0

it's worth noting as well that the windows API has been a thing since 1985, _before_ C was standardized, such old versions of C were incredibly untyped

timewizard

The case of the company that writes overly verbose code and decides to change the meaning of names and constants between different libraries that actually do the same things.

Microsoft saw all the footguns available and decided to just incorporate every one they possibly could and invent new ones whenever possible. For extra hilarity they put all this in a monorepo for all the good that has /never/ apparently done for them.

raverbashing

Hungarian notation had a purpose to it, but then they went and did it wrong accidentally (there's an old Joel on Sw article about it, but I'm not linking it) - and in the example this is all over the place - not to mention C macros ooofffff

But yes the mix of that plus camel case, make things confusing

McP

https://www.joelonsoftware.com/2005/05/11/making-wrong-code-...

Why not link it?

AtlasBarfed

Did fogbugz ever hit it big? Joel's blogs are legitimate gold and almost timeless (AIpocalypse tbd).

But they just did bug tracking and other jira type stuff in MS land, right?

switch007

Every facet of Microsoft seems to lack style. Every official/affiliated blog is ugly and inconsistent. C# looks ugly to me. Windows is fugly. Their fonts are horrid. It's to be expected

jdthedisciple

Partial disagree: Yes their blogs may be ugly.

But C# is extremely beautiful, you can read and write it like a novel.

Segoe UI is also not the worst font. I give it a 7/10.

switch007

C# in Visual Studio is ugly with Windows font rendering, to me

I find the title case ugly, not a fan of semicolons, nor the new keyword. Guess I'm too used to Kotlin and Linux/Mac fonts

watt

well, C# and TypeScript is thanks to Anders Hejlsberg who does have sense of style.

userbinator

But C# is extremely beautiful

Seriously? It's basically Microsoft-flavoured Java, and has much of the same horrid verboseness and architecture-astronaut overcomplexity.

K&R, early UNIX style, BSD is what I find "extremely beautiful".

cedws

Think Steve Jobs once said something to the effect of Microsoft having no taste. I think I’d agree. Inconsistent design language, inconsistent values, inconsistent vision.

vijaybritto

Yet they've had astounding success and is the most widely used operating system. Maybe usability and ease of access are far more important than beautiful design

genewitch

I don't have any problems with the way Windows looks. I'd ask you what operating system looks better. And if the answer is macOS, I'd agree with you, I'd say yes, a painting can be pretty too, and about as useful as macOS.

userbinator

Early Windows looked good, but not anything Win8 or newer.

PartiallyTyped

I don’t see how windows has any form of prowess over macOS. I find the latter much better for my usecase.

canucker2016

If you think their fonts are ugly now, you should go back and look at the pre-Windows 3.x fonts. But you also have to realize they were saddled with CGA/EGA as their target display adapters back then.

davydm

Or rather, "the case of a buggy lazy-init function which reinitialized the critical section every time"

DamonHD

No spoilers!

davydm

The title was just a little disengenuous - the critical section got all the blame!

putzdown

I wake up every morning and thank God I am not working on or near Microsoft code. There is nothing about this code or anything about this story that is in any way sensible or pleasing. Take a simple, well-solved problem. Forgot all prior solutions. Solve it badly, with bad systems and bad ideas. Write the code in the ugliest, most opaque, most brittle and fragile manner imaginable. Now sit back and enjoy the satisfaction of getting to debug and resolve problems that never should have happened in the first place. The miracle is that Microsoft, built as it is to such a degree on this kind of trashy thinking and trashy source, still makes its annual billions. That right there is the power of incumbents.

tialaramex

I think I don't understand why they're making a critical section at all.

The end goal is to initialize something no more than once, right? But the technology they're using (wrongly, but it did exist and they were clearly aware of it) to make a critical section does initialize a thing exactly once.

I also don't understand the use of SRWLock here, or rather, I sort of do but it's a hole in Microsoft's technology stack. SRWLock is a complicated (and as it turns out, buggy, but that's not relevant here) tool, but all we want here is a mutex, so we don't actually need SRWLock except that Microsoft doesn't provide the simple mutex, so this really is what you'd have written at the time† - conjure into existence the over-complicated SRWLock and just don't use most of its functionality.

† Today you should do the same trick as a Linux futex although you spell it differently in Windows. It's also optionally smaller, which is nice for this sort of job, a futex costs 4 bytes but in Windows we can spend just one byte.

canucker2016

> The end goal is to initialize something no more than once, right? But the technology they're using (wrongly, but it did exist and they were clearly aware of it) to make a critical section does initialize a thing exactly once.

FTA the InitializeCriticalSectionOnDemand function calls RtlRunOnceExecuteOnce to perform the bookkeeping work of running a function once. RtlRunOnceExecuteOnce is passed a function pointer to do the actual grunt work of running once, in this case, initializing a critical section.

The problem is that RtlRunOnceExecuteOnce EXPECTS the function pointer to return non-zero for success and zero for failure.

InitializeCriticalSectionOnce ALWAYS returns STATUS_SUCCESS (== 0) because it CAN'T FAIL because InitializeCriticalSection can't fail. So InitializeCriticalSectionOnce is telling RtlRunOnceExecuteOnce that it failed when it actually succeeded.

NT kernel devs deal mostly with APIs that return NTSTATUS values, where 0 (== STATUS_SUCCESS) is success and non-zero is a failure.

The problem comes when another thread calls the same code. The RtlRunOnceExecuteOnce API believes that the underlying code failed before, so it tries initializing again, initializing the same critical section again (even if the crit-sect is held by another thread currently).

In goes another thread since the critical section has just been initialized and you've got multiple threads in the same 'protected' code now. Might as well not have a crit-sect.

anarazel

> Today you should do the same trick as a Linux futex although you spell it differently in Windows. It's also optionally smaller, which is nice for this sort of job, a futex costs 4 bytes but in Windows we can spend just one byte.

I don't really care about narrower futexes, what I do wish Linux had was 8 byte futexes. It's at times hard to cram enough state into 4 bytes for more complicated things.

E.g. for postgres' buffer state it'd be very useful to be able to have buffer state bits, pin count, shared lockers and the head of the lock wait queue in one futex "protectable" unit.

There are also ABA style issues that would be easier to defend against with a wider futex.

colanderman

They're calling `(*Callback)(g_myProvider, Context)` in the context of the critical section. So presumably that's also important to run exclusively.

(The tracing handler getting registered twice is just a canary.)

Wumpnot

SRWLock perf is slightly better than Window WaitOnAddress stuff, and works on older versions of windows.

tialaramex

I find this unlikely. Do have some real world evidence for this? Microbenchmarks are not at all useful for this stuff in my experience. For speed: In the uncontended case, which is what we mostly care about because if you're contended it's game over for speed, they're both a single atomic CAS, so that's no difference.

In terms of size, the pointer is bigger than you'd want - it's no pthread_mutex or the analogous Windows data structure where it's a multi cache line disaster of a data structure - but it's clearly worse than a futex or WaitOnAddress solution.

Wumpnot

A few years ago I compared them, it was not a microbenchmark, but a real application. There were a few million(almost entirely uncontended) exclusive locks being taken on startup, SRWLock was consistently faster, though the difference was not large.

valicord

> The end goal is to initialize something no more than once, right?

Took me a while to understand as well. They unregister the handler at the end of the critical section, so the requirement is not "no more then once ever", it's "no more than once at the same time".

dgellow

What makes SRWLock buggy or over complicated?

tialaramex

The idea of SRWLock is, as its name might suggest, it's a Simple Reader Writer Lock.

This makes it in theory excellent for implementing this precise feature which you will see in for example C++ (the MSVC implementation is, in fact, just an SRWLock) or Rust

So immediately it's over-complicated for a mutex because we don't need it to handle multiple readers, we're never taking its shareable reader lock, we (as you see in Raymond's example code) just rely entirely on the exclusive lock which is about 10% of its function.

SRWLock is the size of a pointer - because of course in fact it is a pointer, to a lazily allocated structure - with some flag bits squirrelled away at the bottom of the pointer.

The exciting bug is that at some point whoever was implementing this for Windows screwed up while implementing the (necessary for performance) unfair lock stealing strategy.

Brief aside about "lock stealing": If I want to take a lock, and I notice that somebody else just released that lock, I can just take it and we're done - this is unfair because there may be a queue of waiters, and I cut in ahead of them, but it has better performance. This is a simple and common feature for a mutex - fair locks are sometimes useful but definitely should not be the default.

Now, in SRWLock the lock stealing code doesn't remember whether you're trying to take the shared reader lock or the exclusive writer lock. So, it just always steals the exclusive lock - even when that's not what you wanted. As a result in code relying on SRWLock if you take the reader lock and then wait on somebody else to also get that lock (which they should be able to because it's a shared lock) this sometimes deadlocks. Oops.

neerajsi

I've worked in and around srwlock for a long time. I don't think it's all that reasonable to think you should get "shared starve exclusive" by default, which is what you're asking for. We used to demand that kind of behavior in some windows filesystem code and it's always been a design mistake.

Wumpnot

In some rare cases you can get an exclusive lock when you asked for a shared lock, most code won't care and will still work correctly, but sometimes people do weird shit that they probably should have used a different construct for, and run into this.

colonwqbang

The point of an RW lock is that it supports multiple concurrent readers. Why is it weird to expect a non exclusive lock operation to be non exclusive?

neerajsi

Thanks for the explanation. Sometimes it's worth it to keep weird shit working, but it's hard to predict which subset of weird shit will happen in practice.

hyperhello

It looks Windows is lousy with callbacks and APIs that put the burden of understanding everything on the user, and some of Windows uses 0 to mean success, and some of Windows doesn't.

muststopmyths

It actually makes sense, but only to old Windows vets.

The underlying NT (microkernel) api (as found in used-to-be-undocumented NTDLL) returns STATUS_XXX from functions. With STATUS_SUCCESS being 0. This is still consistent as far as I know.

Win32 API, technically being a "subsystem" on top of NT, exposed to the user whatever convention its users were used to. Coming from Windows 3.1, it used BOOL (int actually) where FALSE was 0 and TRUE was 1, in a (mostly) consistent manner for success/failure.

An additional wrinkle may be APIs that return a HANDLE (or address for memory allocators) or NULL, which again if you treat as ints return non-zero for success and 0 for failure.

Posix Subsystem, on the other hand, could return 0s and non-zeros as desired.

So, if you knew Windows and thought about the API you were calling it made sense if you thought in terms of the types defined (BOOL, HANDLE, etc. for Win32) instead of 0 and non-zero.

No idea if it this is still consistent over the surface of the entire exposed API set now. So much accretion of cruft over the years under the eyes of an uninterested leadership has led to a lot of fragmentation and hence deterioration in the developer experience.

spacechild1

> An additional wrinkle may be APIs that return a HANDLE (or address for memory allocators) or NULL, which again if you treat as ints return non-zero for success and 0 for failure.

Except some Win32 API functions return INVALID_HANDLE_VALUE (= -1) on failure! Of course, there is a Raymond Chen blog post: https://devblogs.microsoft.com/oldnewthing/20040302-00/?p=40.... Win32 error handling is indeed a mess...

jstimpfle

There are a lot of small details to know in order to use Windows APIs correctly. Which is stressful. On the other hand, comparing with Unix, the APIs are much much wider and are rarely if ever deprecated.

Unix APIs are a couple of syscalls and library functions. If you want to achieve more (like video, audio...) you have to resort to "optional" libraries that may have bad APIs too, but they aren't core Unix and can be replaced by something that is hopefully better.

wbl

50 years and they can't even spell create with an e at the end. That's some good backwards compat.

layer8

The reason is C had no built-in boolean type, so boolean values use an integer type just like error codes. In C, zero means false and non-zero means true, so that’s how boolean values are represented, and hence a boolean value is_success is non-zero on success and zero on failure. For error codes, on the other hand, the convention is that zero means “no error” and non-zero means some specific error. So exactly the opposite. Unix also reflects this same discrepancy, in that a process (i.e. main() or exit() in C) returning zero means success, but in Unix C programs that’s boolean false. (Not to mention that for pointer return values, null often means failure.)

A valid criticism might be that one shouldn’t use true to mean success.

userbinator

some of Windows uses 0 to mean success, and some of Windows doesn't.

In cases where there is only one successful result, 0 makes sense. Then nonzero can be a range of error codes. In cases where there is multiple successful results and only one failure, 0 means failure; malloc() is the stereotypical example of this.

colanderman

> some of Windows uses 0 to mean success, and some of Windows doesn't.

This is unfortunately true of Unixes as well.

jstimpfle

Unix APIs return -1 on error pretty consistently. The error code can then be read from the errno thread local variable. In the Linux kernel internal APIs (and probably others), -errno is returned directly (no errno mess), which is still negative.

The one "Unix" API I know that returns > 0 on error is pthread, which returns +errno directly (but still 0 on success).

Which APIs return 0 on error? I can't think of any.

colanderman

`malloc(3)`. Many of the functions in string.h.

trebligdivad

I've come across lots of C code which mixes whether it's using -errno, or errno and life all gets messy. And then there's the places where you're returning a pointer not an errno at all. (Which the kernel tries to solve with errptr I think - but there seems to be plenty of places fixing that) And then there's the way that using -errno ends up with ssize_t (signed that is) which also confuses loads of things.

canucker2016

That's the thing about development with more than one developer designing APIs.

Looks like the RtlRunOnceExecuteOnce API and structures appeared in Windows Vista.

So one would have to do some code archaeology and dig into the Windows Git repo to look at the commits for RtlRunOnceExecuteOnce and see what the typical use cases were. For some reason, returning 0 for failure made sense. Given the pervasiveness of NTSTATUS, I have no idea what made them choose a non-NTSTATUS return value. Maybe there was another callback-type API which did something similar?

If one dev thought that RtlRunOnceExecuteOnce required NTSTATUS, then I'm sure other devs will have thought the same thing so I'd expect more such bugs in the current codebase and in the future. That's why Microsoft static code analyzers (PREfix, PREfast) would flag such footguns.

xyzzy9563

Just use strong typing and mutexes. This isn't rocket science.

vijaybritto

I have a naive question here.

Could this have been avoided if they had used Rust? Or is this a bug that can happen even in Rust code too?

probably_wrong

The original bug was returning STATUS_SUCCESS to indicate that a function had succeeded without noticing that STATUS_SUCCESS is defined as 0 in a function that's expected to return a non-zero value on success. This specific error could have happened on any language - defining two different return types and using the wrong one could happen in any language.

andy12_

> defining two different return types and using the wrong one could happen in any language

This specifically is the kind of bug that is avoided with strong typing. The compiler screams at you when using the wrong return type. For example, if a callback expects a Result type, you must return a Result type, not some random int-like value whose definition of success and failure is arbitrary.

surajrmal

It can be avoided in any compiled language that has a boolean type. That includes C these days. Unfortunately this functionality predates the existence of the boolean type.

commandlinefan

I didn't even have to click through the article to know that it would be Raymond Chen ; )

ddtaylor

Microsoft take note that I read this article and everything Raymond Chen puts out under your company brand. I have zero interest in Windows as a platform and actively steer large customers away from it anytime it's discussed, since it has no value offering for most of us.

psd1

YSK that someone has got into your account and posted dumb shit. Change your password.

ddtaylor

https://news.ycombinator.com/newsguidelines.html

psd1

Should I take that to mean that you've reflected, and that you won't, in future, be derailing threads to brag or grind your axe?

If so, yw

If not, bye

HN

The case of the critical section that let multiple threads enter a block of code

The case of the critical section that let multiple threads enter a block of code