How a 20 year old bug in GTA San Andreas surfaced in Windows 11 24H2
226 comments
· April 23, 2025
bombcar
This is the kind of thing I'd expect from Raymond Chen - which is extremely high praise!
I'm glad they tracked it down even further to figure out exactly why.
aneutron
Or randomascii. A freaking legend (although he had a heart-breaking streak of bad events ... I wish him the best)
martinsnow
Raymond is a wizard. Read his blogs for many years and love his style and knowledge.
Discordian93
He's a total legend, yet apparently he's never met Bill Gates in person, from what he said in an interview on the Dave's Garage YouTube channel a few years ago. You'd think that someone who's been that prominent for so long in the company would have been invited to a company dinner where he was present or something.
bombcar
Microsoft's a big company, and billg "stepped down" in 2000. Raymond is still working, so they overlap less than it might appear.
MattSayar
Small thing but I love the effort he puts into actually coding up his examples instead of screenshots. For example: https://devblogs.microsoft.com/oldnewthing/20250414-00/?p=11...
He has many better ones but that's the latest one I've seen
RcouF1uZ4gsC
Raymond knows everything. From microcode bugs on Alpha AXP to template metaprogramming to UI.
transcriptase
I wonder how many times a Deloitte, PwC, KPMG, Bain, EY, McKinsey, or BCG consultant naively tried putting him on a shortlist for being “impacted” over the years because he was in the Top X of a spreadsheet sorted on Y.
gosub100
yet the company he works/worked for churns out garbage.
amenghra
IMHO, if something isn’t part of the contract, it should be randomized. Eg if iteration order of maps isn’t guaranteed in your language, then your language should go out of its way to randomize it. Otherwise, you end up with brittle code: code that works fine until it doesn’t.
bri3d
There are various compiler options like -ftrivial-auto-var-init to initialize uninitialized variables to specific (or random) values in some situations, but overall, randomizing (or zeroing) the full content of the stack in each function call would be a horrendous performance regression and isn't done for this reason.
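A minimal sketch of what that flag changes (hypothetical function, not the game's code): without it, reading the local before sscanf has matched it yields whatever was left on the stack; with -ftrivial-auto-var-init=zero (or =pattern) the compiler pre-fills the variable, so the read is at least deterministic.

    #include <cstdio>

    float parse_submerged(const char* line) {
        float submerged;                      // deliberately uninitialized
        std::sscanf(line, "%f", &submerged);  // may match nothing
        return submerged;                     // stale stack data if it didn't
    }

    int main() {
        // Build with e.g. clang++ -O2 -ftrivial-auto-var-init=zero demo.cpp
        std::printf("%f\n", parse_submerged("not-a-number"));
    }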
neuroelectron
There are fast instructions (e.g., REP STOSx, AVX zero stores, dc zva) and tricks (MTE, zero pages), but no magic CPU instruction exists that transparently and efficiently randomizes or zeros the stack on function calls. You'd think there would be one, and I bet there are on some specialized high-security systems, but I'm not sure even where you would find such a product. Telecom certainly isn't it.
mjevans
You couldn't do random, but with a predictable performance hit to memory, cache, and write-line use, stack addresses COULD be isolated for a program, for a library, etc.
It'd be expensive though; every context switch would require its own stack and pushing / restoring one more register. There's GOOD reason programs don't work that way and are supposed to not rely on values outside of properly initialized (and not later clobbered) memory.
frollogaston
Randomization at this level would be too expensive. There are tools that do this for debug purposes, and your stuff runs a lot slower in that mode.
foxhill
it probably shouldn’t be a “release” thing. actually, certainly. i do wonder how many bugs would never have seen the light of day, if someone’s “set” actually turned out to be a sequence (i.e. allowed duplicate values) resulting in a debug build raising an assert.
Arainach
Debug builds are worthless for catching issues. How many people actually run them? Perhaps developers run debug builds of individual binaries they're working on when they're trying to repro a bug, but my experience at every company of every size and position in the stack (including the Windows team) is that no one does their general purpose use on a debug build.
abnercoimbre
Regarding contracts, there's an additional lesson here, quoting from the source:
> This is an interesting lesson in compatibility: even changes to the stack layout of the internal implementations can have compatibility implications if an application is bugged and unintentionally relies on a specific behavior.
I suppose this is why Linux kernel maintainers insist on never breaking user space.
codebje
I once updated a little shy of 1mloc of Perl 5.8 code to run on Perl 5.32 (ish). There were, overall, remarkably few issues that cropped up. One of these issues (that showed itself a few times) was more or less exactly this: the iteration order through a hash is not defined. It has never been defined, but in Perl 5.8 it was consistent: for the same insertion order of the same set of keys, a hash would always iterate in the same way. In a later Perl it was deliberately randomised, not just once, but on every iteration through the hash.
It turned out there were a few places that had assumed a predictable - not just stable, but deterministic - hash key iteration order. Mostly this showed up as tests that failed 50% of the time, which suggested to me that how annoying an error is to track down is roughly inversely correlated with how often the error appears in tests.
(Other issues were mostly due to the fact that Perl 5 is all but abandoned by its former community: a few CPAN modules are just gone, some are so far out of date that they can't be coerced to still work with other modules that have been updated over time.)
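The same trap exists in C++ terms, for what it's worth - a sketch of a test that silently depends on std::unordered_map iteration order, which the standard leaves unspecified:

    #include <algorithm>
    #include <cassert>
    #include <string>
    #include <unordered_map>
    #include <vector>

    int main() {
        std::unordered_map<std::string, int> m{{"a", 1}, {"b", 2}, {"c", 3}};

        std::vector<std::string> keys;
        for (const auto& kv : m) keys.push_back(kv.first);

        // Brittle: assumes insertion order is preserved (it isn't guaranteed).
        // assert((keys == std::vector<std::string>{"a", "b", "c"}));

        // Robust: compare order-insensitively.
        std::sort(keys.begin(), keys.end());
        assert((keys == std::vector<std::string>{"a", "b", "c"}));
    }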
tantalor
Nope. You have to remember https://www.hyrumslaw.com/
With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.
If you promise randomization, then somebody will depend on that :) And then you can never remove it!
timewizard
> If you promise randomization
You don't. You say the order is undefined.
ormax3
one might argue that one of the advantages of languages like C is that you only pay for the features you choose to use, no unnecessary overhead like initializing unused variables
nayuki
You can pay for those features in debug mode or in chaos monkey mode. It's okay to continue to not pay for them in release mode. Heck, Rust has this approach when it comes to handling integer overflow - fully checked in debug mode, silent wraparound in release mode.
irundebian
In Ada you can pay for integer overflow checks (runtime) if you want to. With Ada SPARK you can prove that your code does not contain integer overflows so that you don't need runtime checks.
gzalo
I agree, this can also detect brittle tests (e.g., test methods/classes that only pass if executed in a particular order). But applying it for all data could be expensive computation-wise.
plutaniano
Aren't you just creating another contract? Users might write code that depends on it being random.
Artoooooor
Maybe it would be good to change all non-promised things between releases, so that such unwritten rules never become something users rely upon.
tantalor
For those users, do this instead: https://xkcd.com/221/
jandrese
> Not ignore the compilation warnings – this code most likely threw a warning in the original code that was either ignored or disabled!
What compiler warning would you expect here? Maybe not checking the return value from scanf to make sure it matches the number of parameters? Otherwise this seems like a data file error that the compiler would have no clue about.
kristianp
Trying g++ version 11.4, there's no warning by default if you don't check the return value of sscanf. Even `g++ -Wall -Wextra -Wunused-result` produces no warnings for a small example.
burch45
Undefined behavior to access the uninitialized memory. A sanitizer would have flagged that.
jandrese
The compiler has no way of knowing that the memory would be undefined, not unless it somehow can verify the data file. The most I think it can do is flag the program for not checking the return value of scanf, but even that is unlikely to be true since the program probably was checking for end of file which is also in the return value. It was failing to check the number of matched parameters. This is the kind of error that is easy to miss given the semantics of scanf.
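A sketch of the failure mode being described (the field names are made up, not the actual GTA code): sscanf reports how many conversions succeeded, and ignoring that count means a missing column leaves its variable uninitialized. MemorySanitizer (clang's -fsanitize=memory) would flag the stale read in the unchecked version at runtime.

    #include <cstdio>

    struct Handling {
        float mass = 0.0f;
        float dragMult = 0.0f;
        float percentSubmerged = 0.0f;
    };

    bool parse_handling_line(const char* line, Handling& out) {
        float mass, drag, submerged;  // uninitialized until sscanf writes them
        int matched = std::sscanf(line, "%f %f %f", &mass, &drag, &submerged);
        if (matched != 3) {
            return false;             // the check the buggy code skipped
        }
        out.mass = mass;
        out.dragMult = drag;
        out.percentSubmerged = submerged;
        return true;
    }

    int main() {
        Handling h;
        // Only two columns: without the matched != 3 check, `submerged`
        // would be whatever happened to be on the stack.
        if (!parse_handling_line("2500.0 0.05", h)) {
            std::puts("bad line, keeping defaults");
        }
    }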
nayuki
> The compiler has no way of knowing that the memory would be undefined
Yes it would. -fsanitize=memory (MemorySanitizer, the sibling of -fsanitize=address) does a bunch of instrumentation - it allocates shadow memory to keep track of which bytes of main memory are initialized, and it checks every read against the shadow memory. It is a combination of compile-time instrumentation and run-time checking. And yes, it is expensive, so it should be used for debugging and not the final release.
https://clang.llvm.org/docs/AddressSanitizer.html , https://learn.microsoft.com/en-us/cpp/sanitizers/asan?view=m...
andrewmcwatters
Uninitialized variables are a really common case.
andrewmcwatters
Yeah, the debugging here is great, but the actual cause is super mild.
phire
Good point. When reading, I kind of just assumed the "use of uninitialised memory" warning would pick this up.
But because the whole line is parsed in a single sscanf call, the compiler's static analysis is forced to assume the variables have now been initialised. There doesn't seem to be any generic static analysis approach that can catch this bug.
Though... you could make a specialised warning just for scanf that forced you to either pass in pre-initialised values or check the return result.
maz1b
I always enjoy reading deeply technical writeups like these. I only wonder how much more rare they may or may not get in the AI era.
Cthulhu_
I don't think they will get more rare; there will always be a top % of engineers that do deep dives. I hope anyway.
But AI won't replace them, nor did the past 50+ years of software development innovation. There are millions (tens of millions?) of higher-level programming language developers that don't know the difference between stack and heap besides maybe some theory they half remember from school, but they don't care because they don't have to think about it for their day job.
senda
I think the shift will be from craftsmen to tradesmen in regards to general software engineers, but these types of write-ups stem from an artisan style all their own.
nonethewiser
Compare Python devs of today to Fortran devs of the 60s. Something like that distance. Maybe more. But the trend isn't new.
eduardofcgo
We have been seeing this shift for a while, where "software engineers" graduate from 3 month bootcamps. Except now most likely they will not be earning 500k making CRUD apps.
sitzkrieg
and thats a good thing
adzm
I'm more curious about what changed with the critical section locking/unlocking implementation in this version of Windows!
mjevans
It looks like the utilized stack, or a stack protection area, increased.
asveikau
When I worked at Microsoft and I had downtime I would sometimes read the code for app compatibility shims out of pure curiosity.
Win9x video games that made bad assumptions about the stack were a theme I saw. One of the differences between win9x and NT based windows is that kernel32 (later kernelbase) is now a user mode wrapper atop ntdll, whereas in the olden days kernel32 would trap directly into the kernel. This means that kernel32 uses more user mode stack space in NT. A badly behaving app that stored data to the left of the stack pointer and called into kernel32 might see its data structures clobbered in NT and not in 9x. So there were compatibility hacks that temporarily moved the stack pointer for certain apps.
tom_
I wonder how many people think of the call stack as running left to right, most recent return first, rather than top to bottom, likewise? If you stare at enough hex dumps, it makes perfect sense.
hoten
What was the testing like for such bugs? Is it somehow automated, or is there a lengthy doc describing the manual testing steps, or are there no tests at all?
mjevans
For anyone with access issues
https://web.archive.org/web/20250423144746/https://cookieplm...
rossant
Am I the only one to be annoyed by this...?
while (this->m_fBladeAngle > 6.2831855) { this->m_fBladeAngle = this->m_fBladeAngle - 6.2831855; }
Like, "let's just write a while loop that could turn into an infinite loop coz I'm too lazy to do a division"
nemothekid
I want to assume that the GTA developers did this hack because it was faster than floating point division on the Playstation 2 or something.
But knowing they were able to blow up GTA5's loading time by 5 minutes just by parsing JSON with sscanf, I don't have much hope.
badsectoracula
IIRC the whole parsing performance issue was because the original code was written for the SP campaign of GTA5 that only had a handful of objects to parse data for. That was barely a blip in terms of performance impact and AFAIK was written years before GTAOnline was made (where it became an issue - and even then only became an issue much after GTAOnline was first made).
Writing some simple code that works with the data you expect to have, without bothering with optimizations, is fine; if anything it is one of the cases where avoiding "premature optimization" actually applies: even with profiling no real time is spent on that code, your data won't make it spend any time, and you should avoid wild guesses since chances are you'll be wrong (even if in this case it could be a correct guess, it'd be like a broken clock guessing the time is always 13:37).
The actual issue with that code was that, after they reused it for GTAOnline and it started becoming a performance issue over time as they added more objects, nobody thought to try and see what was wrong.
GeoAtreides
I'm willing to bet it was done for performance reasons, subtraction is cheaper than floating point division. Probably the compiler also has some tricks to optimize this further.
There is absolutely no way this could turn into an infinite loop. It could underflow, but for that to happen the angle would have to be less than 2*pi, therefore exiting the loop.
auxiliarymoose
The article discusses how that turns into an infinite loop and causes a hang.
When you subtract a small float from a very large float, the value doesn't change. This is because the "steps" between float values increase with the size of the value (i.e. floats have coarser resolution for larger magnitudes)
To see this in action, try running the following in a JavaScript interpreter:
console.log(1_000_000_000_000_000_000 - 1);
MBCook
But that’s “impossible”. It’s an angle between 0 and 2pi. When transformed it might go over a bit so they added the check.
It will “never” become big.
So why check? It’s unnecessary.
Thus the bug.
mabster
If m_fBladeAngle is really large (>2.2e8 back of the envelope), the subtraction will have no effect, and that would be an infinite loop.
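That stall is easy to confirm; the exact cutoff is around 2^27 (≈1.3e8) for 32-bit floats, since above that the spacing between adjacent floats is too coarse for a 2*pi subtraction to change the value:

    #include <cstdio>

    int main() {
        const float twoPi = 6.2831855f;

        float small = 10.0f;
        float big   = 2.2e8f;   // well past the cutoff

        std::printf("%d\n", small - twoPi == small);  // 0: loop makes progress
        std::printf("%d\n", big - twoPi == big);      // 1: loop is stuck forever
    }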
anal_reactor
Long shot, but maybe if the value is small, then this loop could be faster than division.
matsemann
If the code runs every frame, it's probably always small and does just one iteration once in a while when it wraps over the value.
hoten
for real. The author clearly never heard of fmod
carlos-menezes
Much love to Silent, who’s been improving my favorite game for over... a decade now?
nayuki
> all these findings prove that the bug is NOT an issue with Windows 11 24H2, as things like the way the stack is used by internal WinAPI functions are not contractual and they may change at any time, with no prior notice. The real issue here is the game relying on undefined behavior (uninitialized local variables), and to be honest, I’m shocked that the game didn’t hit this bug on so many OS versions, although as I pointed out earlier, it was extremely close
This sentence is the real takeaway point of the article. Undefined behavior is extremely insidious and can lull you into the belief that you were right, when you already made a mistake 1000 steps ago but it only got triggered now.
I emphasized this point in my article from years ago (but after the game was released):
> When a C or C++ program triggers undefined behavior, anything is allowed to happen in the program execution. And by anything, I really mean anything: The program can crash with an error message, it can silently corrupt data, it can morph into a colorful video game, or it can even give the right result.
> If you’re lucky, the program triggering UB will show an appropriate error message and/or crash, making you immediately aware that something went wrong. If you’re unlucky, the program will quietly mangle data, and by the time you notice the problem (via effects such as crashes or incorrect output) the root cause has been buried in the past execution history. And if you’re very unlucky, the program will do exactly what you hoped it should do, until you change some unrelated code / compiler versions / compiler vendors / operating systems / hardware platforms – and then a new bug becomes visible, and you have no clue why seemingly correct code now fails to work properly.
-- https://www.nayuki.io/page/undefined-behavior-in-c-and-cplus...
As I wrote in my article, this point really got hammered into me when a coworker showed me a patch that he made - which added a couple of innocuous, totally correct print statements to an existing C++ program - and that triggered a crash. But without his print statements, there was no crash. It turned out that there was a preexisting out-of-bounds array write, and the layout of the stack/heap somehow masked that problem before, and his unlucky prints unmasked the problem.
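A contrived sketch of that class of bug: the out-of-bounds write lands on whatever the compiler happened to place after the array, so the program may appear to work, print garbage, or crash depending on layout, optimization level, and whether unrelated code (like an added print) changes the frame. Nothing about its output is guaranteed - that's the point.

    #include <cstdio>

    int main() {
        int important = 42;
        int buf[4];
        for (int i = 0; i <= 4; ++i) {   // off-by-one: writes buf[4]
            buf[i] = 0;
        }
        std::printf("%d\n", important);  // 42? 0? crash? depends on layout
    }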
Okay so then, how can we do better as developers today?
0) Read, understand, and memorize what actions in C or C++ are undefined behavior. Avoid them in your code at all costs. Also obey the preconditions of any API you use, whether in the standard library, operating system, etc.
1) Compile your application in Debug mode and compare its behavior to Release mode. If they differ by anything other than speed, then you have a serious problem on your hands.
2) Compile and run with sanitizers like -fsanitize=undefined,address to catch undefined behavior at runtime.
3) Use managed languages like Java, C#, Python, etc. where you basically don't have to worry about UB in normal day-to-day code. Or use very well-designed low-level languages like Rust that are safe by default and minimize your exposure to UB when you really need to do advanced things. Whereas C and C++ have been a bonanza of UB like we have never seen before in any other language.
spookie
Other than C#, there is no reason to use those other languages for game dev. Unless the game is fairly simple, or you want to risk a fairly long project by employing a language that hasn't been proven in the space yet (Rust). No shade at any of those languages, I don't even like C#, just being pragmatic.
wat10000
I would add: code defensively. Initialize your variables (either to a sensible value, or an outrageously wrong value) before passing pointers to them, even when you "know" that the value will be overwritten. Check for errors. Always consider what happens when things go wrong, not just when things go right. Any time you find yourself thinking, "condition X is guaranteed to hold, so I don't need to check for it" consider checking it anyway just in case you're wrong about that, or it changes later.
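A sketch of that style applied to the parsing pattern in the article (names are hypothetical): seed the out-parameter with an obviously wrong sentinel, and treat both a short read and an implausible value as errors rather than trusting the data file.

    #include <cstdio>

    // Returns false unless the line yields a plausible value.
    bool read_submerged(const char* line, float& out) {
        float submerged = -9999.0f;   // outrageously wrong on purpose
        int matched = std::sscanf(line, "%f", &submerged);
        if (matched != 1 || submerged < 0.0f) {
            return false;             // don't propagate nonsense downstream
        }
        out = submerged;
        return true;
    }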
Leherenn
My only issue with defensive coding is that often it doesn't play nice with code coverage requirements. I've been in situations where I would like to add defensive coding just in case, but then the PR doesn't pass the coverage checks. The best is when you can ensure via the compiler (e.g. via the type system) that a case is impossible, but C++ (in my case) isn't perfect for this.
semi-extrinsic
I learned this lesson many moons ago, on a Fortran code I wrote for a university assignment. It was a basic genetic algorithm, and for some reason it was converging much more slowly than expected. So I was sprinkling some WRITEs to debug, and suddenly the code converged a hundred times faster.
gigatexal
Use a debugger folks. A 10x dev cited this story to me about the ills of not using one.
glandium
The article mentions using breakpoints, so they did use a debugger.
epolanski
I always wonder, why not write these games on top of a virtual machine like Carmack started doing in Quake, a usage he then later extended to Quake 2 and 3 [1].
I'm ignorant about game development, virtual machines and system programming, but from the little I understand it seems a sensible choice to make.
While there is an initial price to pay, modeling 99% of the game to be implemented on a user-implemented stack seems a sensible approach to me.
assassinator42
This is a game; I don't think a debug configuration (with checks for things like this enabled) would run fast enough to be playable on contemporary hardware.
AssKoala
That's not accurate.
Generally, game console "debug" configurations aren't "true" debug like most people think of -- optimizations are still globally enabled, but the build generally has a number of debug systems enabled that naturally require the use of a devkit. Devkits, especially back then, generally had 2-3x as much memory as retail systems -- so you'd happily sacrifice framerate during feature development to have those systems enabled.
Debugging was (and still is) generally done on optimized builds and, once you know the general area of the problem, you simply disable optimizations for that file or subsystem if you can't pinpoint the issue in an optimized build.
The biggest performance hit, in general, comes from disabling optimizations in the compiler. I say "in general" because there are systems that might be used to find this kind of thing that DO make a game wholly unplayable, such as a stomp allocator. Of course, you wouldn't generally enable a stomp allocator across all your allocations unless you're desperate, so you could still have that enabled to find this kind of bug and end up with a playable game.
The more likely reason here is that no one noticed or cared. GTA:SA is 21 years old and this bug doesn't affect the Xbox or other versions.
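For reference, a sketch of the stomp-allocator idea (POSIX mmap/mprotect; alignment, freeing, and error handling omitted): each allocation gets its own pages with an inaccessible guard page right behind it, so any overrun faults immediately instead of silently corrupting a neighbour. It is hugely wasteful, which is why it's a debugging tool rather than a shipping allocator.

    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstddef>
    #include <cstdint>

    void* stomp_alloc(std::size_t size) {
        const auto page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        const std::size_t payload_pages = (size + page - 1) / page;
        const std::size_t total = (payload_pages + 1) * page;  // +1 guard page

        auto* base = static_cast<std::uint8_t*>(
            mmap(nullptr, total, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
        if (base == MAP_FAILED) return nullptr;

        // The last page becomes a tripwire: any read or write into it crashes.
        mprotect(base + payload_pages * page, page, PROT_NONE);

        // Place the object flush against the guard page so an off-by-one stomps it.
        return base + payload_pages * page - size;
    }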
LegionMammal978
From GP:
> (with checks for things like this enabled)
You can (and could) easily compile an optimized build with debug symbols to track down sources of issues, but catching a bug like this would likely take a dynamic checker like Valgrind or MSan, which do not allow for any optimizations if you want to avoid false negatives, and add even more overhead on top of that. (Valgrind with its full processor-level virtualization, and MSan with its shadow state on every access. But MSan didn't exist at the time, and Valgrind barely existed.)
At minimum, fine-grained stack randomization might have exposed the issue, but only if it happened to be spotted in playtests on the debug build.
ajross
Tools like valgrind/asan/msan would have flagged this instantly too. Just a unit test of that vehicle loader would have seen it.
Really this is more a story about poor development practice than it is an interesting bug.
avidiax
Problem with valgrind/asan/msan is that you have to start using these tools early in the development process. It can't be a "checklist" item before launch, or you'll have an insurmountable number of bugs, often with them baked in such that fixing the bug causes additional changes that introduce unrelated bugs.
Thaxll
As if tools in early 2000's were any good...
ajross
Valgrind was released in 2002 to immediate celebration. It was available and surely known to the team. All they needed to do was write a unit test that loaded and instantiated those vehicle files and run it with "valgrind" in front of the command line.
csours
Once this category of error is raised to your attention, you start to notice it more and more.
A little piece of technology made sense in the original context, but then it got moved to a different context without realizing that move broke the contract. Specifically in this case a flying boat became an airplane.
---
I recently worked a bug that feels very similar:
A linux cups printer would not print to the selected tray, instead it always requested manual feed.
Ok. Try a bunch of command line options, same issue.
Ok. Make the selection directly in the PPD (postscript printer definition) file. Same issue.
Ok! Decompile the PXL file. Wrong tray is set in pxl file... why?
Check Debug2 log level for cups - Wrong MediaPosition is being sent to ghostscript (which compiles the printer options into the print job) by a cups filter... why?
Cups filter is translating the MediaPosition from the PPD file... because the philosophy of cups is to do what the user intended. The intention inferred from MediaPosition in the PPD file (postscript printer definition) is that the MediaPosition corresponds to the PWG (Printer Working Group) MediaPosition, NOT the vendor MediaPosition (or local equivalent - in this case MediaSource).
AHA!! My PPD file had been copied from a previous generation of server, from a time when that cups filter did NOT translate the MediaPosition, so the VENDOR MediaSource numbers were used. Historically, this makes sense. The vendor tray number is set in the vendor ppd file because cups didn't know how to translate that.
Fast forward to a new execution context, and cups filters have gotten better at translating user intention, now it's translating a number that doesn't need to be translated, and silently selecting the wrong tray.
TLDR; There is no such thing as a printer command, only printer suggestions.
twic
Infamously, this is also why Ariane 501 blew up.
(a component being reused in a new context where a contract is broken, not bad CUPS drivers)
cadamsdotcom
It has always been too easy to read & write beyond the stack. This should fail, plain and simple.
Mitigations exist - ASLR, NX pages, stack-smashing protection etc. but nothing comprehensively stops reads of stale data beyond the stack.
Thought experiment for a moment: what if the hardware ensured the unused part of a stack region cannot be read or written?
There are many ways to skin this cat; here's one based around tracking each stack's start address A, size S, and current depth D:
1. Add an instruction to inform the CPU there is a stack at address A of size S. Its depth D is initially 0.
2. Add a jump instruction which reserves N bytes on the stack at address A, growing depth D to (D+N). Maybe this can be its own “reserve” instruction so as not to need a new jump instruction.
3. Give existing return instructions stack awareness. If returning to an address inside a stack, un-reserve the bytes reserved by the most recent jump, making the new depth (D-N).
4. Fail reads or writes to the stack region beyond its current depth. In other words fail all reads and writes between A+S-D and A+S.
5. The arithmetic is reversed on architectures whose stacks grow downwards.
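A software model of that bookkeeping (one reading of the steps above, for an upward-growing stack; in real hardware this state would live in the memory system, not a struct):

    #include <cassert>
    #include <cstddef>
    #include <vector>

    struct StackRegion {
        std::size_t A = 0;                 // base address       (step 1)
        std::size_t S = 0;                 // total size         (step 1)
        std::size_t D = 0;                 // current depth      (step 1)
        std::vector<std::size_t> frames;   // per-call reservation sizes

        void reserve(std::size_t n) {      // step 2: a call reserves n bytes
            assert(D + n <= S);
            frames.push_back(n);
            D += n;
        }
        void release() {                   // step 3: a return un-reserves them
            assert(!frames.empty());
            D -= frames.back();
            frames.pop_back();
        }
        bool access_ok(std::size_t addr) const {  // step 4: fault beyond depth
            return addr >= A && addr < A + D;
        }
    };

The frames vector is exactly the extra per-stack state the downsides below worry about.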
Downsides I can see:
It cements one calling convention. The CPU memory manager will need a lot of state per stack, of which there are many per process: address A, size S, current depth D, plus a reservation stack - i.e. the sizes of each frame's stack memory. That's a lot of bookkeeping! It's far from zero cost.
The limits of how much bookkeeping the CPU can do impose limits on how deep a stack can go and how many stacks are supported - so when there are too many stacks or one goes too deep, either the CPU needs to signal failure or engage a fallback mode and revert to behaving as CPUs do today. And of course fallback puts things back to the start. It'd therefore only mitigate situations in which an attacker cannot control the depth of the stack / a bug always happens inside the max depth the CPU can bookkeep for.
That said, stacks are ubiquitous! Hardware stack awareness opens up all kinds of new mitigations.
Why isn’t this a common idea? Has it been tried?
mjevans
Any solution I can think of uses a lot of resources. Those sorts of methods are useful in some contexts, such as highly secure operations, but seem very excessive for the sort of abuse and leak encountered in this example.
LegionMammal978
This bug wasn't caused by a read beyond the current bounds of the stack, but a stale value from a prior call to the same function at the exact same location on the stack. Buffer-overflow protections like you describe wouldn't help here.
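A sketch of that mechanism (undefined behaviour on purpose; the output is illustrative, not guaranteed): the second call's uninitialized local can land on the same stack slot the first call wrote, so it "inherits" the old value - until anything, like an OS function using a little more stack in between, shifts the layout.

    #include <cstdio>

    __attribute__((noinline)) void fill(float value) {
        volatile float slot = value;   // writes `value` into this frame
        (void)slot;
    }

    __attribute__((noinline)) float read_stale() {
        float slot;                    // uninitialized: may alias fill()'s slot
        return slot;                   // UB - may be the previous call's value
    }

    int main() {
        fill(99.0f);
        std::printf("%f\n", read_stale());  // may print 99.0, purely by luck
    }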