
To be a better programmer, write little proofs in your head

hiAndrewQuinn

Oh, I have a relevant and surprisingly simple example: binary search. Binary search and its variants, leftmost and rightmost binary search, are surprisingly hard to code correctly if you don't think about the problem in terms of loop invariants. I outlined the loop invariant approach in [1] with some example Python code that was about as clear and close to plain English as I could get.
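
To give the flavor (a minimal sketch in C++ rather than the Python in [1]; the function name is just for illustration, and the invariant comments are the point):

    #include <cstddef>
    #include <vector>

    // Leftmost binary search: smallest index i with a[i] >= target,
    // or a.size() if every element is smaller.
    // Loop invariant: a[k] <  target for every k <  lo
    //                 a[k] >= target for every k >= hi
    std::size_t leftmost(const std::vector<int>& a, int target) {
        std::size_t lo = 0, hi = a.size();
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo) / 2;   // lo <= mid < hi
            if (a[mid] < target)
                lo = mid + 1;   // invariant still holds below lo
            else
                hi = mid;       // invariant still holds at and above hi
        }
        return lo;  // lo == hi: the invariant pins down the answer
    }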

Jon Bentley, the author of Programming Pearls, gave the task of writing an ordinary binary search to a group of professional programmers at IBM, and found that 90% of their implementations contained bugs. The most common one seems to have been accidentally running into an infinite loop. To be fair, this was back in the day when integer overflows had to be explicitly guarded against - but even then, it's a surprising number.

[1]: https://hiandrewquinn.github.io/til-site/posts/binary-search...

qsort

To be fair, you're picking an example that's extremely finicky about indices. It's probably the hardest basic algorithm to write down without errors. Up there with Hoare partition.

sedatk

I knew about the infamous Binary Search bugs in books, and I dared to write the first bug-free implementation in my book, very carefully. Still, it ended up having bugs. :) Luckily, Manning's early access program let me fix them before printing.

null

[deleted]

monkeyelite

The C++ standard library phrases binary search as a partitioning problem - finding the index where a predicate goes from false to true, which is helpful for me.
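
A minimal usage sketch of that framing with std::partition_point, which returns the position where its predicate stops holding - the same "flip point" idea (the values and predicate here are just for illustration):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v = {1, 3, 5, 7, 9, 11};

        // The range must be partitioned: elements satisfying the predicate
        // first, the rest after. partition_point returns the first element of
        // the second group, i.e. the index where the predicate flips.
        auto it = std::partition_point(v.begin(), v.end(),
                                       [](int x) { return x < 7; });

        std::cout << "first element >= 7 is at index "
                  << (it - v.begin()) << "\n";  // prints 3
    }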

None4U

What are the odds you write a binary search that you'll use more than once instead of just running it and writing down the result?

JohnKemeny

I once needed to write binary search for a microcontroller in C (no libraries). The routine ran about every hour, over approximately 4M data points.

groby_b

About 100%?

When do you write code that doesn't need to search? (Unless you hand it all to the database, in which case... sure, you're good)

fragmede

A post on Google's research blog from 2006 noted that "almost all" binary search (and merge sort) implementations contain a bug (and had for decades), so 90% seems impressive in the face of that.

https://research.google/blog/extra-extra-read-all-about-it-n...
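
The bug in question is the midpoint calculation overflowing on large arrays; a minimal sketch of the problem and the usual fix:

    #include <climits>

    // Buggy: when lo + hi exceeds INT_MAX, the sum overflows (undefined
    // behavior for signed ints) and mid can come out negative, indexing
    // out of bounds.
    int mid_buggy(int lo, int hi) {
        return (lo + hi) / 2;
    }

    // Standard fix: compute the offset first; hi - lo cannot overflow
    // when 0 <= lo <= hi.
    int mid_safe(int lo, int hi) {
        return lo + (hi - lo) / 2;
    }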

hiAndrewQuinn

The bug in that post is actually the very integer overflow bug I was referring to. Given that Jon himself made that error in the "definitive" implementation, it seems unlikely that he would have spotted it in the 10% of implementations he considered correct. Under Google's stricter criteria, it seems likely to me that not a single person got the search implementation truly correct.

JohnMakin

I think I stumbled across a similar concept in the more difficult post-grad classes I ended up in a long time ago. Late in my undergrad I began doing math tests entirely in pen. I didn't understand why, but it almost always resulted in higher scores and much neater scratchwork, which is what I attributed the improvement to at the time. But I think what was actually helping was something along the lines of what this post is getting at.

What was helping me was that before I wrote a single expression, I thought about it carefully in my head and where it would lead before putting pen to paper, because I didn't want to make a bunch of messy scratch out marks on it. Or, sometimes, I'd use a healthy amount of throwaway scratch paper if allowed. Once my path was fully formed in my head I'd begin writing, and it resulted in far fewer mistakes.

I don't always take this approach to writing code but often I do formulate a pretty clear picture in my head of how it is going to look and how I know it will work before I start.

wanderingmind

One thing that I feel is discussed less is: what are the high-level constraints and levers that can be set to enable a group of programmers to become better programmers? For example, how much impact does architecture have? Bug tracing, choice of language, tools? People handwave and say that one is better than another because of <reason 1>, <reason 2>, but to be able to explain how each of these choices impacts the code in a semi-rigorous way, all the way down to how it affects individual code components and their interactions (as a graph or a tree), would be immensely helpful. If anyone knows of resources like that, please let us know.

skydhash

The objective is always to attain some kind of synergy between the business logic and, as others have said, algorithmic simplicity. The best way to go about it is to start from the chosen language/environment/library and build out a DSL that can express the business rules. That's the basic premise of DDD, but where all this gets complicated is in splitting into contexts and the translation across their boundaries.

I believe programmers should learn a bit about programming language theory, mostly the bits about what a language is. Then how to recognize the things behind the labels, and how they morph (either from an operation or from a translation between contexts). Then it's a matter of using that knowledge to move from the environment to a DSL that can express the business rules.

Architecture is the draft for the above, defining the starting point and a direction. Design is where you start to make decisions following the plan. Someone who has an idea of the destination can judge both.

hansvm

I think we're still at the stage of "this team lead makes shit happen" rather than "this strategy makes shit happen," with a lot of the nuances being fuzzy and/or wrong.

Tossing out a few observations:

1. People make mistakes. Your strategy needs to account for that (resilient runtime, heavy type-checking, convenient and often-used REPL, etc).

2. Above a certain project size, nobody remembers everything. Your strategy needs to account for that (excellent multifaceted documentation, disallow long-range interactions in your code, etc).

3. Dependencies have a vastly higher cost than you expect, even in the short term. Plan for that (vendor more things, in-house more things, allocate resources to dependency management, cut scope, etc).

I could go on. The core point is that certain properties of projects are "plainly" true to most people who have been around any length of time. I don't think we're yet at a point where we can often predict anything meaningful about some specific new technology, but a mental framework of "this succeeded/failed _because_ of {xyz}" helps tremendously in figuring out if/how that new idea will fit into your current workplace.

gregorygoc

Writing correct proofs is hard. Program verification is hard. In my opinion, if you are hand-waving it there's no benefit. Thinking about invariants and pre/post conditions is often unnecessary, or greatly reduced, if you write idiomatic code for the language and codebase. Check out "The Practice of Programming" by R. Pike and B. W. Kernighan. The motto is: simplicity, clarity, generality. I find it works really well in a day job.

On a slightly related note… Competitive programming is surprisingly good at teaching you the right set of techniques to make sure your code is correct. You won't progress beyond a certain stage unless you pick them up.

xg15

I have to strongly disagree here. I don't think the OP meant thinking up a complete, formal proof. But trying to understand what kind of logical properties your code fulfills - e.g. what kind of invariants should hold - will make it a lot easier to understand what your code is doing and will remove a lot of the scare factor.

Nevermark

Yes, we could call this “maintaining plausible provability”.

Code for which there is not even a toehold for an imagined proof might be worth cordoning off from better code.

Sharlin

Types constitute this sort of partial proof. They're not enough to encode proofs of most runtime invariants (outside powerful dependent type systems), but the subset that they can encode is extremely useful.
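
A small, hypothetical illustration of an invariant carried by a type (the class and names are invented for the sketch):

    #include <stdexcept>
    #include <utility>
    #include <vector>

    // The constructor is the only way in, so anything holding a NonEmpty<T>
    // carries a compiler-tracked guarantee that front() is safe to call.
    template <typename T>
    class NonEmpty {
    public:
        explicit NonEmpty(std::vector<T> v) : data_(std::move(v)) {
            if (data_.empty()) throw std::invalid_argument("NonEmpty: empty input");
        }
        const T& front() const { return data_.front(); }  // no empty check needed
    private:
        std::vector<T> data_;
    };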

gregorygoc

Yeah, and I’m saying if your code is idiomatic you get necessary invariants for free.

auggierose

Is idiomatic related to idiotic?

mprast

yes, what i had in mind were more proof sketches than proofs

monkeyelite

The most basic idea of a proof is an argument for why something is true. It's not about avoiding small mistakes, it's about being directionally correct.

mathgradthrow

There is no substitute for writing correct programs, no matter how hard it is. If you want correct programs you have to write them correctly.

wizzwizz4

Turn your first paragraph on its head:

Appropriate abstractions (i.e., "idiomatic code for the language and codebase") make program verification easy. If you are hand-waving an appropriately-abstracted program, there's little benefit to thinking about loop invariants and pre/post conditions, since they don't exist at that level of generality: correct proofs follow directly from correct code.

gregorygoc

No, appropriate abstractions are insufficient for my argument. For example: there’s one way to write an idiomatic loop in C and it inherits necessary invariants by construction.

I highly recommend reading the book; it explains the concept of writing idiomatic code far better than I ever could.

chowells

That's a really funny example, given how many bugs have been found in C programs because idiomatic loops are wrong in the edge cases.

How do you idiomatically write a loop to iterate over signed ints from i to j (inclusive) in increasing order, given i <= j?

What does that loop do when j is INT_MAX?
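
For anyone who hasn't hit it: the obvious loop never terminates in that case, and a correct inclusive loop has to avoid incrementing past the endpoint. A minimal sketch:

    #include <climits>
    #include <cstdio>

    // The "obvious" version is wrong at the edge:
    //     for (int k = i; k <= j; ++k) { ... }
    // When j == INT_MAX, k <= j is always true, and ++k at INT_MAX is
    // undefined behavior for signed ints.

    // One safe form: check the endpoint before incrementing.
    void visit_inclusive(int i, int j) {   // precondition: i <= j
        for (int k = i; ; ++k) {
            std::printf("%d\n", k);
            if (k == j) break;  // exit before the increment that would overflow
        }
    }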

pipes

Could you elaborate on those techniques from competitive programming, please? Genuinely interested! :)

kevinventullo

+1 This is definitely the wall I hit with competitive programming. I logically know how to solve the problem; my code just ends up having one too many bugs that I can't track down before time is up.

lliamander

This is similar to an intuition I've had about what it means to program "algorithmically". We often draw a distinction between "algorithms" and "business logic", but fundamentally they are the same thing. They are both plans of the steps necessary to accomplish a goal. The only difference, in my mind, is the style in which the program is written. To program "algorithmically" means to take steps to make undesirable program states impossible to represent.

- In the case of search or sort algorithms, where the primary concern is the speed of computation, undesirable states would be performing unnecessary or duplicate computations.

- In the case of encryption algorithms, undesirable states would be those that leak encrypted data.

- In the case of an order shipping and fulfillment system, an undesirable state would be marking an order as fulfilled when not all of the items have been delivered.

The more care that is taken to prevent undesirable states, the more the program takes on an algorithmic style. And the only way you can be sure that those undesirable states are impossible is to think in terms of proofs and invariants.
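
As a hypothetical sketch of making the undesirable state unrepresentable for the shipping example (the types and names are invented for illustration): the only way to obtain a fulfilled order is to supply delivery records covering every item.

    #include <optional>
    #include <string>
    #include <vector>

    struct LineItem { std::string sku; int quantity; };
    struct Delivery { std::string sku; int quantity; };

    // There is no "fulfilled" flag to flip; a FulfilledOrder can only be
    // obtained from fulfill(), which checks coverage first.
    struct FulfilledOrder { std::vector<LineItem> items; };

    std::optional<FulfilledOrder> fulfill(const std::vector<LineItem>& items,
                                          const std::vector<Delivery>& deliveries) {
        for (const auto& item : items) {
            int delivered = 0;
            for (const auto& d : deliveries)
                if (d.sku == item.sku) delivered += d.quantity;
            if (delivered < item.quantity)
                return std::nullopt;  // undelivered item: no fulfilled order exists
        }
        return FulfilledOrder{items};
    }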

bluetomcat

> My thesis so far is something like "you should try to write little proofs in your head about your code." But there's actually a secret dual version of this post, which says "you should try to write your code in a form that's easy to write little proofs about."

Easier said than done. It is certainly feasible on greenfield projects where all the code is written by you (recently), and you have a complete mental model of the data layout and code dependencies. It's much harder to prove stuff this way when you call foo(), bar() and baz() across unit boundaries, when they modify global state and are written by different developers.

akavi

A part of "being a good developer" is being able to evolve systems in this direction. Real systems are messy, but you can and should be thoughtful about:

1. Progressively reducing the number of holes in your invariants

2. Building them such that there's a pit of success (engineers coming after you are aware of the invariants and "nudged" in the direction of using the pathways that maintain them). Documentation can help here, but how you structure your code also plays a part (and is in my experience the more important factor)

mprast

yeah, this is what i was trying to get at with that notion of "proof-affinity"; imo a well-structured codebase is one in which you can easily prove stuff to yourself about code you didn't necessarily write

msteffen

I think your frustration is illustrative, actually: the reason mutable global state is bad is that in order to prove that a piece of code reading it is correct, you have to know what _the entire rest of the program_ is doing.

If, instead, you make the global variable immutable, make the state variable into a function argument, or even wrap the mutable global state in some kind of helper class, then you only need to know what the callers of certain functions are doing. The visibility of those functions can be limited. Caller behavior can be further constrained with assertions inside the function. All of these (can) make it easier to prove that the reader is correct.

I think most programmers already do this, actually; they just don't think of their decisions this way.
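
A tiny sketch of the "wrap the mutable state in a helper" option (the class and method names are made up for illustration):

    #include <cassert>

    // Instead of letting arbitrary code write a global counter, route every
    // mutation through one small class. To prove a reader of value() correct,
    // you only audit the callers of increment()/decrement(), not the whole
    // program.
    class ConnectionCount {
    public:
        void increment() { ++count_; }
        void decrement() {
            assert(count_ > 0 && "decrement below zero");  // constrain callers
            --count_;
        }
        int value() const { return count_; }
    private:
        int count_ = 0;
    };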

bluetomcat

It’s a good direction to follow, but it can only get you so far. Some pieces of code do naturally evolve into a functional formalism, while others are inherently imperative. Usually, the top levels of your program (event loop) are imperative and deal with stateful “devices” like IO, the screen and storage subsystems. The “leaf” functions from the call graph can be functional, but you still can’t reason about the whole program when it is imperative at the top.

thfuran

Yes, but if you can actually drive all that out to the very top level and leave everything else clean, that's a huge improvement over how many programs are structured.

jimbokun

> It's much harder to prove stuff this way when you call foo(), bar() and baz() across unit boundaries, when they modify global state and are written by different developers.

I think this reinforces the article's point.

Code like this is much more likely to contain bugs, and to be harder to maintain without introducing more bugs, than programs written from the outset with this goal of "provability".

monocasa

And in fact, when you look at formally proven code, 80% of the shtick is reducing the complexity of mutation to something tractable. That's valuable whether or not you then actually go through the process of formally proving it.

hinkley

I just finished cave diving in a code base that's had a very old ticket to clean up the mess.

I went in with 3 tickets in mind to fix. I found half a dozen more while I was down there, and created 2 myself. I don't know if I got off easy or I was distracted and missed things.

The other project I'm working on is not dissimilar. Hey did you guys know you have a massive memory leak?

hinkley

> when they modify global state and are written by different developers.

Once cancer has metastasized, the treatment plans are more aggressive and less pleasant. The patient can still be saved in many cases, but that depends on a lot of external factors.

xg15

Even then, you can get a map of "safe" and "unsafe" locations in the codebase, i.e. you can build a mental model of which functions modify global state and which don't.

tristramb

The idea that you should design programs with proof in mind goes back to T. J. Dekker's solution to the mutual exclusion problem in 1959. The story is told by Edsger Dijkstra in EWD1303 (https://www.cs.utexas.edu/~EWD/transcriptions/EWD13xx/EWD130...). Much of Dijkstra's later work can be seen as him working out the consequences of this insight.

nayuki

I learned the foundations of theoretical computer science in university. I agree with the sentiment of this post, though it is hard to put into practice.

In addition to pre-conditions and post-conditions, I would like to emphasize that loop invariants and structural induction are powerful techniques in CS proofs. https://en.wikipedia.org/wiki/Loop_invariant , https://en.wikipedia.org/wiki/Structural_induction

These notes from UofT's CSC236H are on-topic: https://www.cs.toronto.edu/~david/course-notes/csc236.pdf#pa...

sitkack

Those notes are great. David Liu seems like a wholesome dude: https://www.cs.toronto.edu/~david/research.html

xg15

I'd also add mutability and immutability to the list of properties. Keeping as much state as possible immutable doesn't just help with multithreading, it will also greatly reduce the headache when trying to understand the possible states of a program.

jimbokun

The article does include those.

GMoromisato

Totally tangential, but I love to read post-mortems of people fixing bugs. What were the initial symptoms? What was your first theory? How did you test it? What was the resolution? Raymond Chen does this a lot and I've always enjoyed it.

I learn more from these concrete case studies than from general principles (though I agree those are important too).

One of my most recent bugs was a crash in a thread pool that used garbage-collected objects (this is in C++) to manage network connections. Sometimes, during marking, one of the objects I was trying to mark would already be freed, and we crashed.

My first thought was that this was a concurrency problem. We're supposed to stop all threads (stop the world) during marking, but what if a thread was not stopped? Or what if we got an event on an IO completion port somehow during marking?

I was sure that it had to be a concurrency problem because (a) it was intermittent and (b) it frequently happened after a connection was closed. Maybe an object was getting deleted during marking?

The only thing that was weird was that the bug didn't happen under stress (I tried stress testing the system, but couldn't reproduce it). In fact, it seemed to happen most often at startup, when there weren't too many connections or threads running.

Eventually I proved to myself that all threads were paused properly during marking. And with sufficient logging, I proved that an object was not being deleted during marking, but the marking thread crashed anyway.

[Quick aside: I tried to get ChatGPT to help--o3 pro--and it kept on suggesting a concurrency problem. I could never really convince it that all threads were stopped. It always insisted on adding a lock around marking to protect it against other threads.]

The one thing I didn't consider was that maybe an object was not getting marked properly and was getting deleted even though it was still in use. I didn't consider it because the crash was in the marking code! Clearly we're marking objects!

But of course, that was the bug. Looking at the code I saw that an object was marked by a connection but not by the listener it belonged to. That meant that, as long as there was a connection active, everything worked fine. But if ever there were no connections active, and if we waited a sufficient amount of time, the object would get deleted by GC because the listener had forgotten to mark it.

Then a connection would use this stale object and on the next marking pass, it would crash.

pclowes

The best way I have found to integrate this approach is Test Driven Development.

When done well, every test is a mini-proof: you write the test, watch it fail, and then write the barest amount of code you think will make it pass. Your test setup and assertions are what cover your pre/post conditions. Base cases are the invariant.

The key here is to be disciplined: write the simplest test you can, see the test fail before writing code, write the smallest amount of code possible to make the test pass. Repeat.

The next level is how cohesive or tightly coupled your tests are. Being able to make changes with minimal test breakage "blast radius" increases my confidence in a design.

GuB-42

I am not a fan of Test Driven Development, not at all.

Having your invariants and pre/post conditions correct is not enough. You also need to do the right thing. For example, say you have a function that adds two durations in the form hh:mm:ss, with mm < 60 and ss < 60 as invariants. Testing those is a good thing, but it won't tell you that your addition is correct. Imagine your result is always 1s too much: you can even test associativity and commutativity and it will pass. Having these correct is necessary but not sufficient.
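
To make that concrete, here is a hypothetical version of such a function: it always adds one extra second, yet it preserves the mm/ss invariants and passes associativity and commutativity checks, so only a test against a known-correct sum catches it.

    struct Duration { int h, m, s; };

    // Deliberately wrong: adds one extra second. Still keeps 0 <= m, s < 60,
    // and is still commutative and associative (either grouping of three
    // values yields the true sum plus two seconds).
    Duration add(Duration a, Duration b) {
        int total = (a.h * 3600 + a.m * 60 + a.s)
                  + (b.h * 3600 + b.m * 60 + b.s)
                  + 1;                              // the bug
        return { total / 3600, (total / 60) % 60, total % 60 };
    }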

The problem is, when you write tests first, especially tight, easy-to-run unit tests, you will be tempted to write code that passes the tests, not code that does the right thing. Like throwing stuff at your tests and seeing what sticks.

I much prefer the traditional approach of first solving the problem as best you can, which may or may not involve thinking about invariants, but always with the end result in mind. Only when you are reasonably confident in your code do you start testing. If it fails, you will have the same temptation to just pass the test, but at least you wrote it once without help from the tests.

Maybe that's just me, but when I tried the "tests first" approach, the end result got pretty bad.

JohnKemeny

You’ve missed the most important point of TDD—and indeed, of tests.

Tests do not ensure that your functions are correct; they ensure that you are alerted when their behavior changes.

Second, TDD is a way to force dogfooding: having to use the functions you’re going to write, before you write them, helps you design good interfaces.

GuB-42

> Tests do not ensure that your functions are correct; they ensure that you are alerted when their behavior changes.

I agree with that part and I am not against tests, just the idea of writing tests first.

> helps you design good interfaces

I am sure plenty of people will disagree but I think testability is overrated and leads to code that is too abstract and complicated. Writing tests first will help you write code that is testable, but everything is a compromise, and to make code more testable, you have to sacrifice something, usually in the form of adding complexity and layers of indirection. Testability is good of course, but it is a game of compromises, and for me, there are other priorities.

It makes sense at a high level though, like for public APIs. Ideally, I'd rather write both sides at the same time, so as to have a real use case rather than just a test, and have both evolve together, but that is not always possible. In that case, writing the test first may be the next best thing.

pclowes

The flow here for me, if the code is doing the wrong thing, is:

- Write a test that demonstrates that it is doing the wrong thing

- Watch it fail

- Change the code to do the right thing

- Ensure the test passes

And in return I get regression prevention and verified documentation (the hopefully well-structured and descriptive test class) almost for free.

I don't think any amount of testing absolves the programmer from writing clear, intention-revealing code that is correct. TDD is just a tool that helps ensure the programmer's understanding of the code evolves with the code. There have been so many times when I wrote code expecting a test to fail or pass and it didn't. This detects subtle, minute drift in understanding.

saurik

A test is not a proof, and you can prove something works without ever even running it. There are also properties of a system which are impossible to test, or are so non-deterministic that a test will only fail once every million times the code is executed. You really need to just learn to prove stuff.

The most complex piece of code I have ever written, as a relevant story, took me a month to prove to everyone that it was correct. We then sent it off to multiple external auditors, one of whom had designed tooling that would let them do abstract interpretation with recursion, to verify I hadn't made any incorrect assumptions. The auditors were confused, as the code we sent them didn't do anything at all: it had a stupid typo in a variable name... I had never managed to figure out how to run it ;P. But... they found no other bugs!

In truth, the people I have met who are, by far, the worst at this are the people who rely on testing, as they seem to have entirely atrophied the ability to verify correctness by reading the code and modeling it in some mathematical way. They certainly have no typos in their code ;P, but because a test isn't a proof, they always make assumptions in the test suite which are never challenged.

pclowes

Interesting. Could you show me a formal proof that can't be expressed in logic (i.e. code) and then tested?

My thought here is that since proofs are logic, and so is code, you can't have a proof that can't be represented in code. Now, admittedly, this might look very different from typical, say, JUnit unit tests, but it would still be a test validating logic. I am not saying every system is easily testable or deterministic, but overall, all else equal, the more tested and testable a system is, the better it is.

IME things that are very hard to test are often just poorly designed.

aljgz

Consider a function that gets an array of integers and a positive number, and returns the sum of the array elements modulo the number. How can we prove, using tests, that this always works for all possible values?

Not discounting the value of tests: we throw a bunch of general and corner cases at the function, and they will ring the alarm if in the future any change to the function breaks any of those.

But they don't prove it's correct for all possible inputs.

layer8

Tests can generally only test particular inputs and/or particular external states and events. A proof abstracts over all possible inputs, states, and events. It proves that the program does what it is supposed to do regardless of any particular input, state, or events. Tests, on the other hand, usually aren't exhaustive, unless it's something like testing a pure function taking a single 32-bit input, in which case you can actually test its correctness for all 2^32 possible inputs (after you've convinced yourself that it's truly a pure function that only depends on its input, which is itself a form of proof). But 99.99% of code that you want to be correct isn't like that.

As an example, take a sorting procedure that sorts an arbitrarily long list of arbitrarily long strings. You can't establish just through testing that it will produce a correctly sorted output for all possible inputs, because the set of possible inputs is unbounded. An algorithmic proof, on the other hand, can establish that.

That is the crucial difference between reasoning about code versus merely testing code.

Kambing

> A test is not a proof

Actually, a test _is_ a proof! Or more specifically, a traditional test case is a narrow, specific proposition. For example, the test `length([1, 2, 3]) = 3` is a proposition about the behavior of the `length` function on a concrete input value. The proof of this proposition is _automatically generated_ by the runtime, i.e., the proof that this proposition holds is the execution of the left-hand side of the equality and observing it is identical to the right-hand side. In this sense, the runtime serves as an automated theorem prover (and is, perhaps, why test cases are the most accessible form of formal reasoning available to a programmer).

What we colloquially consider "proof" are really more abstract propositions (e.g., using first-order logic) that require reasoning beyond simple program execution. While the difference is, in some sense, academic, it is important to observe that testing and proving (in that colloquial sense) are, at their core, the same endeavor.

JohnKemeny

That is stretching the definitions of proofs.

    assert random() != 0.
QED

swat535

TDD is also great for playing around with ideas and coming up with a clean interface for your code. It also ensures that your code is testable, which leads to improved readability.

You'll know quickly when you're going wrong: if you struggle to write the test first, for example, it's a symptom of a design issue.

That being said, I wouldn't use it as dogma; like everything else in CS, it should be used at the right time and in the right context.

pclowes

I agree on the dogma aspect. Plenty of times I have abandoned it. However, I did find it very helpful to spend my first couple of years in a strict Extreme Programming (XP) shop. The rigor early on was very beneficial, and it's a safety net for when I feel out of my depth in an area.

I tend to go the other way: I don't use TDD when I am playing around/exploring as much as when I am more confident in the direction I want to go.

Leaving a failing test at the end of the day as a breadcrumb for me to get started on in the morning has been a favored practice of mine. Really helps get the engine running and back into flow state first thing.

The dopamine loop of Red -> Green -> Refactor also helps break through slumps in otherwise tedious features.

null

[deleted]

wrs

As an undergrad at Carnegie Mellon in the 80s, I was explicitly taught to do all of these things in one of the first-year programming courses. And it has served me very well since then.

I especially remember how learning the equivalence of recursion and induction immediately eliminated the “frustrated trial and error” approach for making recursive algorithms that work.

TheShermanTank

Took the course last year, and I started to appreciate it more while taking functional programming lmao