
Reflections on Sudoku, or the Impossibility of Systematizing Thought

mbb70

I'm reminded of Rich Hickey's Hammock Driven Development talk, whose thesis for design is basically:

1. Think about the problem and write it all down

2. Research what other people have done to solve this problem

3. Think about it some more and write all that down too

4. Sleep on it

I think the conclusion is the same: Good design does not naturally arise from good programming, and background/domain knowledge is essential

LgWoodenBadger

That’s pretty much just Feynman’s algorithm but with more steps:

1. Write down the problem.

2. Think real hard.

3. Write down the solution.

DowsingSpoon

But Uncle Bob said that if I follow his TDD process by rote then my code will automatically attain good design! /s

JonChesterfield

This was an interesting read, but the source article is fascinating: https://explaining.software/archive/the-sudoku-affair/

TDD as an incremental search from a simple start point towards a solution. Put in those terms, it'll work when the path from start to solution is smoothly differentiable.

reedlaw

I had a similar reaction to Jeffries' Gilded Rose solution in Ruby [1]. After 13 blog posts he ended up with something [2] a lot less elegant than Victor Shepelev who also used TDD but came up with a one-shot solution [3]. Just like Norvig with his Sudoku solver, Shepelev codifies the rules as a set of relationships. He writes:

  When rereading the requirements, we might notice that all conditions can be described as a simple dependency (name, sell_in) => change of quality.

  The most natural representation of this in the modern Ruby would be pattern matching.
1. https://ronjeffries.com/articles/020-01ff/gilded-rose-1/

2. https://github.com/RonJeffries/gold-1/blob/master/wrappers.r...

3. https://zverok.substack.com/i/149071314/the-implementation

PaulHoule

The smart way to write a Sudoku solver these days is to formulate the problem for an SMT solver, which is specifically intended to solve that kind of problem. The SMT solver already has tests, so you don't need to write tests.

When I wrote my first Sudoku solver I started out with a solver that solved easy problems where, at each step, there was some square with only one possible solution. That didn't work for harder problems where you have to guess. Eventually I realized that you could just pick any square, try all the numbers, and recursively solve the remaining problem. This works no matter which square you pick but it is faster if you start with the square that has the minimum number of choices available.
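The approach described above (pick a square, try every digit, recurse, and prefer the square with the fewest choices) can be sketched briefly. This is a minimal illustration in Python, not the commenter's actual code; the board is a 9x9 list of lists with 0 for empty:

```python
def candidates(board, r, c):
    """Digits not already used in row r, column c, or the 3x3 box."""
    used = set(board[r])
    used |= {board[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {board[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def solve(board):
    """Solve in place; return True if a solution was found."""
    # Branch on the empty cell with the fewest candidates, so dead
    # ends are discovered as early as possible.
    best = None
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                cands = candidates(board, r, c)
                if best is None or len(cands) < len(best[2]):
                    best = (r, c, cands)
    if best is None:
        return True  # no empty cells left: solved
    r, c, cands = best
    for d in cands:
        board[r][c] = d
        if solve(board):
            return True
    board[r][c] = 0  # undo the guess and backtrack
    return False
```

As the comment says, the recursion works no matter which square you branch on; the minimum-candidates heuristic only changes how fast it gets there.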

If you write a solver haphazardly like that you're very likely to wind up with two or more copies of the board because you're not sure what kind of queries you'll need to do. If you understand the problem, you won't.

The role of the tests is complex here. In principle they could help you refactor your data structures, in practice the sheer bulk of them could reify the data structures you have and add to the burden of refactoring them. A code-golfed version of the solver could be smaller than the tests.

Writing a chess program I found testing was a challenge. Move generators are easy to test, evaluation functions are easy to test. A decent alpha-beta search with transposition tables, killer heuristic and such is not so easy to test in terms of unit tests. Before I'd have the program play anyone I'd run a set of integration tests based on

https://www.chessprogramming.org/Bratko-Kopec_Test

I think of a system I inherited that had a part prone to race conditions; part of vanquishing the race conditions was writing a "super hammer" test that would run a few thousand threads for about 40 seconds. Maybe it still has race conditions, but they won't be triggered often.

Long-running tests like that aren't really unit tests but for some problems they're the right tool. In general though long-running tests are a big problem because slow builds are a problem.

Someone

> but it is faster if you start with the square that has the minimum number of choices available

Alternatively, find the digit that, in its row/column/square/whatever, has the minimum number of choices available, and try each of them.

Ideally, combine the two and use Knuth’s dancing links algorithm (https://en.wikipedia.org/wiki/Dancing_Links). At every step it picks whichever of those two approaches has the lower number of possibilities, typically getting a lower branching factor without spending much more time at each step.

Symmetry

In Sudoku I usually find it's easier to just keep track of guesses on the stack, by calling `solve` recursively when you make a guess. That gives nice memory re-use characteristics without the overhead of link pairs that might end up blowing out your level 1 data cache.

But you certainly want to be guessing on the most constrained spot you can.

Also, bitfields are a nice compact way of representing the possible values of a square: your CPU can AND or OR all the possible values in parallel, plus determine the number of possibilities with a modern CPU's popcount instruction.

SideburnsOfDoom

I have only once written a Sudoku solver. It brute-forced the whole problem space in a second or two.

This works because it's very easy to prune invalid branches according to the rules of the game.

It just tried putting "1" in the top-left-most empty square and checking whether the configuration is valid. If not, try "2". Recurse until you find a branch that completes.

Is this the smart way? IDK, but I was quite pleased with it. I found it more elegant than e.g. "find some square with only one possible solution"

irchans

I also wrote a brute force, kinda dumb, depth-first Sudoku solver. I think it took me a few hours in C or C++. At every node in the search tree, I just checked to make sure no constraints were violated. If I remember correctly, it would run in less than 30 seconds and it would always find a solution if a solution existed. (It might have been a lot less than 30 seconds. I only remember thinking, "That was quick".)

rjpower9000

> incremental search from a simple start point towards a solution

Good point. More broadly than just TDD, we might characterize an "easy" problem as one where the surface from conception to completion is broadly visible and there's a logical set of steps (i.e. "differentiable"). Or maybe "easy" means I can see the full set of steps immediately, and "tractable" means I've got a good guess.

But the set of steps you can take is highly dependent on your knowledge and skill set. For example, if I know about completing the square, then deriving the quadratic equation is a basic algebra exercise. If I don't, then I might struggle for a long time before I manage to discover that key insight.

I'm sure there are coding methodologies which can lead to better or worse results, but they augment knowledge or skill, they don't replace it.

schoen

Isn't there a less-conceptual (but still conceptual) problem that correctness of software is commonly abrupt rather than continuous? You don't get a series of almost-right programs gradually approximating the right program, you have a correct program and variations on it may fail completely.

Of course, whether this is literally true depends on what sort of algorithmic problem you're approaching.

But there must be many real-world problems in which the very-nearly-correct program produces completely wrong behavior that doesn't resemble the correct behavior at all. In those circumstances, you couldn't expect to find the correct program empirically through iterative improvements.

Edit: Maybe the incremental test-driven approach would work in cases where you have an externally given specification or design that already breaks the algorithmic part up into smaller, easier, and more verifiable pieces.

bubblyworld

Obvious counters aside (like syntax issues or whatever) I have almost the opposite intuition. Most of my programs start out as partial solutions to a problem I don't fully understand, and it is only through interaction with the environment (users, or other machines sometimes) that the edge-cases and incorrect assumptions become clear. These programs have a lifecycle of refinement to deployment to analysis to refinement and repeat. At each step they are workable or almost-so, and over time the solutions start to map the domain more correctly. Sometimes (like in business) the domain is evolving simultaneously!

(can give examples if anyone's interested but this is getting long already)

I imagine this wouldn't work so well for hard algorithmic stuff where there are mathematical properties you need to be aware of and maintain. But I find most problems I solve are more organic - people are quite resilient to fuzzy boundaries, so people-facing stuff tends to have that property too. There's a large fuzzy space of "workable solutions", so to speak, and navigating that space is kind of inevitable if you want a quality solution.

Perhaps I'm just not intelligent enough to one-shot that kind of stuff =P

MoreQARespect

>I imagine this wouldn't work so well for hard algorithmic stuff where there are mathematical properties you need to be aware of and maintain

Mathematical properties are often even more ideal candidates for being encoded into either types or property tests.

Business-oriented code is usually where most people see TDD (as it is normally taught) fall down: where you need "some kind of dashboard with x, y and z" but the exact details aren't precisely nailed down. However, if you do it right, these scenarios work extremely well with snapshot test driven development. Most people just don't do it or don't know how.

bubblyworld

Okay interesting, I've never heard of snapshot testing. I'll have to play with it some time.

I agree that mathematical problems are much easier to test, but I think only once you know the mathematics. Like I think it's possible that TDD fell flat for the sudoku solver because the dude just didn't know what properties he wanted. In that situation writing tests is like casting bones.

But I'm not convinced one way or the other... for me tests have always been most useful for regression and basic quality checks. Which is very useful! Means you never (hopefully anyway) take a step backwards as you evolve a program.

Noumenon72

Aren't snapshot tests regression tests? "Snapshot test driven development" to me implies that you would generate the snapshot you want (somehow) and write code until the output matched the snapshot.
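For readers unfamiliar with the term being discussed: the core of snapshot testing is tiny enough to sketch. This is a minimal hypothetical harness in Python (real tools like Jest's snapshots or pytest plugins are more elaborate; the file layout here is an assumption for illustration):

```python
import json
import pathlib

def check_snapshot(name, value, snapshot_dir="snapshots"):
    """First run records the output; later runs fail if it drifts."""
    path = pathlib.Path(snapshot_dir) / f"{name}.json"
    rendered = json.dumps(value, indent=2, sort_keys=True)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(rendered)          # first run: record the snapshot
        return True
    return path.read_text() == rendered    # later runs: compare against it
```

The "driven development" variant the parent describes would invert this: write or generate the snapshot you want first, then code until the output matches.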

omnibrain

I always use the following analogy:

If your customer orders a feature and you implement all the code, but the button to call the feature is missing, then you delivered nothing.

If you just add the button to the program, but implement nothing else, you delivered the feature. It's just still buggy.

aleph_minus_one

> Isn't there a less-conceptual (but still conceptual) problem that correctness of software is commonly abrupt rather than continuous? You don't get a series of almost-right programs gradually approximating the right program, you have a correct program and variations on it may fail completely.

I consider it plausible that such a topology could exist (at least for many situations). The problem, rather, is that such a topology would likely behave very differently from users' expectations.

AndrewDucker

Yes. The difference between "A program that does what you want" and "A program that crashes on startup" can be one character.

pixl97

Ya but the real fun ones are the "a program does what you want 99.993% of the time"

layer8

Working large systems overwhelmingly started out as working small systems, with working systems all in-between.

This is not an endorsement of TDD, but shows that there is a correctness path from small to large, without usually needing to take large leaps in-between, and taking such a path tends to be the most successful strategy.

rjpower9000

I was fascinated by the "Sudoku Affair", found myself speculating on the internal mindset of TDD advocates, and ended up with an unsatisfying conclusion that you can't systematize thought. Not my best writing but thought I'd share nonetheless.

gnat

I think there’s an analogy in the dumb management dream that there’s a process by which you can replace these expensive skilled workers with cheap unskilled idiots and still get great results.

Another variation: the magic process that lets you get great results from people who don’t care.

Excellence doesn’t come solely from process. The benchmark of great process with unskilled workers who don’t care is fast food, and this product is consistently mediocre. Consistency is good but consistent mediocrity is an unworthy ambition in many fields.

NASA is massively process driven but they get people to space and back, and the people in their process are highly skilled and care deeply.

Larry Wall, inventor of Perl, used to say (and probably still does) that complexity has to go somewhere. If you’re solving a complex problem then either it’s a simple program with complex tools, or a complex program with simple tools.

That’s really stuck with me. The art of library, framework, language, API design is to provide a set of tools — if you make simple tools then complex problems become complex programs. And if you offer complex tools, complex problems can have simple solutions. And people will gripe at you for all the punctuation for decades. :)

But the complexity exists and can’t be wished away. Which is kinda your point too. Thanks for writing!

WJW

> Consistency is good but consistent mediocrity is an unworthy ambition in many fields.

Sorry to take only a small part of an otherwise great comment but is this actually true? It seems to me that there are a great many fields in which consistency is more important than excellence, especially if the striving for excellence sometimes produces great misses as well. In a well-designed system with some allowed tolerance, as long as it's good enough you are fine. Take the electricity grid for example: There's no prize for maintaining the frequency to within a nano-Hertz of the spec. There are very large fines for being outside the spec (+/-0.050 Hertz for the EU grid). Being consistently within spec is much more valuable than occasionally performing much better than the spec.

It is only in extreme winner-takes-all fields like sports, spacefaring and entrepreneuring that being the absolute best is what you want. In most other fields being consistently decent beats out varying excellence. I definitely wouldn't want my dentist to take a risky moonshot in pursuit of excellence, for example.

gnat

"I definitely wouldn't want my dentist to take a risky moonshot in pursuit of excellence". Excellent comment.

I was aware as I was writing it that consistent mediocrity is indeed a profitable target. Perhaps we're quibbling over "mediocrity"? Staying within the spec seems enough of the challenge for a grid, I'm not sure that I'd define excellence as narrower and narrower variation around the spec there.

I was making a moral judgement in "unworthy". I like cars to come off the factory floor consistently good. I own a Tesla, this statement has high salience for me. That seems like a challenge. I wouldn't respect the Lada factory for consistently turning out cars that break down or fall apart, just as I don't respect McDonalds for consistently delivering underwhelming food. I acknowledge the consistency, I acknowledge it's profitable, but they're not achieving consistent greatness.

NitpickLawyer

> there’s a process by which you can replace these expensive skilled workers with cheap unskilled idiots and still get great results.

> Another variation: the magic process that lets you get great results from people who don’t care.

Isn't that pretty much what the US military does? They take in "dumb" teens + docs + processes and get whatever it is they need out of this? It's also cheap (compared w/ industry). And by the end of the exercise the "dumb" teens become skilled in their own niche, some of them highly skilled (see nuclear specialists, ATCs, pilots, navigators, managers, procurement, etc.)

The military processes are at the base of lots of organisational stuff we have in software dev and business as well (agile, scrum, ooda, etc)

WJW

Where do you get the idea that this is cheap? Militaries are somewhat famous for being incredible money sinks. They need incredibly specific skills that are not really taught in the civilian world. This means the military needs to train everyone at its own cost. So while the input might be "cheap, unskilled idiots", there is then a significant expense to turn them into expensive skilled non-idiots before they are ready for duty.

As someone who worked inside the military for 14 years, it is also quite a stretch to say they consistently get "great" results tbh.

gnat

I feel like you're saying that education (as practiced by the US military) is a somewhat reliable process for taking unskilled people and making them skilled. I agree.

The US Military have admissions criteria. They bounce people out in Basic Training and in every other part of training. Not everyone gets to be an ATC, a pilot, or a nuclear specialist. Air Traffic Control turns out not to be a process you can put an unskilled person into. It requires training and careful integration, and once you have a skilled person with many hours invested in them, perhaps you can let them direct traffic.

Training isn't a magic process that takes unskilled people who don't care and delivers great results every time. The equivalent in our world would be "I'll hire cheap people who don't know how to program, put them through a Bootcamp process, and then I'll have great programmers." That didn't work.

I am trying to say that every version of programming where there's been The Process You Just Have To Follow (from Jackson Structured Design onward) has failed significantly and hasn't been a substitute for hiring smart people who care. If someone came to me with a business plan that was "hire mediocre people who don't care, and we'll achieve great results because of My Process", I'd be veeeeery skeptical indeed.

aleph_minus_one

> The military processes are at the base of lots of organisational stuff we have in software dev and business as well (agile, scrum [...])

The Agile Manifesto was exactly the counter-manifesto to this, and thus to any methodology that merely calls itself agile (e.g. Scrum):

Agile Manifesto: https://agilemanifesto.org/

Principles behind the Agile Manifesto: https://agilemanifesto.org/principles.html

Yossarrian22

I'd be somewhat more charitable and say that it's an attempt to break down a big, complicated problem into chunks that are more easily digestible. The problem I think Jeffries ran into is that Sudoku's search space is essentially atomic: breaking it down further isn't helpful, even though it seems simple to break it down into row, column, and box and work from there.

PaulHoule

There's the principle "never let them see you sweat" and if you're trying to convince people of an idea like TDD you never want to be seen floundering. You don't publish anything about your sudoku solver until you've succeeded at it. Otherwise you're just proving "TDD sux" which, for all "X sux", TDD sux more than X.

awanderingmind

This is a rare type of article - a concrete analysis of different approaches to programming (that are arguably themselves reflections of different cognitive styles), that outlines the shortcomings of one approach in a specific domain, without generalising too much.

taeric

You see similar in other arenas, too. And sometimes, as annoying as it is, popular techniques really do have better success than not. Even more annoying, unpopular techniques can often have better success rate than we care to acknowledge.

The examples in my mind are: outlining, task breakdown, object modeling, and rote repetition.

prmph

And this is exactly why thinking that AI is going to make human thought and skill redundant is crazy. LLMs can never get to a point where you just whip them up to solve any general intellectual challenge. Even creating CRUD apps (that are well-behaved, secure, performant, scalable, and maintainable) can't really be totally systematized.

fydorm

Wow, I just read the original Sudoku post by Norvig yesterday for the first time, and now today this shows up here!

layer8

It was bound to happen to someone.

IceDane

Wow, the Ron Jeffries articles are sort of embarrassing, and he doesn't even realize it. This is why dogma never works.

rjpower9000

I respect he's open about his work and struggles, and it's cool he's programming at 86, but it does seem like his approach makes it harder for him rather than easier.

For example with the bowling score calculator, it's great to start with some tests, but I think he then marched towards a specific OOP formulation which obscured rather than clarified the problem.

groby_b

It's making the same reasoning mistake that a lot of discussion of the Entscheidungsproblem makes: the problem asks for a generic algorithm that decides, for all programs P and all tasks T, whether P solves T, yet the discussion assumes that you can't decide whether P solves T for any particular P or T.

With that in mind, let's look at the crux of the argument: "If we can't decide if a program P solves a task T, then we certainly can't solve the even harder problem of finding a program P that solves a given task."

That's simply not true. We can often decide whether a specific program P solves a task T. Moreover, that means that for large classes of tasks we actually can throw mud at the wall and see if it sticks, as long as it's decidable whether the particular program P solves the particular task T.

And for any problems you can exhaustively test, hey, you really can rely entirely on TDD and hill-climbing the problem space. Hence, bowling scores being easier than Sudoku solvers.
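The bowling case illustrates the point: the whole scorer is a small pure function from a roll list to a number, so a handful of canonical games very nearly pins it down. A minimal sketch (mine, not from the thread) of standard ten-pin scoring:

```python
def score(rolls):
    """Score a complete ten-pin game given the list of pins per roll."""
    total, i = 0, 0
    for _ in range(10):                          # exactly ten frames
        if rolls[i] == 10:                       # strike: 10 + next two rolls
            total += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:      # spare: 10 + next roll
            total += 10 + rolls[i + 2]
            i += 2
        else:                                    # open frame: just the pins
            total += rolls[i] + rolls[i + 1]
            i += 2
    return total
```

Gutter game 0, all-spares-with-fives 150, perfect game 300: three well-known fixed points already constrain almost every branch, which is exactly why test-first hill-climbing works here.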

As soon as you leave the (almost) exhaustively testable space, things become harder. And it's worth keeping in mind that TDD originates from a payroll system - something that's more amenable to exhaustive testing ("do all of our employees get the right amount, and did HR/finance stop yelling, and is our CFO not getting jailed for tax evasion") than to a systematic approach. (Government plus corporate bureaucracy means that there are absolutely no deep structures to find. It's all about the special cases.)

You can still do "TDD" at a higher level. You just need to accept that TDD is simply a primitive form of formally establishing your constraints, and that some constraints are hard to establish purely through testing. But there exist many formalisms that allow you to express higher-level constraints.

This is at the core of the Haskell/Rust quip that "once the type checker/borrow checker is quiet, you can be confident the solution works".

Maybe constraint-driven design would've been a better name.

rjpower9000

I won't argue with whether you can test that a particular program produces a result (we aren't so interested in programs that run forever in any case). Your observation about the origins of TDD makes sense in this regard: there may be domains where the TDD approach is more useful, but it's not a generic panacea, and assuming it is can lead to sadness.

My point was more about arbitrary programs and tasks. In general, I don't think there's any particular process you can follow that, in effect, reduces programming or math to a checklist. I'm sure there are techniques that can help structure many problems but they're necessarily fairly general e.g. "think about the problem, take a nap, think about it some more".

What I was hoping to highlight was more the danger of trying to hill-climb without having the right set of skills, thus assuming you can solve a problem by following a particular pattern.

To give an example, I remember trying to derive the quadratic equation when I was young, and no amount of algebraic repositioning was going to help me unless I knew how to complete the square! But once that type of trick is in your toolbox, you can begin to solve all sorts of other problems.

Even the bowling calculator can become hard if you don't know what you're doing, not because TDD is bad or a wrong way to do things, but because you don't know how to reason about your program. Even bowling has 12^13 outcomes, with a bunch of silly special cases if you approach it the wrong way. Thus a truly naive approach might lead you down a weird direction.