
Nearly all binary searches and mergesorts are broken (2006)

jsnell

motorest

> (2006)

At the very least, this should feature in the title. In fact, previous submissions from over a decade ago also featured the year in the title.

Otherwise it conveys the idea that this is some major epiphany.

brandall10

Considering how often this happens, I'm surprised HN doesn't have a feature to flag and index a historically resubmitted article, whether externally by users or internally by the admin team.

Then it could have a bot create links to past submissions like the OP did and use the best arrived at title for resubmissions.

It does have the ability to catch multiple submissions of new articles but that probably has a time window of a day or so.

kurthr

One problem would be that you can't just check the URL, you'd have to check the content. Not only are there many URLs that could point to the same content, but content on a page can obviously change.

I suppose you could compare against the Wayback Machine, and I'm not sure I'd try to compare with an LLM or RAG.

xmcqdpt2

Also, I didn't know about this specific bug but I spotted it almost immediately while reading the code. This is not because I'm some kind of 10x programmer but because Integer.MAX_VALUE bugs while processing arrays are actually fairly common in java programs in 2025, I fixed one a few weeks back and it's something I look for in code reviews.

I guess it would have been surprising in 2006?

brabel

In 2006, if you tried to create an array with 2^30 elements you would just get an OutOfMemoryError almost anywhere.


spixy

Why are they common in Java? In .NET or Node.js I don't think I have ever had one.

coldtea

[flagged]

joshka

The problem with this article's name is that we're in an era where actually checking whether "nearly all sorts are broken" against all publicly available implementations of binary searches would be almost feasible, but that's not actually what the article is claiming.

> Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible.

This is incorrect. Generally it's just a little excessive to try to solve the halting problem in a library's unit tests ;). You don't have to test a program for all possible inputs, only for all materially unique state transitions. In a binary search, each variable can be zero, one, some(*), max, or overflowed. The combination of these is not infinite, because the "some" values that cause different behavior become quite manageable once they're grouped into chunks with the same behavior.
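
To make this concrete, here is a minimal sketch (class and method names are purely illustrative) that runs the classic midpoint expression against a handful of such input classes; only the near-max class exposes the overflow:

    public class MidpointClasses {
        static int naiveMid(int low, int high) {
            return (low + high) / 2; // the expression under test
        }

        public static void main(String[] args) {
            int max = Integer.MAX_VALUE;
            // zero, one, "some", and near-max index pairs
            int[][] classes = { {0, 0}, {0, 1}, {3, 17}, {0, max}, {max / 2, max} };
            for (int[] c : classes) {
                int mid = naiveMid(c[0], c[1]);
                boolean inRange = c[0] <= mid && mid <= c[1];
                System.out.printf("low=%d high=%d mid=%d %s%n",
                        c[0], c[1], mid, inRange ? "ok" : "OUT OF RANGE (overflow)");
            }
        }
    }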

PaulKeeble

It was certainly possible to run the algorithm on 16GB of array before the moment when it happened but did the original developer have that sort of space on their desktop at the time? Possibly not.

If a unit test only runs on a server and not on the laptops of the developers then it's not going to be written. Ideally someone would write a test that is tagged to only run on the server, but that is a lot of extra work if that isn't a common thing on a project. Even now I would be quite wary of producing a max-size input test for an array of ints, and especially objects; that is going to raise some questions due to being a slow test and a highly resource-consuming one.

If I was worrying about the same in aerospace programming, however, then no question that test would get written, and we would ensure it got run on the hardware it was designed to run on. In typical business software it's less common to run potentially very long-running tests and simulations of the machine's states; everyone wants to go faster than that, especially for the sake of a max-input test.

miohtama

Also you can have checked math, like in Rust, and automatically just crash when you overflow a variable.

In this case, it's not a bug (cannot get incorrect result) but an unsupported input.
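
Java offers the same per-operation option via Math.addExact, which throws instead of silently wrapping; a tiny sketch:

    public class CheckedMid {
        public static void main(String[] args) {
            int low = 2, high = Integer.MAX_VALUE - 1;
            // Math.addExact throws ArithmeticException on int overflow, so the broken
            // midpoint computation fails loudly instead of yielding a negative index.
            int mid = Math.addExact(low, high) / 2;
            System.out.println(mid);
        }
    }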

joshka

If you're worried about being able to make an array where a.length == int.max, (which is reasonable in an algorithm which does any math on the length), replace the array with another substitute which you can mock the size of (e.g. in Java that would be possible against an ArrayList). You can test the algorithm separately from the execution of the algorithm.

poincaredisk

Small nitpick: you don't need 16GiB, 2^31 bytes is "just" 2GiB. Doesn't contradict your point though.

penteract

Each element of the input array is at least 4 bytes, bringing it to 8GiB.

joshka

Swap in an ArrayList and mock size() and you need a few bytes...

omoikane

You might not need 16GB of memory. There are systems where int is only 16 bits, and overflowing 16 bits is not that difficult.

But maybe it was uncommon to have arrays larger than 64K in the 80s due to segmented memory?

PaulKeeble

The core of Java was being written in the late 1990s. I had a machine in 1995 that had 16MB of memory but 8MB was more typical for Pentium machines. By 2000 the AMD Athlon and Pentium 3 were the latest and greatest and Anandtech was testing with 128MB of memory [1].

Java defines an int as 32-bit signed; it doesn't do anything else, and it calls 16-bit ints shorts. So it definitely has to be 8GB for the array.

Sun was using Solaris machines rather than PCs, and they were higher spec on things like memory, but I still doubt they had 2GB, let alone the 8GB+ needed to run this test. That sort of memory didn't become routine for another decade, when Sandy Bridge was being tested with 8GB in 2011 [2].

It also goes to show how much things have stagnated: a desktop computer with 16GB has been standard for a long time. In the preceding decade we went from 128MB to 8GB as normal, while in the 15 years since, normal has only moved to 16-32GB, which is nowhere near the same pace of progress in memory density.

[1] https://www.anandtech.com/show/566/6 [2] https://www.anandtech.com/show/4083/the-sandy-bridge-review-...

xmcqdpt2

The correct way to test that code is to write a version that doesn't take an actual array but an index -> int function; then you wouldn't need to instantiate the array at all.
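
Something along those lines (a sketch; the names are made up) lets you exercise the full index range without allocating anything:

    import java.util.function.IntUnaryOperator;

    public class FunctionalBinarySearch {
        // Binary search over an index -> value function plus a length,
        // so the overflow path is testable without a real array.
        static int search(IntUnaryOperator valueAt, int length, int key) {
            int low = 0;
            int high = length - 1;
            while (low <= high) {
                int mid = low + ((high - low) >>> 1); // overflow-safe midpoint
                int v = valueAt.applyAsInt(mid);
                if (v < key) low = mid + 1;
                else if (v > key) high = mid - 1;
                else return mid;
            }
            return -(low + 1); // same "not found" convention as Arrays.binarySearch
        }

        public static void main(String[] args) {
            // Pretend we have a sorted "array" of Integer.MAX_VALUE elements where a[i] = i.
            System.out.println(search(i -> i, Integer.MAX_VALUE, Integer.MAX_VALUE - 2));
        }
    }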


rileymat2

> In a binary search, each variable can be zero, one, some(*), max, overflowed. The combination of these is not infinite as the values of some that cause different behavior is much more reasonable when grouped into chunks of the same behavior.

You are presuming that the algorithm is correct in the first place. Contrived example.

   binary_search(array, key) {
     if (key == 453) explode_world()
     //Do correct binary search.
   }
So you also need to prove that explode_world is not called, or that something similar but less contrived isn't in there.

joshka

No. I'm presuming that you're able to look at the branches in code and write white box tests. Here the relevant two branches are 453 and anything other than 453. For addition of unsigned numbers, the relevant branches are 0 + 0, 0 + 1, 0 + max, 1 + 0 (usually can be skipped), 1 + 1, 1 + max (which overflows), etc.

The number of relevant material states is bounded and knowable. It's neither unbounded (1, 2, 3, 4, 5, 6, ...) nor unknowable, as the algorithm is available.

adrianN

That is easily caught by a model checker that tries the equivalence classes. Or just normal coverage tooling.

dotancohen

  > each variable can be zero, one, some(*), max, overflowed
These are the value ranges that I test in my unit tests, plus at and before some powers of 2 (e.g. 65535 and 65536), and -1. That is because -1 is uniquely used in some systems to indicate error, and thus trips some bugs.

MaxikCZ

Are you quantizing information?

manquer

You mean property testing? QuickCheck would be the go-to library (the original is in Haskell, however most languages have one).
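
As a sketch of the idea without any library (a real property-testing tool such as jqwik would also generate edge cases and shrink counterexamples for you), check that a midpoint helper always lands inside [low, high] for random pairs; swapping the helper for the naive (low + high) / 2 makes it fail:

    import java.util.Random;

    public class MidpointProperty {
        static int midpoint(int low, int high) {
            return low + ((high - low) >>> 1); // overflow-safe form under test
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            for (int i = 0; i < 1_000_000; i++) {
                int x = rnd.nextInt() & Integer.MAX_VALUE; // non-negative
                int y = rnd.nextInt() & Integer.MAX_VALUE;
                int low = Math.min(x, y), high = Math.max(x, y);
                int mid = midpoint(low, high);
                if (mid < low || mid > high) {
                    throw new AssertionError("counterexample: low=" + low + " high=" + high);
                }
            }
            System.out.println("property held for 1,000,000 random pairs");
        }
    }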

LiamPowell

This bug, and many others, can be detected with a trivial amount of formal verification. I really think formal verification should see much wider use for things that see as much use as standard libraries, even if it's just for the trivial things like overflow and out-of-bounds access.

In the below code we can see a formal verification tool (GNATProve) detect both the original error and the out-of-bounds access that it causes. Doing the arithmetic using a larger type clears both reported errors without the need for any additional annotations for GNATProve.

    function Search (A : A_Type; Target : Integer) return Integer is
       Left : Integer := A'First;
       Right : Integer := A'Last;
    begin
       while Left <= Right loop
          declare
             Mid : Integer := (Left + Right) / 2;
          begin
             if A (Mid) = Target then
                return Mid;
             elsif A (Mid) < Target then
                Left := Mid + 1;
             elsif A (Mid) > Target then
                Right := Mid - 1;
             end if;
          end;
       end loop;
    end Search;
    
GNATProve output:

    Phase 1 of 2: generation of Global contracts ...
    Phase 2 of 2: flow analysis and proof ...

    wrapper.adb:12:36: medium: overflow check might fail, cannot prove lower bound for Left + Right
       12 |            Mid : Integer := (Left + Right) / 2;
          |                             ~~~~~~^~~~~~~~
      reason for check: result of addition must fit in a 32-bits machine integer
    wrapper.adb:12:45: info: division check proved
    
    wrapper.adb:14:19: medium: array index check might fail
       14 |            if A (Mid) = Target then
          |                  ^~~
      reason for check: value must be a valid index into the array

jheriko

can also be spotted with experience.

these kinds of problems are present in very many standard treatments of algorithms and don't survive their first contact with real world use in some cases.

"needs a carry bit or a wider type" is common for arithmetic operations that actually use the range.

bhouston

It makes the case that we need a Math.mean or Math.avg function that we can use in these cases rather than reinventing it.

Basically we should favor using built in functions for as much as possible because those should be reliable and tested more than ad hoc code we write. And compilers should optimize those built in functions well so there is no extra cost in using them.
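
Such a helper is a one-liner; a sketch of what a hypothetical Math.midpoint for non-negative int indices could do:

    public class Midpoint {
        static int midpoint(int low, int high) {
            // (low + high) >>> 1 reads the 32-bit sum as unsigned, so the carry bit
            // survives; low + (high - low) / 2 avoids creating a carry at all.
            return (low + high) >>> 1;
        }

        public static void main(String[] args) {
            System.out.println(midpoint(Integer.MAX_VALUE - 2, Integer.MAX_VALUE)); // 2147483646
        }
    }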

sltkr

C++ added std::midpoint() to the standard library: https://en.cppreference.com/w/cpp/numeric/midpoint

Another fun case, besides integer overflow, is negative values: in Java and C/C++, (i + j)/2 truncates towards zero, but i + (j - i)/2 truncates towards i instead. Sometimes the difference matters.
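
A toy demonstration of that difference with one negative endpoint:

    public class MidRounding {
        public static void main(String[] args) {
            int i = -7, j = 0;
            System.out.println((i + j) / 2);     // -3: truncated towards zero
            System.out.println(i + (j - i) / 2); // -4: biased towards i
        }
    }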

ape4

I was thinking the same. And the code would be clearer too.

heinrichhartman

    int mid = (low + high) / 2;
> Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1).

I would really challenge calling this kind of effect a "bug" or "breakage".

It's like calling Newton's law of gravity broken because it's not accurate at predicting how galaxies move.

Things are engineered and tested for a certain scale.

Knowing which tools to use at which scale is part of the craft of engineering.

wat10000

Sometimes they’re engineered and tested for a certain scale.

More often they’re engineered and tested for an arbitrary scale. The limits aren’t considered, behavior at the edges isn’t accounted for, and it’s assumed it will be good enough for real world inputs.

The use of `int` tends to be a dead giveaway. There are some cases where it’s clearly correct: where the spec says so (like argv), where you’re starting from a smaller type and it’s impossible for the calculations to overflow in an int (like adding two uint8), that sort of thing. And there are cases where it’s subtly correct, because you know the range of the value is sufficiently limited, either by mechanics or by spec.

But most of the time, int gets chosen because it’s the apparent default and it’s easy to type. No analysis has been done to see if it’s correct or if you want to declare your code to only support inputs in a certain range.

It’s really clear if you’ve written a binary search (or anything else that works on general arrays) in C and you use int as the index type. There’s pretty much no scenario where that makes sense. In theory you could analyze the entire program and prove that over-large arrays are never passed in, and keep doing it to make sure it stays that way, but that’s not realistic. If the programmer actually took one second to think about the appropriate data types, they’d use size_t rather than int.

You can still have this bug with size_t, of course. But it won’t be “this falls apart with arrays over 1G elements on 64-bit systems that can easily handle them.” If you declare that you wrote the obvious midpoint calculation with size_t because you didn’t intend to support byte arrays larger than half the address space, it’s at least plausible.

tehjoker

i write c++, but i had to teach myself and always wondered why others use imprecise types. portability is one possibility, but then you can't know if your datastructure will break for a given input

wat10000

History and tradition at this point. Bit-sized integers and the other “meaningful” integer types like size_t weren’t added to the languages themselves until C99 and C++11. A lot of us learned those languages before that, and lots of code still exists from that time, or at least code bases that have evolved from that time.

I think it actually comes from the opposite of portability. Access to different kinds of systems wasn’t common then. If you were learning and working on a system where int is 32 bits and pointers are 32 bits, and other possibilities are just vague mentions in whatever books you’re learning from, it’s very easy to get into the habit of thinking that int is the right type for a 32-bit quantity and for something that can hold a pointer.

prewett

I'm not sure what you mean by "imprecise types", but if you mean something like using an `int` for an array index instead of `size_t` or something, I can tell you why I do it. Using `int` lets you use -1 as an easy invalid index, and iterating backwards is a straightforward modification of the normal loop: `for (int i = max; i >= 0; --i)`. That loop fails if using `size_t`, since it is never negative. Actually `size_t` may not even be correct for STL containers, it might be `std::vector::size_type` or something. Also, I don't think I've encountered an array with more than 2 billion items. And some things, like movie data, are usually iterated over using pointers. As you say `int` is easy to type.

Also, for something like half my programming life, a 2+GB array was basically unobtainable.

alanfranz

> certain scale

Make it explicit. If the array is too large, throw an IllegalArgumentException. Document the limit. Then I agree with you.

Otherwise, if an allowed input crashes the program at runtime with a random exception, I respectfully disagree.

feoren

IllegalArgumentException is OK in your book, but OverflowException is not? It's very rare that I actually care which exception type is thrown, but it's a little more polite to throw more clear and reliable exception types. But saying "this could be a little more polite and therefore YOUR CODE HAS A BUG" would make me never want to work with you again.

alanfranz

An explicit OverflowException could be ok but it may not tell me the root cause. A random index error is not ok at any time.

brabel

Then you should absolutely stay away from C :)

jldugger

Words to live by!

alanfranz

I do.

But the example was Java.

ajuc

The problem is that it's not that much harder to make it work for all the valid inputs.

Not doing that is not good enough.

Another example is summing lots of floats naively instead of using Kahan's algorithm.

It's like we had a theory of gravity that doesn't work on Mars because we have unnecessary division by (m-Mars' mass) in our laws :) It wouldn't be good physics.
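
For reference, a minimal Kahan (compensated) summation sketch in Java:

    public class KahanSum {
        static double kahanSum(double[] xs) {
            double sum = 0.0;
            double c = 0.0;             // running compensation for lost low-order bits
            for (double x : xs) {
                double y = x - c;
                double t = sum + y;     // low-order bits of y are lost here...
                c = (t - sum) - y;      // ...and recovered here
                sum = t;
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] xs = new double[1_000_000];
            java.util.Arrays.fill(xs, 0.1);
            double naive = 0.0;
            for (double x : xs) naive += x;
            System.out.println(naive);        // drifts noticeably from 100000
            System.out.println(kahanSum(xs)); // much closer to 100000
        }
    }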

feoren

It's not much harder in this toy example. In real examples what this looks like is a ton of bikeshedding and arguing about minutiae instead of improving the system in ways that are orders of magnitude more valuable to everyone. The truth is it's far easier to waste time on this mostly meaningless crap, exchanging one exception for another, than it is to think deeply about the problem you're actually trying to solve. It's lazy.

ajuc

In the real world you're not implementing binary search but using one implemented already.

Hopefully implemented by someone who cared enough to avoid bugs or at least document the range of the arguments.

xigoi

Is it worth making the algorithm slower just to have it work on extreme edge cases?

jamietheriveter

> Is it worth making the algorithm slower just to have it work on extreme edge cases?

Yes! If an API has no documented (and preferably enforced) limits on its arguments, the caller has every right to assume the full range of the argument types is supported.

Else it is very much a bug.

“Be reasonable, dude” is ok when you order 1,000 beers. It’s not an appropriate response when calling a binary search API.

javcasas

Is it worth making the algorithm faster at the cost of throwing surprise OutOfBounds exceptions in some extreme edge cases?

Maybe, but only if you, and only you, and not an attacker, can control which case you are in.

ajuc

Well it's either that or document the domain.

croemer

Nice example with the 1/(m-m_mars)!

tempodox

+1 for mentioning Kahan.

javcasas

> Certain scale

Or just put the algorithm in a 16-bit microcontroller, add some table that needs to be looked up (think a precomputed sine table), put that table near the end of the memory range, and just make the mistake of calling binary search with the start and end memory positions of the table.

perching_aix

These implementations are definitely broken when the specification goes like "you just pass in your array here and it will perform binary search on it for your value." Yes, you could constrain this spec, but come on...

It's such a blatant example of a bug that I struggle to believe anyone can even remotely conceive of, and pivot to, the idea that "nuh-uh, this is a bad spec, not a bad implementation!".

secondcoming

I would disagree. The inputs to these functions are user controlled and so can be forced to break, whereas humans cannot change how gravity works.


dzaima

But in what realistic scenario would a user be able to put in ≥2^31 while not being able to put in 2^32-1 (which probably breaks many more things from innocent increments or similar) or 2^100?

Retric

And further this all assumes they used int vs. long. It can be “wrong” in that it only works for arrays of under 2^63 elements without that ever being a possibility.

Production code is often filled with edge-case bugs that simply never come up. Working for 100x the expected use case is generally good enough; if you're approaching the limits where 2^31 is a potential issue, then you are also approaching the case where 2^32 definitely will be.

sillysaurusx

Hacking, of course. Overflows are one of the primary ways that hackers gain control of systems.

wat10000

2^32-1 is almost always a possibility when 2^32 is, but there are many cases where those are possible but 2^100 is not. Basically anything where the value is a count of something rather than raw input fits the bill. How many characters, lines, or files do you support? 2^32 is a totally feasible number in many contexts. 2^100 is physically impossible, there isn’t enough matter.

djoldman

> The bug is in this line:

> In Programming Pearls Bentley says that the analogous line "sets m to the average of l and u, truncated down to the nearest integer." On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

At some point we have to draw an arbitrary line. Even an "arbitrary precision" calculation is bounded by system memory.

"bug" is not well-defined, or perhaps "bug" is more of a continuous label as opposed to discrete.

poincaredisk

Why do we need to draw that line somewhere? The fixed solution works for the full range of Java int.

borntyping

Nearly all binary searches and mergesorts are broken in languages with bounded integers.

dietr1ch

Doing any math with bounded integers considered harmful.

At some point during my undergrad I realized this and tried to be really careful when implementing algorithms, but it's stupidly hard to do in a tidy way, at least in old C. It's not practical, and people would just rather close their eyes and live in blissful avoidance.

perching_aix

And on machines with finite memory, right? Which would be every actual computer ever built?

socksy

Well I would posit that it would be hard to get to this code in a language with unbounded integers where (low n + high n) causes an OOM error, because in order to run this code, you first need an array n units wide.

You could argue that the array itself could take up most of the space leaving no room for the indices, but that's hardly a fault with the algorithm, as now you've got a computer that basically can't do anything due to overloading. Whereas overflowing a 32 bit integer is a much more likely occurrence that arguably the algorithm should account for.

chowells

Why does everyone talk about binary search in terms of arrays? Binary search works with any monotonic function. Looking up a value in a sorted array is just a special case of a monotonic function.
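
A sketch of that more general view (names are illustrative), searching a monotonic predicate over a range of longs with no array in sight:

    import java.util.function.LongPredicate;

    public class PredicateSearch {
        // Smallest x in [lo, hi] with p(x) == true, assuming p is monotonic
        // (false then true) and p(hi) is true.
        static long firstTrue(long lo, long hi, LongPredicate p) {
            while (lo < hi) {
                long mid = lo + ((hi - lo) >>> 1); // overflow-safe midpoint
                if (p.test(mid)) hi = mid;
                else lo = mid + 1;
            }
            return lo;
        }

        public static void main(String[] args) {
            long n = 10_000_000_019L;
            // ceil(sqrt(n)) via the monotonic predicate "x*x >= n"
            System.out.println(firstTrue(1, 200_000, x -> x * x >= n)); // 100001
        }
    }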

xigoi

So previously it worked for arrays up to length 2³⁰ and now it works for arrays up to length 2³¹. Is it really that much of a difference? Wouldn’t it be better to change the indexing type to a 64-bit integer?

ddxxdd

The difference is that we know for a fact that the proper implementation works for integers up to 2^31, whereas the improper implementation deceived us into thinking that the code would work in situations where the code actually doesn't work.

I find it valuable to understand when my code will crash.

foldr

Article was written in 2006 when 32-bit architectures were dominant.

I suppose the issue is moot now on 64-bit architectures, as the difference between 2^62 and 2^63 isn't relevant. (2^62 bytes is 4611686 terabytes.)

im3w1l

Spelling it out like that sure gives some perspective - It's a frighteningly low number! They sell 2tb microsd cards nowadays. I bet you could fit 2^64 bytes in a shoebox. So I think uint64 might eventually be insufficient as a size type.

Edit: Well not quite, more like a small closet.

tliltocatl

Almost makes you think RISC-V was right with 128-bit extension. On the other hand, exabyte-scale memories might one day be possible, but would it still be useful to maintain single-address-space illusion for these?

coldtea

>Spelling it out like that sure gives some perspective - It's a frighteningly low number!

Yeah, it's very common for computers to have byte addressable 4 exabytes of storage...

eesmith

The use of:

   int high = a.length - 1;
tells us that a.length-1 is supposed to fit into an int, so there is no need to handle 2³¹ or larger.

rob_c

Yep. No story here, feel free to move on...

MattPalmer1086

Not really. Arrays are limited to an int in size. You would be using more memory for the calculation and have to cast the value down to a 32 bit value to use as an array index.

Or you could just write the code so it isn't vulnerable to integer overflow.

touisteur

Which is sad (array size limited to an int) and has always annoyed me, coming back from Ada where the index of an array is just another discrete type (including boolean, enum type, and ranged integers).

rob_c

Ideally the index should support int64 with int being a synonym on a 64bit platform by default.

If not, then frankly you're into such unexpected-behaviour territory that you should check your whole codebase rather than rely on stuff working just because it compiled.

And we all know how everyone loves writing and understanding integration tests... (I personally do but most problems in the industry wouldn't be a thing if more people stopped to write them)

bobmcnamara

size_t, you mean?

Ameo

A cynical takeaway from this is that it's hard to write good code and it doesn't really matter if you do or not.

Most code at every company I've worked at and project I've built is littered with technical incorrectness and buggy half-measure implementations. It's human nature or time pressure or something like that, but the application continues providing business value despite that.

SoftTalker

Because it's sort of like premature optimization. If your business case will never be dealing with billion-element arrays, it's a waste of time to make sure your code can handle them.

3ple_alpha

No they're not. If you're using an array with length over a billion in Java, your code stinks already before you start using binary search.

coldtea

That's not only wrong in itself, but totally orthogonal.

A binary search implementation should still work, regardless of the array length, or have the limitation documented.

And of course an array "with length over a billion" can be totally valid, depending on the use case, your tradeoffs, available memory, etc. It could even be the optimal data structure for some use cases.

poincaredisk

I'm not a Java programmer, but how would you load a 1GB file into memory? I assume read returns some kind of an array.

Also, big arrays being (supposedly) a code smell doesn't mean that code handling them improperly is not buggy.

aardvark179

If you really needed it in memory you’d use one of the file APIs that will map it and present a direct byte buffer view over that memory.

Those APIs use long as their offset unlike the 32 ints used by arrays, and would avoid having to copy the data into some other object.

There has been some discussion over the years about how arrays could be changed in the JVM to support longer lengths, but doing so without breaking existing code and while providing truly useful functionality without providing obvious footguns isn’t as easy as you might think.
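
A rough sketch of that approach with java.nio (the file name is made up; note that a single MappedByteBuffer is still capped at about 2 GiB, so truly huge files need multiple mappings or the newer memory-segment API):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class MappedRead {
        public static void main(String[] args) throws IOException {
            Path file = Path.of("big-input.bin"); // hypothetical 1GB input
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // position and size are longs; nothing is copied into a Java array
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                System.out.println(buf.get(0)); // read a byte straight from the mapping
            }
        }
    }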

angus_gh

Java's arrays use a signed 32-bit int as their length, so the longest they can be is about 2 billion elements.

If your code has arrays over a billion elements, then it will fall over the moment someone inputs slightly larger data

danvonk

Relational databases often require searching and sorting gigabytes of data to answer queries (sometimes larger than RAM if e.g. k-way merge sort is used) so it doesn't seem that far-fetched, especially given that there are database systems written in Java.

tehjoker

not doing much scientific programming eh?


sonu27

Title needs updating with the year 2006

usr1106

I often think AI is mostly crap, wasting a lot of energy for very questionable benefits. But could/should this repetitive task of reminding submitters to follow the submission guidelines and add the year to submissions of old articles be automated?

pdimitar

I would agree, though why would you need AI for that is an open question.

sonu27

A simple crawler would have been able to detect it’s from 2006. Perhaps a reminder should be added if the year is not recent

usr1106

What algorithm would you suggest to find the year in an arbitrary submission? Of course AI is not a very clearly defined term, and more difficult problems certainly exist. I was just thinking of the case where the submission contains several dates, or none at all, and still the various hints a human would take into consideration get checked.

Of course some minimal implementation without AI techniques could already handle many cases. My AI suggestion was not dead-serious ;)

coldtea

I think adding the year is mostly crap. What information exactly would it give, except perhaps the false impression that this article is "antiquated information", when it pretty much holds true and describes a perennial issue?

gmfawcett

It gives a cue about how many times I've probably seen the article before. Quite useful, IMO. I read this particular article when it came out in 2006... it's convenient to know we're not discussing a novel finding on the same topic.

nsxwolf

Write this in a Leetcode interview and I suspect the average interviewer will fail you and not believe your reasoning.