
JesseSort: A novel sorting algorithm that is faster than Python's default sort.

mlochbaum

So the idea is that you'll eventually arrange n elements into ideally about √n sorted lists, organized so that the endpoints of these lists are also ordered (the "rainbow" shape). A new element goes on the end of one of those lists or a new list; which one is generally decided by binary search, but the index is saved to speed up the next insertion if it's the same. So a natural run might benefit by having adjacent values inserted into the same list.
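
Roughly this shape, if I'm reading it right (my own sketch of the one-sided version with the cached index, not the paper's code, and ignoring the two-ended rainbow part):

    from bisect import bisect_right
    from heapq import merge

    def sketch_sort(values):
        tails = []   # last element of each sorted list, kept in ascending order
        lists = []   # the sorted lists themselves (lists[i][-1] == tails[i])
        cached = 0   # index used by the previous insertion
        for v in values:
            if cached < len(lists) and tails[cached] <= v and \
               (cached + 1 == len(lists) or v < tails[cached + 1]):
                i = cached                      # cache hit: same list as last time
            else:
                i = bisect_right(tails, v) - 1  # rightmost list with endpoint <= v
            if i < 0:
                lists.insert(0, [v])            # below every endpoint: new list
                tails.insert(0, v)
                cached = 0
            else:
                lists[i].append(v)
                tails[i] = v
                cached = i
        return list(merge(*lists))              # final k-way merge of the lists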

Which cases benefit and which don't seems potentially quite erratic. If you have a series of runs that shrink over the course of the array, with increasing start value and decreasing end value, each will become a new sorted list and you end up with O(n) insertion plus a perfect merge. If the runs get wider instead, they'll cross over existing endpoints and end up scattered around before merging.

It feels like the paper keeps trying to spin constant-factor improvements as somehow being asymptotically relevant despite acknowledging that mathematically they aren't, for example the fact that log(√n) is only log(n)/2 "would not reflect the sublinear nature of base array length k" and runs hitting the best case "approximately 50% of the time... reinforces JesseSort's average runtime complexity as somewhere between O(n) and O(n log_2 n)". Powersort provably takes O(n + n log r) worst-case time where r is the number of runs; I expect this algorithm is O(n log n) worst- and average-case.

However, on random arrays the given benchmarks do improve relative to Python default as the length increases, with faster performance starting a little under a million elements. There isn't a lot of investigation into why this is. I'd be interested to know if the number of comparisons is reduced in particular. Perhaps this algorithm is more cache-friendly with a random input, but this wouldn't be reliable as many inputs can cause the number of sorted lists to scale with O(n) so that the binary searches need to work on a large array.
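
One quick way to check would be to count the compares directly and hold them against log_2(n!); this is just a measurement harness I'd use, nothing from the paper:

    import math, random

    class Counted:
        __slots__ = ("v",)
        count = 0                    # total number of < calls across the sort
        def __init__(self, v): self.v = v
        def __lt__(self, other):
            Counted.count += 1
            return self.v < other.v

    n = 1_000_000
    data = [Counted(random.random()) for _ in range(n)]
    data.sort()                                  # CPython's list.sort
    bound = math.lgamma(n + 1) / math.log(2)     # log2(n!) lower bound
    print(Counted.count, bound, Counted.count / bound)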

mlochbaum

The analysis would make a lot more sense if it dropped every big O and computed an expected number of comparisons. Of course, the real issue is that the whole thing relies on an empirically-determined length of √8√n for the number of lists in the rainbow, when the theoretical justification isn't there and clearly the worst case is n/2. I'm not seeing any awareness that the expected number of comparisons has to be at least log_2(n!), and am starting to see some worrying signs that the author thinks averages like n log log n are possible. I'd initially assumed that the "somewhere between O(n) and O(n log_2 n)" statement meant in the presence of asymptotically large natural runs, but there's a claim that the runs "need not be of any minimum size" and another that the algorithm can "capitalize on natural runs" that seems to be referring to the benchmarks on "purely random values".

mlochbaum

Ah, the README prior to commit b98f724 does claim O(n log(1.5 ln(n))) asymptotic runtime. This is based on a flawed analysis assuming rainbow endpoints will be evenly distributed, which the paper presents before describing the problem that each inserted element pushes one of those endpoints away from the middle. Bit of a perpetual motion machine search: initial results showed the promise of this approach, some details got in the way but surely there's some trick to work around them...

Still there's the benchmark. I remembered listsort.txt shows comparison counts relative to the log_2(n!) bound and it's only like 1% worse, so the >30% improvement over Timsort at the high end has to come from some other factor. I'm thinking it is actually cache effects: not because the algorithm has inherently better cache properties (it's just a merge sort at large scales), but because it copies the array to ints internally rather than using a generic comparison. That would be a poor result, as good comparison sorts compiled for 4-byte data are many times faster than Timsort.

tim-peters

I want to thank you for your analysis! I'm the "timsort" guy, and I'm asked to look at all sorts of things. The devil is in the details, and I've given up bothering to look unless a paper spells out sufficient details up front. Else it's a pit I'd rather not get sucked into.

As general things, timsort was aimed at exploiting all kinds of pre-existing order, not random lists. The only goal for the latter was to be within reach of CPython's previous highly tuned samplesort implementation (like quicksort on steroids, with much less data movement than mergesorts, but more comparisons).

As you say, for randomly ordered input it gets remarkably close to the information-theoretic lower bound on # of compares. So that's not it.

"Cache effects", maybe, but that's a pit to dig into. I'll note that while the newer "powersort" merge strategy is elegant and provably near-optimal by some relevant measures, in practice it doesn't really appear to run any faster (although cases can be _contrived_ that make it - or the older method! - run faster).

Comparison specialization (to ints) could very well account for it. CPython's general comparison machinery is _very_ expensive, even for what turn out to be native machine ints (which CPython cannot know at compile-time: everything is deduced at runtime).

CPython has since grown new machinery to do a pre-pass over the list, and do a form of cheaper comparison specialization for what turn out to be homogeneous lists of suitable types (including "small enough" ints). That alone gives enough speedup to make some of the original choices sub-optimal. For example, binary insertion sort for short runs is no longer best then. That was aimed at minimizing worst-case # of compares, but as comparisons get cheaper that has less value.
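
The effect of that pre-pass is visible even from pure Python; a rough illustration (the exact conditions for the fast path are implementation details, and the numbers will vary by machine and version):

    import random, time

    def time_sort(data):
        data = list(data)
        t0 = time.perf_counter()
        data.sort()
        return time.perf_counter() - t0

    ints = [random.randrange(10**6) for _ in range(10**6)]
    mixed = ints[:-1] + [0.5]   # one float defeats the homogeneous-int fast path

    print("homogeneous ints:", time_sort(ints))
    print("mixed int/float: ", time_sort(mixed))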

There are also surprises in everything. For example, the worst case for binary insertion sort on most machines was _not_ reverse-ordered input, even though that requires the most data movement. Instead, the worst case was randomly ordered data. Why? Branch prediction. In randomly ordered data, each branch is unpredictable. In reverse-ordered data, "move to the left" is always the test outcome.
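
For reference, binary insertion sort looks roughly like this in Python (a paraphrase of the idea, not CPython's C code); the a[mid] <= x test is the branch in question:

    def binary_insertion_sort(a):
        for i in range(1, len(a)):
            x = a[i]
            lo, hi = 0, i
            while lo < hi:                 # binary search for the insertion point
                mid = (lo + hi) // 2
                if a[mid] <= x:            # predictable on reversed input,
                    lo = mid + 1           # a coin flip on random input
                else:
                    hi = mid
            a[lo + 1:i + 1] = a[lo:i]      # shift right (the data movement)
            a[lo] = x
        return a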

Another generality is that Python's sort cares more about speed for shorter lists than huge ones. "Big data" problems are more likely to use, e.g., extensions geared to "big data problems", like numpy for giant arrays of floats. In those contexts more suitable _algorithms_ exist, like radix sorts, or multi-key quicksorts for lists of long strings with long shared prefixes.

Python's niche is more in, e.g., web services, where a great many sorts of shorter lists are common.

Bottom line: there is no one "best" sorting algorithm. I keep an eye out for newer developments, but haven't seen anything better yet for what Python is aiming at. I was, e.g., very impressed by pdqsort - but it's not a stable sort, and that alone makes it a non-starter for general Python use.

zahlman

>This is based on a flawed analysis assuming rainbow endpoints will be evenly distributed

I guess you know this, but we don't need to do any complicated analysis to disprove such a big-O for a comparison-based sort. It's a well-known information-theory result: each comparison can remove at most one bit of entropy, while the initial state has Θ(n lg n) bits of entropy because that's how log(n!) grows (https://en.wikipedia.org/wiki/Stirling%27s_approximation).
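
Spelled out with Stirling's approximation:

    \log_2(n!) = \sum_{i=1}^{n} \log_2 i
               = n \log_2 n - \frac{n}{\ln 2} + O(\log n)
               = \Theta(n \log n)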

drpixie

If you're going to make a big claim about sort speed, tell me how speed is better/worse for various data. How do the algorithms compare when the data is already ordered, when it's almost (but not quite) already ordered, when it's largely ordered, when it's completely random, and when it's in the opposite order? This stuff, as well as the size of the dataset, is what we need to know in practice.

tialaramex

Rather than, or at least in addition to, raw measured speed on a specific piece of hardware, which is often affected in hard-to-understand ways by niche optimisation choices and nuances of specific hardware, I actually really like the choice to report how many comparisons your algorithm needed.

For example, Rust's current unstable sort takes ~24 comparisons per element on average to sort 10^7 truly random elements, but if instead all those elements are chosen (still at random) from only 20 possibilities, it needs only a bit more than five comparisons per element, regardless of whether there are 10^3 elements or 10^7.

Unlike "On this Intel i6-9402J Multi Wazoo (Bonk Nugget Edition) here are my numbers" which is not very useful unless you also have the i6-9402J in that specific edition, these comparison counts get to a more fundamental property of the algorithm that transcends micro architecture quirks which will not matter next year.

Vecr

"My boss wants to buy systems with the Intel i10-101010F Medium Core Platinum (with rowhammer & Sonic & Knuckles), can you buy this $20,000 box and test your program so I can write him a report?"

sambeau

[deleted]

nimih

What's your point? The paper you're linking does not include the analysis the post you're responding to is asking for.

gymbeaux

It does give some insight into what you seek, at least. For example, “We find that for smaller n ≲ 262144, JesseSort is slower than Python’s default sort.”

I’d like to see a much larger n but the charts in the research paper aren’t really selling JesseSort. I think as more and more “sorts” come out, they all get more niche. JesseSort might be good for a particular dataset size and ordering/randomness but from what I see, we shouldn’t be replacing the default Python sorting algorithm.

tim-peters

I want to elaborate on timing cautions: a sort that specializes to 4-byte machine ints is doing something that can't be expressed directly in CPython. Its lists are heterogeneous, types aren't known until runtime, and there is no native (to Python) concept of "4-byte int". All CPython ints are stored in "bigint" (unbounded) format, and even zero requires 32 bytes of storage in the format (overheads for a type pointer, refcount, 16-byte heap alignment padding). That format isn't 2's complement either (although it creates the semantic _illusion_ of being such): it's sign+magnitude.
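
You can poke at the storage side of this from Python; sys.getsizeof doesn't include allocator alignment and the numbers vary by version, so treat them as illustrative:

    import sys, array

    print(sys.getsizeof(0))          # dozens of bytes even for zero
    print(sys.getsizeof(1))          # a bit more once there's a nonzero digit
    print(sys.getsizeof(10**100))    # bigints just keep adding digit words
    print(array.array("i", [0]).itemsize)   # an unboxed C int: typically 4 bytes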

The general object comparison machinery is very expensive at runtime, essentially repeated from scratch on every compare. To mitigate this, CPython does a prepass over the list (O(n)) to see whether the list is in fact homogeneous, and, if so, of a type that can use a more specialized comparison function. Not free, but pays off hugely if you are in fact sorting a list of "small enough" machine ints. But it doesn't change the storage format: on every compare, cycles are still consumed to transform CPython's sign+magnitude bigint format to native 2's-complement for native machine code to compare directly.

I expect this accounts for most of the observed "speedup" over timsort. Despite semi-heroic efforts to cut the cost, comparisons of native ints in CPython will always be significantly more expensive than code specialized to compare native ints directly. At least until (if ever) CPython learns how to "unbox" ints. The PyPy implementation of Python already does, and sorting lists of native machine ints runs about twice as fast under PyPy. It's not the sorting algorithm that matters to this, it's the relative cost of machine-int compares, and almost certainly too the cache pressure of using many more bytes to store small ints than necessary in "boxed" format.

As mentioned elsewhere, JesseSort reminds me most of the old and simpler "patience sorting", with which it has a lot in common. Both keep a collection of sorted sublists (to be merged at the end), both use binary search to pick a collection member to which to add "the next" input, both excel at "low diversity" inputs, both (on average) end up with O(sqrt(n)) sublists on randomly ordered inputs, and neither is "naturally" stable.
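
For anyone who hasn't seen it, patience sorting fits in a few lines of Python (teaching code, not tuned):

    from bisect import bisect_left
    from heapq import merge

    def patience_sort(values):
        tops = []    # current top card of each pile, kept in ascending order
        piles = []   # each pile is decreasing from bottom to top
        for v in values:
            i = bisect_left(tops, v)   # leftmost pile whose top is >= v
            if i == len(piles):
                piles.append([v])      # no such pile: start a new one on the right
                tops.append(v)
            else:
                piles[i].append(v)
                tops[i] = v
        # reading a pile top-to-bottom gives an ascending run, so reverse and merge
        return list(merge(*(p[::-1] for p in piles)))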

As I noted in listsort.txt at the time, while timsort gets a lot of value out of repeated elements ("low diversity"), it's just not the right approach to exploit that as much as possible. JesseSort and pdqsort (and, by extension, glidesort) will always do better on that. OTOH, timsort gets value out of many kinds of partial ordering, which "reveal themselves" dynamically via one run in a merge "winning" consistently.

vanderZwan

Interesting approach, although I'm wondering why they didn't just use a double-ended queue instead of a linked list.

Also, I get the urge to name the algorithm after yourself, since Timsort is named after its creator as well, but if you call your data structure "rainbow" then my first association would be the rainbow flag, since flag sorts [0][1] also have a tradition in sorting algorithm names.

[0] https://en.m.wikipedia.org/wiki/Dutch_national_flag_problem

[1] https://en.m.wikipedia.org/wiki/American_flag_sort

ipsum2

First thing I think about with "rainbow" and "data structures" is rainbow tables.

blibble

a double-ended queue is often implemented using a doubly linked list (like the one in use here)

if you back it by two arrays then you can't do O(1) inserts in the middle (as is going on here)

tialaramex

It's true that awful Deque implementations are often built from a linked list. If you're taught about big-O notation and haven't noticed how modern computers work you might even do that today.

I would expect that they're thinking of something more like Rust's VecDeque, so that's a conventional "growable array" (Rust's Vec, Java ArrayList, C++ std::vector) used internally with some head + tail tracking to form a circular buffer so as to deliver the Deque API (fast push to head or tail, fast pop from head or tail).
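
Something like this toy version, in Python terms (illustrative only; Python's own collections.deque is in fact a doubly linked list of fixed-size blocks, and Rust's VecDeque is the real thing):

    class RingDeque:
        def __init__(self):
            self._buf = [None] * 8   # backing array, capacity doubles as needed
            self._head = 0           # index of the first element
            self._len = 0

        def _grow(self):
            old, cap = self._buf, len(self._buf)
            self._buf = [None] * (2 * cap)
            for i in range(self._len):               # unwrap into the new buffer
                self._buf[i] = old[(self._head + i) % cap]
            self._head = 0

        def push_back(self, x):
            if self._len == len(self._buf):
                self._grow()
            self._buf[(self._head + self._len) % len(self._buf)] = x
            self._len += 1

        def push_front(self, x):
            if self._len == len(self._buf):
                self._grow()
            self._head = (self._head - 1) % len(self._buf)
            self._buf[self._head] = x
            self._len += 1

        def pop_front(self):
            if not self._len:
                raise IndexError("pop from empty deque")
            x = self._buf[self._head]
            self._head = (self._head + 1) % len(self._buf)
            self._len -= 1
            return x

        def pop_back(self):
            if not self._len:
                raise IndexError("pop from empty deque")
            self._len -= 1
            return self._buf[(self._head + self._len) % len(self._buf)]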

blibble

you can make mechanically sympathetic linked lists with low constant factors as long as you avoid pointers

conradludgate

And what about pdqsort, glidesort, driftsort, ipnsort? New research is always cool, but there's plenty of other research in this space to compare this to

agnishom

I am a bit concerned that the primary channel of distribution of the paper is ResearchGate. I am somehow biased against taking it seriously because of this.

behnamoh

I have mixed feelings about the naming. Isn't it kinda off-putting that he named it after himself? Usually it's other people who name your algorithm after you. At least that's how it is in science and engineering (Fourier didn't call it "Fourier transform", Laplace didn't call it "Laplace transform", Kalman didn't name it "Kalman filters", etc.)

bluepnume

He should have at least called it eejss-sort

arduanika

This guy sorts

behnamoh

This guy names

hnisoss

its only ok if your name is Tim?

tim-peters

Perhaps ;-) I'm the "Tim" in "timsort". The name was an inside joke. I'm not a self-promoter, and never have been. As the so-called "Zen of Python" author, I thought it would be funny to pick a name I'd never pick ;-)

CPython had a unique (in my experience) combination of cheap data movement (only pointer swaps) and very expensive comparisons. That's what I was aiming at. I never publicized it, never wrote about it outside the CPython repo, and never thought I'd hear about it again.

Of course I'm pleased it found wider use, but that took me wholly by surprise. If I had it to do over again, I would probably have named it, say, gallopsort.

If this new sort catches on, Jesse should rename it before it's too late ;-)

gyudin

Like Dijkstra's algorithm? Knuth-Morris-Pratt algorithm? Huffman coding?

charlieyu1

Did any of these guys put their name on the algorithm by themselves?

vanderZwan

The author seems young, I think we should be a little forgiving if they saw all these algorithms and concluded "ok so if you come up with something new you get to name it after yourself"

behnamoh

those names were given to those algorithms by others, not the creators.

gymbeaux

What did they call their algorithms before they passed away?


jessekv

I like the name.

vivzkestrel

I know it is not possible in our Universe but on a lighter note, in some parallel universe out there, someone has made a sorting algorithm that runs in O(1). I wonder though if a quantum computer can simultaneously sort all the elements in O(1)

procaryote

If you can pick parallel universes, just pick the one where the data happens to be sorted

warkdarrior

"quantum computers are no better than classical ones [for sorting]"

https://en.m.wikipedia.org/wiki/Quantum_sort

henry2023

A modern Borges could write a really good short story about that universe.

pinoy420

Why is this implemented in Python? Is it compiled somehow?

00ajcr

There's a Cython implementation (the .pyx file). The Cython code is compiled to C code (then that C code is compiled).
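
Typically via a setup script along these lines (the module name here is a guess; check the repo for its actual build config):

    from setuptools import setup
    from Cython.Build import cythonize

    setup(ext_modules=cythonize("jessesort.pyx"))   # .pyx filename is hypothetical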

xiaodai

Python's default is gallop sort; however, radixsort is much faster and performs in O(n).

selcuka

> radixsort is much faster and performs in O(n).

Radix sort time complexity is not O(n); it's O(n*k) where k is the key length (number of digits) of the largest element. Besides, it has an additional O(n+k) space complexity.
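
To make the n*k structure concrete, here's a plain LSD radix sort for non-negative ints, one stable counting-sort pass per byte-sized digit:

    def radix_sort(a):
        if not a:
            return []
        out = list(a)
        biggest = max(out)
        shift = 0
        while biggest >> shift:
            counts = [0] * 257
            for x in out:
                counts[((x >> shift) & 0xFF) + 1] += 1
            for d in range(256):
                counts[d + 1] += counts[d]   # prefix sums -> start positions
            nxt = [0] * len(out)
            for x in out:                    # stable scatter by current byte
                d = (x >> shift) & 0xFF
                nxt[counts[d]] = x
                counts[d] += 1
            out = nxt
            shift += 8                       # next 8-bit digit
        return out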

ijustlovemath

I thought it was Timsort?

pstoll

It was and always will be timsort-ly yours. iykyk.

perlgeek

radixsort isn't a comparison-based sort algorithm, so you're comparing apples to chickens.

prirun

"We find that for smaller n≲ 262144, JesseSort is slower than Python’s default sort."

repsilat

I wonder about the global statistics of sorted data... Is the most common number of elements to sort zero? Certainly less than ten?

What about the median? Two elements to sort? One? Zero again?

lblume

The question is also for which list lengths the performance matters most. When sorting a few strings (<20), whether the algorithm uses 5 or 7 comparisons would usually not matter too much. So to find the optimal algorithm for a given situation, we would have to average the algorithm's performance per list length, weighted by how much each list length matters.

3eb7988a1663

Don't high performance systems have heuristics to decide what specific algorithms to use at runtime? It is not unimaginable that there could be a function dedicated to small vs. large collections.

Assuming the results hold, someone has to decide if the additional complexity is worth the performance. For something like BLAS, go nuts. For Python standard library, maybe not.

tcper

Amazing, there are still new sorting algorithms coming out today!