A leap year check in three instructions
May 15, 2025
divbzero
> Note that modern compilers like gcc or clang will produce something like is_leap_year2 from is_leap_year1, so there is not much point in doing this in C source, but it might be useful in other programming languages.
The optimizations that compilers can achieve kind of amaze me.
Indeed, the latest version of cal from util-linux keeps it simple in the C source:
return ( !(year % 4) && (year % 100) ) || !(year % 400);
https://github.com/util-linux/util-linux/blob/v2.41/misc-uti...
cookiengineer
But this is wrong: it can only represent dates after the specific year when a given country switched from the Julian to the Gregorian calendar!
For more on this, I recommend reading about and implementing a function that calculates the day of the week [1]. Then you can join me in the special insanity hell of people who were trying to deal with human calendars.
And then you should implement a test case for the dates between Thursday 4 October 1582 and Friday 15 October 1582 :)
[1] https://en.m.wikipedia.org/wiki/Determination_of_the_day_of_...
LegionMammal978
> the specific year
The problem is, which "specific" year? The English were using "old-style" dates long after 1582. Better not to try to solve this intractable problem in software, but instead annotate every old date you receive with its correct calendar, which may even be a proleptic Gregorian calendar in some fields of study.
(How do you determine the correct calendar? Through careful inspection of context! Alas, people writing the dates rarely indicated this, and later readers tend to get the calendars hopelessly mangled up. Not to mention the changes in the start of the year. At least the day of week can act as an indicator, when available.)
voxic11
The full code is:

    static int leap_year(const struct cal_control *ctl, int32_t year)
    {
        if (year <= ctl->reform_year)
            return !(year % 4);
        return ( !(year % 4) && (year % 100) ) || !(year % 400);
    }
Where reform_year is the year the Gregorian calendar was adopted in the specified context (it defaults to 1752, the year it was adopted by Great Britain and therefore also the US). So it does account for Julian dates.
divbzero
cal doesn’t offer an option to use the 1582 reform date, but it looks like it does handle the 1752 adoption in Great Britain correctly:
    $ cal 9 1752
       September 1752
    Su Mo Tu We Th Fr Sa
           1  2 14 15 16
    17 18 19 20 21 22 23
    24 25 26 27 28 29 30
sigmoid10
I like how the util-linux one is also easier to understand, because it doesn't perform three sequential checks that invert the last two conditions, plus a default return. That's the kind of stuff that can make you crazy if you ever have to debug it.
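For reference, the three-check version under discussion looks something like this (a reconstruction from the snippet quoted in the next comment, not necessarily the article's exact code):

    #include <stdbool.h>
    #include <stdint.h>

    bool is_leap_year(uint32_t y) {
        if ((y & 3) != 0) return false;   /* not divisible by 4: common year */
        if ((y % 25) != 0) return true;   /* divisible by 4 but not by 100: leap */
        return (y & 15) == 0;             /* divisible by 100: leap iff divisible by 400 (= 16 * 25) */
    }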
darkwater
I wondered for 3 minutes "this is not right" until I realized that
if ((y % 25) != 0) return true;
was actually checking for different from 0 (which in hindsight also makes sense, because century years by default are not leap years unless they are divisible by 400)
qingcharles
I love these incomprehensible magic number optimizations. Every time I see one I wonder how many optimizations like this we missed back in the old days when we were writing all our inner loops in assembly?
Does anyone have a collection of these things?
ryao
Here is a short list:
https://graphics.stanford.edu/~seander/bithacks.html
It is not on the list, but #define CMP(X, Y) (((X) > (Y)) - ((X) < (Y))) is an efficient way to do generic comparisons for things that want UNIX-style comparators. If you compare the output against 0 to check for some form of greater than, less than or equality, the compiler should automatically simplify it. For example, CMP(X, Y) > 0 is simplified to (X > Y) by a compiler.
The signum(x) function that is equivalent to CMP(X, 0) can be done in 3 or 4 instructions depending on your architecture without any comparison operations:
https://www.cs.cornell.edu/courses/cs6120/2022sp/blog/supero...
It is such a famous example that compilers probably optimize CMP(X, 0) to that, but I have not checked. Coincidentally, the expansion of CMP(X, 0) is on the bit hacks list.
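As a self-contained sketch of the macro and the bit-hacks signum expansion it refers to (this assumes two's complement and an arithmetic right shift of signed ints, and -x is formally undefined for INT_MIN):

    #include <stdio.h>

    #define CMP(X, Y) (((X) > (Y)) - ((X) < (Y)))

    /* branchless signum: the sign bit of x contributes -1 for negatives,
       the sign bit of -x contributes +1 for positives */
    int signum(int x) {
        return (x >> 31) | ((unsigned)-x >> 31);
    }

    int main(void) {
        printf("%d %d %d\n", CMP(3, 7), CMP(7, 7), CMP(9, 7));  /* -1 0 1 */
        printf("%d %d %d\n", signum(-5), signum(0), signum(5)); /* -1 0 1 */
        return 0;
    }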
There are a few more superoptimized mathematical operations listed here:
https://www2.cs.arizona.edu/~collberg/Teaching/553/2011/Reso...
Note that the assembly code appears to be for the Motorola 68000 processor, and it relies on flags that are set in edge cases to work.
Finally, there is a list of helpful macros for bit operations that originated in OpenSolaris (as far as I know) here:
https://github.com/freebsd/freebsd-src/blob/master/sys/cddl/...
There used to be an Open Solaris blog post on them, but Oracle has taken it down.
Enjoy!
JdeBP
For an entire book on this stuff, see Henry S. Warren Jr.'s Hacker's Delight. The "three-valued compare function" is in chapter 2, for example.
eru
> It is not on the list, but #define CMP(X, Y) (((X) > (Y)) - ((X) < (Y))) is an efficient way to do generic comparisons for things that want UNIX-style comparators. If you compare the output against 0 to check for some form of greater than, less than or equality, the compiler should automatically simplify it. For example, CMP(X, Y) > 0 is simplified to (X > Y) by a compiler.
I guess this only applies when the compiler knows what version of > you are using?
E.g. it might not work in C++ when < and > are overloaded, e.g. for strings?
ryao
My comment was meant for C, but it should apply to C++ too, even when operator overloading is used, provided the comparisons are simple and inlined. If you add overloads for the > and < operators in your string example in a place where they would inline, and the overload compares .length(), this should simplify. For example, godbolt shows that CMP(X, Y) == 0 is optimized to one mov instruction and one cmp instruction despite operator overloads when I implement your string example:
https://godbolt.org/z/nGbPhz86q
If you did not inline the operator overloads and had them in another compilation unit, do not expect this to simplify (unless you use LTO).
If you have compound comparators in the operator overloads (such that on equality in one field, it considers a second for a tie breaker), I would not expect it to simplify, although the compiler could surprise me.
null
trollbridge
The compiler would resolve that before the optimiser.
kmoser
There's also this classic: https://en.wikipedia.org/wiki/Fast_inverse_square_root
ryao
That is an approximation. If approximations are acceptable, then here is a trick you might like. In loops that call cosf(i * C) and/or sinf(i * C), where i is incremented by 1 on each iteration and C is some constant expression, you can call cosf() and sinf() once (or twice if i starts at something other than 0 or 1) outside of the loop and use the angle addition formula to do accumulation via multiplication and addition inside the loop. The loop will run significantly faster.
Even if you only need one of cosf() or sinf(), many CPUs calculate both values at the same time, so taking the other is free. If you only need single precision values, you can do this in double precision to avoid much of the errors you would get by doing this in single precision.
This trick can be used to accelerate the RoPE relative positional encoding calculations used in inference for llama 3 and likely others. I have done this and seen a measurable speed up, although these calculations are such a small part of inference that it was a small improvement.
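A sketch of the trick (C is a hypothetical step angle; the recurrence is just the angle-addition formulas, accumulated in double precision as suggested above):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        const double C = 0.001;              /* hypothetical constant step */
        const int N = 100000;
        const double cosC = cos(C), sinC = sin(C);
        double c = 1.0, s = 0.0;             /* cos(0*C), sin(0*C) */
        for (int i = 0; i < N; i++) {
            /* use c ~ cos(i*C) and s ~ sin(i*C) here, no cosf/sinf calls */
            double cn = c * cosC - s * sinC; /* cos(a + C) */
            double sn = s * cosC + c * sinC; /* sin(a + C) */
            c = cn;
            s = sn;
        }
        printf("%f vs %f\n", c, cos(N * C)); /* accumulated vs direct */
        return 0;
    }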
tylerhou
You should look at supercompilation.
mshockwave
sometimes also known as superoptimization; many superoptimizers also use SMT solvers like Z3, mentioned in the article
tylerhou
Yes, sorry, superoptimization is the correct term.
masfuerte
We didn't miss them. In those days they weren't optimizations. Multiplications were really expensive.
JdeBP
Multiplications of this word length, one should clarify. It's not that multiplication was an inherently more expensive or different operation back then (assuming from context here that the "old days" of coding inner loops in assembly language pre-date even the 32-bit ALU era). Binary multiplication has not changed in millennia. Ancient Egyptians were using the same binary integer multiplication logic 5 millennia ago as ALUs do today.
It was that generally the fast hardware multiplication operations in ALUs didn't have very many bits in the register word length, so multiplications of wider words had to be done with library functions that did long multiplication in (say) base 256.
So this code in the headlined article would not be "three instructions" but three calls to internal helper library functions used by the compiler for long-word multiplication, comparison, and bitwise AND; not markedly more optimal than three internal helper function calls for the three original modulo operations, and in fact less optimal than the bit-twiddled modulo-powers-of-2 version found halfway down the headlined article, which would only need to check the least significant byte and not call library functions for two of the 32-bit modulo operations.
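As an illustration of what such a helper does, here is a 16x16 -> 32-bit multiply built from 8x8 -> 16-bit multiplies, i.e. schoolbook long multiplication in base 256 (a sketch, not any particular runtime's actual helper):

    #include <stdint.h>

    uint32_t mul16x16(uint16_t a, uint16_t b) {
        uint8_t al = a & 0xFF, ah = a >> 8;   /* base-256 "digits" of a */
        uint8_t bl = b & 0xFF, bh = b >> 8;   /* base-256 "digits" of b */

        uint16_t ll = (uint16_t)(al * bl);    /* four partial products */
        uint16_t lh = (uint16_t)(al * bh);
        uint16_t hl = (uint16_t)(ah * bl);
        uint16_t hh = (uint16_t)(ah * bh);

        /* combine with the appropriate base-256 place shifts */
        return (uint32_t)ll
             + ((uint32_t)lh << 8)
             + ((uint32_t)hl << 8)
             + ((uint32_t)hh << 16);
    }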
Bonus points to anyone who remembers the helper function names in Microsoft BASIC's runtime library straight off the top of xyr head. It is probably a good thing that I finally seem to have forgotten them. (-: They all began with "B$" as I recall.
kruador
Most 8-bit CPUs didn't even have a hardware multiply instruction. To multiply on a 6502, for example, or a Z80, you have to add repeatedly. You can multiply by a power of 2 by shifting left, so you can get a bigger result by switching between shifting and adding or subtracting. Although, again, on these earlier CPUs you can only shift by one bit at a time, rather than by a variable number of bits.
There's also the difference between multiplying by a hard-coded value, which can be implemented with shifts and adds, and multiplying two variables, which has to be done with an algorithm.
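Sketched in C, the shift-and-add loop such CPUs effectively perform for two variable operands:

    #include <stdint.h>

    uint16_t mul8x8(uint8_t multiplicand, uint8_t multiplier) {
        uint16_t acc = 0;
        uint16_t m = multiplicand;
        while (multiplier != 0) {
            if (multiplier & 1)   /* add the shifted multiplicand for each set bit */
                acc += m;
            m <<= 1;              /* shift left: next power of two */
            multiplier >>= 1;
        }
        return acc;
    }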
The 8086 did have multiply instructions, but they were implemented as a loop in the microcode, adding the multiplicand, or not, once for each bit in the multiplier. More at https://www.righto.com/2023/03/8086-multiplication-microcode.... Multiplying by a fixed value using shifts and adds could be faster.
The prototype ARM1 did not have a multiply instruction. The architecture does have a barrel shifter which can shift one of the operands by any number of bits. For a fixed multiplication, it's possible to multiply by a power of two, by (a power of two plus 1), or by (a power of two minus 1) in a single instruction. The latter is why ARM has both a SUB (subtract) instruction, computing rd := rs1 - Operand2, and a RSB (Reverse SuBtract) instruction, computing rd := Operand2 - rs1. The second operand goes through the barrel shifter, allowing you to write an instruction like 'RSB R0, R1, R1, LSL #4', meaning 'R0 := (R1 << 4) - R1', or in other words '(R1 * 16) - R1', or R1 * 15.
ARMv2 added MUL and MLA (MuLtiply and Accumulate) instructions. The hardware ARM2 implementation uses a Booth encoder to multiply 2 bits at a time, taking up to 16 cycles for 32 bits. It can exit early if the remaining bits are all 0s.
Later ARM cores implemented an optional wider multiplier (that's the 'M' in 'ARM7TDMI', for example) that could multiply more bits at a time, therefore executing in fewer cycles. I believe the ARM7TDMI's handled 8 bits per cycle, completing in up to 4 cycles (again, with early exit). Modern ARM cores can do 64-bit multiplies in a single cycle.
eru
> Multiplications of this word length, one should clarify. It's not that multiplication was an inherently more expensive or different operation back then (assuming from context here that the "old days" of coding inner loops in assembly language pre-date even the 32-bit ALU era). Binary multiplication has not changed in millennia. Ancient Egyptians were using the same binary integer multiplication logic 5 millennia ago as ALUs do today.
Well, we can actually multiply long binary numbers asymptotically faster than the Ancient Egyptians did, e.g. with Karatsuba or FFT-based algorithms.
kens
> Binary multiplication has not changed in millennia. Ancient Egyptians were using the same binary integer multiplication logic 5 millennia ago as ALUs do today.
It turns out that multiplication in modern ALUs is very different. The Pentium, for instance, does multiplication using base-8, not base-2, cutting the number of additions by a factor of 3. It also uses Booth's algorithm, so much of the time it is subtracting, not adding.
godelski
Related: Computerphile had a video a few months ago where they try to put compute time on a human scale, similar to the way one might visualize an atom by making the proton the size of a golf ball. I think it can help put some costs into perspective and really show why branching matters, as well as the great engineering done to hide some of the slowdowns. But definitely some things are being masked simply by the sheer speed of the clock (like how the small size of a proton hides how empty an atom is).
https://youtube.com/watch?v=PpaQrzoDW2I
kurthr
and divides were worse. (1 cycle add, 10 cycle mult, 60 cycle div)
genewitch
That's fair, but mod is division, or no? So realistically the new magic-number version would be faster. Assuming there is 32-bit int support. Sorry, this is above my pay grade.
ryao
Division still is worse.
qingcharles
Yeah, I'm thinking more of ones that remove all the divs from some crazy math functions for graphics rendering and replace them all with bit shifts or boolean ops.
Someone
And branches were cheaper without pipelining
22c
Part-way through the section on bit-twiddling, I thought to myself "Oh I wonder if we could use a solver here". Lo and behold, I was pleasantly surprised to see the author then take that exact approach. Love the attention to detail in this post!
dndn1
If you need to know whether a year is a leap year and it's before the year 6000, I made an interactive calculator and visualization [1].
It's >3 machine instructions (and I admire the mathematical tricks included in the post), but it does do thousands of calculations fairly quickly :)
dahart
Looks like gcc & clang use some of the bit-twiddling tricks when you compile the original function with -O3: https://godbolt.org/z/eshd9axod
    is_leap_year(unsigned int):
            xor     eax, eax
            test    dil, 3
            jne     .L1
            imul    edi, edi, -1030792151
            mov     eax, 1
            mov     edx, edi
            ror     edx, 2
            cmp     edx, 42949672
            ja      .L1
            ror     edi, 4
            cmp     edi, 10737418
            setbe   al
    .L1:
            ret
ryao
They are sometimes very good at using mathematical identities to do simplifications. The following commit was actually inspired by the output of GCC:
https://github.com/openzfs/spl/commit/8fc851b7b5315c9cae9255...
Jason had noticed that GCC’s assembly output did not match the original macro when looking for a solution to the unsigned integer overflow warning that a PaX GCC plugin had output (erroneously in my opinion). He had conjectured we could safely adopt GCC’s version as a workaround. I gave him the proof of correctness for the commit message and it was accepted into ZFS. As you can see from the proof, deriving that from the original required 4 steps. I assume that GCC had gone through a similar process to derive its output.
nullc
There are many cute binary/logic tricks, if you like them be sure to read Hackers Delight and https://graphics.stanford.edu/~seander/bithacks.html . Once you've studied enough of them you'll find yourself easily coming up with more.
Warning: This may increase or decrease your popularity with fellow programmers, depending on how lucky you are in encountering problems where they make an important performance difference rather than a readability problem for people who have not deeply internalized bit twiddling.
Multiply and mask for various purposes is a thing I commonly use in my own code -- it's much more attractive now than it was decades ago, because almost all computers we target these days have extremely fast multipliers.
These full-width logic operations and multipliers give you kind of a very parallel computer packed into a single instruction. The only problem is that it's a little tricky to program. :)
At least this one was easy to explain mechanically. Some bit hacks require p-adic numbers and other elements of number theory to explain.
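One classic multiply-and-mask example from the bithacks page linked above: branch-free 32-bit population count, where the final multiply sums the four per-byte counts into the top byte:

    #include <stdint.h>

    uint32_t popcount32(uint32_t v) {
        v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit sums */
        v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit sums */
        v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit (per-byte) sums */
        return (v * 0x01010101u) >> 24;                   /* add the four bytes */
    }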
fuzunoglu
Taking a look at numbers in binary reveals some interesting patterns. Although it seems obvious, it was interesting to me when I realized that all prime numbers except 2 end in 1.
silisili
Not trying to be a jerk, but why is that interesting? Am I missing something more than all odd numbers end in 1, and primes by their nature cannot be even(except 2, as you mentioned).
npendleton
This is so cool!
Terrible nitpick, but this is actually 3 operations, not instructions. On x86 you get 4:
    is_leap_year_fast:
            imul    eax, edi, 1073750999
            and     eax, -1073614833
            cmp     eax, 126977
            setb    al
            ret
On ARM you get a bit more due to instruction encoding:

    is_leap_year_fast:
            ldr     r1, .LCPI0_0
            mul     r0, r0, r1
            ldr     r1, .LCPI0_1
            and     r1, r0, r1
            mov     r0, #0
            cmp     r1, #126976
            movwls  r0, #1
            bx      lr
    .LCPI0_0:
            .long   1073750999
    .LCPI0_1:
            .long   3221352463
Compiler explorer reference: https://godbolt.org/z/7ajYqbT9z
gpderetta
You could argue that the setb and ret are not part of the leap year check itself. For example, if the compiler inlined the call into a caller doing:
if(is_leap_year_fast()) {...}
Then the ret would obviously go away, and the setb wouldn't be necessary, as the compiler could directly generate a conditional jmp from the result of the cmp.
npendleton
Hah, great point!
ReptileMan
Somewhat relevant and related.
>“So, it’s a bug in Lotus 123?”
>“Yeah, but probably an intentional one. Lotus had to fit in 640K. That’s not a lot of memory. If you ignore 1900, you can figure out if a given year is a leap year just by looking to see if the rightmost two bits are zero. That’s really fast and easy. The Lotus guys probably figured it didn’t matter to be wrong for those two months way in the past. It looks like the Basic guys wanted to be anal about those two months, so they moved the epoch one day back.”
https://www.joelonsoftware.com/2006/06/16/my-first-billg-rev...
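The Lotus shortcut described in the quote, as code (deliberately wrong for 1900, which it treats as a leap year):

    int is_leap_lotus(int year) {
        return (year & 3) == 0;   /* leap iff the rightmost two bits are zero */
    }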
xrisk
That was a great read. Thanks for sharing!
usr1106
Interesting. In one place the author argues: 0 is missing, but we already know...
There is no year 0; it goes 1 BC, 1 AD. So testing whether 0 is a leap year is moot.
skissane
> There is no year 0; it goes 1 BC, 1 AD. So testing whether 0 is a leap year is moot.
Not true if you use astronomical year numbering: https://en.m.wikipedia.org/wiki/Astronomical_year_numbering
Which is arguably the right thing to do outside of specific domains (such as history) in which BCE is entrenched.
If your software really has to display years in BCE, I think the cleanest way is to store it as astronomical year numbering internally, then convert to CE/BCE on output.
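A minimal sketch of that approach (astronomical year 0 is 1 BCE, -1 is 2 BCE, and so on):

    #include <stdio.h>

    /* store astronomical year numbers internally; convert only on output */
    void print_year(int astro_year) {
        if (astro_year <= 0)
            printf("%d BCE\n", 1 - astro_year);
        else
            printf("%d CE\n", astro_year);
    }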
rf15
> Astronomers use the Julian calendar for years before 1582, including the year 0, and the Gregorian calendar for years after 1582
So what happens when it's 1582? (sorry, currently no time to articulate a good wiki fix)
skissane
I think they use the original Gregorian cutover, in which 1582-10-04 is followed by 1582-10-15, and the dates 1582-10-05 through 1582-10-14 don’t exist.
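Sketched as a validity check, the original cutover amounts to rejecting the ten dropped dates (month/day range checks omitted here):

    #include <stdbool.h>

    /* 1582-10-05 through 1582-10-14 never existed under the papal reform */
    bool exists_gregorian_cutover(int y, int m, int d) {
        return !(y == 1582 && m == 10 && d >= 5 && d <= 14);
    }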
However, in general, I think proleptic Gregorian is simpler. But in astronomy do what the astronomers do. And in history, dates between 1582 and 1923 (inclusive), you really need to explicitly mark the date as Gregorian or Julian, or have contextual information (such as the country) to determine which one to use.
1923 because that was when Greece switched from Julian to Gregorian, the last country to officially do so. Although various other countries in the Middle East and Asia adopted the Gregorian calendar more recently than 1923 - e.g. Saudi Arabia switched from the Islamic calendar to the Gregorian for most commercial purposes in 2016, and for most government purposes in 2023 - those later adoptions aren’t relevant to Julian-Gregorian cutover since they weren’t moving from Julian to Gregorian, they were moving from something non-Western to Gregorian
Large chunks of the Eastern Orthodox Church still use the Julian calendar for religious purposes; other parts theoretically use a calendar called “Revised Julian” which is identical to Gregorian until 2800 and different thereafter. Although I wonder, if humanity (and those churches) are still around in 2800, whether they will actually deviate from Gregorian at that point, decide not to after all, or forget that they were officially supposed to.
JdeBP
Go back to the start of the article, and you'll find that using the proleptic Gregorian calendar with astronomical year numbering is a premise for the algorithm.
Without that design constraint, testing for leap years becomes locale-dependent and very complex indeed.
timewizard
ISO8601 accepts year 0. It is 1 BC in astronomical calendars. All the BC years gain a -1 offset as a result.
usr1106
Interesting, how standards just ignore reality.
At work we had discussions about what date format to use in our product. It's for trained users only (but not IT people), English UI only, but used on several continents. Our regulatory expert proposed ISO8601. I did not agree, because that is not used anywhere in daily life except by 8 million Swedes. I argued that 15-Apr-2025 is much less prone to human error. (None of us "won". Different formats in different places still...)
deredede
> that is not used anywhere in daily life
Does it matter? MM-DD-YYYY is used in America and makes DD-MM-YYYY ambiguous, but as far as I know nobody uses YYYY-DD-MM, so ISO8601 should be perfectly fine, especially if users are trained. Besides, if you're not used to it, starting with the year forces you to think, which is desirable if you want to avoid human error.
nmehner
https://listverse.com/2019/05/19/10-bizarre-calendar-fixes-t...
Everything before the introduction of the Gregorian calendar is moot:
"In 1582, the pope suggested that the whole of Europe skip ten days to be in sync with the new calendar. Several religious European kingdoms obeyed and jumped from October 4 to October 15."
So you cannot use any date recorded before that time for calculations.
And before that it gets even more random:
"The priests’ observations of the lunar cycles were not accurate. They also deliberately avoided leap years over superstitions. Things got worse when they started receiving bribes to declare a year longer or shorter than necessary. Some years were so long that an extra month called Intercalaris or Mercedonius was added."
usr1106
Before 1582 the rule is just simpler: if the year is divisible by 4, it's a leap year. So the difference is relevant for the years 300, 500, 600, 700, 900, etc. For ranges spanning those years, the Gregorian algorithm would give results not matching reality.
When the Julian calendar was really adopted I don't know. Certainly not 0001-01-01. And of course it varies by country like Gregorian.
pbhjpbhj
From Wikipedia:
>The Julian calendar was proposed in 46 BC by (and takes its name from) Julius Caesar, as a reform of the earlier Roman calendar, which was largely a lunisolar one.[2] It took effect on 1 January 45 BC, by his edict.
Not knowing the year seems unhinged somehow.
NelsonMinar
It's rare to read code that makes me literally laugh out loud. What a delight.
> return ((y * 1073750999) & 3221352463) <= 126976;
> How does this work? The answer is surprisingly complex.
I don't think anyone is surprised by the complexity of any explanation for that algorithm :D
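A quick brute-force sketch comparing the magic formula against the plain rule; the 0..102499 bound is assumed here as the range of validity given in the article:

    #include <stdint.h>
    #include <stdio.h>

    static int is_leap_plain(uint32_t y) {
        return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    }

    static int is_leap_magic(uint32_t y) {
        /* unsigned arithmetic, so the multiply wraps mod 2^32 as intended */
        return ((y * 1073750999u) & 3221352463u) <= 126976u;
    }

    int main(void) {
        for (uint32_t y = 0; y < 102500; y++) {
            if (is_leap_plain(y) != is_leap_magic(y)) {
                printf("mismatch at %u\n", y);
                return 1;
            }
        }
        puts("formulas agree");
        return 0;
    }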