The Impossible Optimization, and the Metaprogramming to Achieve It
9 comments
·October 28, 2025omnicognate
The language here is Mojo, which the article seems to assume you know and doesn't say enough for you to deduce until half way through and after multiple code examples. I don't know how you're supposed to know this as even the blog it's on is mostly about Vale. From the intro I was expecting it to be about C++.
totalperspectiv
The author works for Modular. He shared the write up on the Mojo Discord. I think Mojo users were the intended audience.
Xcelerate
> Eliminate redundant matrix operations (like two transposes next to each other)
In 2016, I was trying to construct orthogonal irreducible matrix representations of various groups (“irreps”). The problem was that most of the papers describing how to construct these matrices used a recursive approach that depended on having already constructed the matrix elements of a lower dimensional irrep. Thus the irrep dimension n became quite an annoying parameter, and function calls were very slow because you had to construct the irrep for each new group element from the ground up on every single call.
I ended up using Julia’s @generated functions to dynamically create new versions of the matrix construction code for each distinct value of n for each type of group. So essentially it would generate “unrolled” code on the fly and then use LLVM to compile that a single time, after which all successive calls for a specific group and irrep dimension were extremely fast. Was really quite cool. The only downside was that you couldn’t generate very high dimensional irreps because LLVM would begin to struggle with the sheer volume of code it needed to compile, but for my project at the time that wasn’t much of a concern.
taeric
Gave me a smile to see the shout out to LISP in there.
Reading this take on it, it feels like a JIT compiler could also accomplish a fair bit of this? I'm also reminded of the way a lot of older programs would generate tables during build time. I'm assuming that is still fairly common?
fragmede
Depending on the userbase of the site, simply checking for @gmail.com at the end, I'd bet, would result in a quick win, as well as restricting the username's alphabet to allowed Gmail characters.
The other optimization I'd guess at would be to async/thread/process the checking before and after the @ symbol, so they can run in parallel (ish). Extra cpu time, but speed > CPU cycle count for this benchmark.
gus_massa
[Rehashing an old comment]
In the math department, we had a Moodle the students in the first year of my university in Argentina.
When we started like 15 years ago, the emails of the students and TA were evenly split in 30% Gmail, 30% Yahoo!, 30% Hotmail and 10% others (very aproxímate numbers).
Now the students have like 80% Gmail, 10% Live/Outlook/Hotmail and 10% others/Yahoo. Some of the TA are much older, so perhaps "only" 50% use Gmail.
The difference is huge. I blame the mandatory gmail account for the cell phone.
So, checking only @gmail.com is too strict, but a first fast check for @gmail.com and later the complete regex may improve the speed a lot in the real word.
embedding-shape
Tell us you used to work at Google, without telling us.
"simply do X" is such a programmer fallacy at this point I'm surprised we don't have a catchy name for it yet, together with a XKCD for making the point extra clear.
> Mojo, D, Nim, and Zig can do it, and C++ as of C++20. There are likely some other languages that can do it, but these are the only ones that can truly run normal run-time code at compile time
Pretty sure Julia can do it.