The Fifth Kind of Optimisation
17 comments
April 2, 2025
PaulKeeble
Also buy a bigger box or better IO, or both. A lot of issues are solved by scaling something up. It's not more optimal, but it does solve the performance problem, and is often cheaper than months of performance tracing and development work.
jiggawatts
I’ve lost my voice for a few days because I spent a solid eight hours in a row talking developers out of solving already-solved problems.
“No, you don’t need Redis for caching output, we have a CDN already.”
“No, you don’t need to invent authentication tokens yourself, JWT is a downloadable package away.”
“No…”
“No…”
…
ninetyninenine
Same. I have a team lead who's trying to externalize a SQL query to new infrastructure. Why can't we do a database call on the current machine? Why does this have to be a task for an external machine to carry out???
It's as if the current machine can't wait on the database; instead, we have to wait on an external machine that waits on the database. Why, why, why? I feel web developers are so used to these patterns that they lose common sense.
I'm sure tons of people will agree with the team lead. I think a lot of people just don't get it.
jiggawatts
Oh god, this.
The best part is that by externalising querying, it becomes fiddly to support every possible query shape… so…
… inevitably:
public object GetSQL( string query ) …
SQL injection is the inevitable end result of this design pattern.
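A minimal sketch of both shapes, assuming Microsoft.Data.SqlClient (the class and method names below are hypothetical, for illustration):

    using System.Data;
    using Microsoft.Data.SqlClient; // assumes the standard ADO.NET SQL Server provider

    public static class ReportQueries
    {
        // The "support every possible query shape" escape hatch: the caller's
        // string *is* the SQL text, so any user input that reaches it rides
        // straight into the database as code.
        public static DataTable GetSql(SqlConnection conn, string query)
        {
            using var cmd = new SqlCommand(query, conn);
            using var reader = cmd.ExecuteReader();
            var table = new DataTable();
            table.Load(reader);
            return table;
        }

        // The boring alternative: one fixed query shape per need, with user
        // input travelling as a parameter (data), never as SQL text.
        public static DataTable GetOrdersForCustomer(SqlConnection conn, int customerId)
        {
            using var cmd = new SqlCommand(
                "SELECT Id, Total FROM Orders WHERE CustomerId = @customerId", conn);
            cmd.Parameters.AddWithValue("@customerId", customerId);
            using var reader = cmd.ExecuteReader();
            var table = new DataTable();
            table.Load(reader);
            return table;
        }
    }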
tanelpoder
Also “do it in advance (and reuse)”
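A minimal sketch of that idea in C# (a hypothetical lookup table, computed once and reused; Lazy<T> makes the one-time build thread-safe):

    using System;

    static class Precomputed
    {
        // Built the first time anyone asks, then reused by every later call.
        private static readonly Lazy<double[]> SqrtTable = new(() =>
        {
            var table = new double[1 << 16];
            for (int i = 0; i < table.Length; i++)
                table[i] = Math.Sqrt(i);
            return table;
        });

        // O(1) lookup instead of recomputing Math.Sqrt on every call.
        public static double FastSqrt(ushort x) => SqrtTable.Value[x];
    }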
YZF
Every large-scale cloud/SaaS system has been using parallelism for decades [well, almost two decades, give or take]. Requests run in parallel on millions of servers, and one server processes many requests in parallel.
In that sort of setup, the old-school approach of optimizing one program by doing things in parallel is somewhat less relevant, because everything is by its nature massively parallel, and complex operations are often broken down into bits that all run in parallel. In that sort of environment, over-parallelizing can actually be a pessimization: there is often some cost to breaking the work up to run in parallel, and your little bit of this massively parallel system may only get a fraction of the machine's resources anyway.
Not to say there's no value in knowing "old school" parallelization (including using SIMD or GPUs or whatnot), but for many types of applications that's not where I'd focus.
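The pessimization is easy to reproduce: when the per-item work is trivial, the cost of partitioning and scheduling can dominate. A rough sketch (timings are illustrative and will vary by machine):

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    class ParallelOverhead
    {
        static void Main()
        {
            const int n = 10_000;
            var data = new double[n];

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < n; i++)
                data[i] = Math.Sqrt(i);          // trivial per-item work
            sw.Stop();
            Console.WriteLine($"sequential:   {sw.Elapsed.TotalMilliseconds} ms");

            sw.Restart();
            Parallel.For(0, n, i => data[i] = Math.Sqrt(i)); // same work + scheduling cost
            sw.Stop();
            Console.WriteLine($"Parallel.For: {sw.Elapsed.TotalMilliseconds} ms");
        }
    }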
kragen
Somehow his taxonomy of optimizations doesn't have a category for things like "unroll a loop", "fuse loops", "block your array accesses", "avoid false sharing", or "strength-reduce your operations". He has a "use a better algorithm" category, but his example is switching from bubble sort to selection sort, which is a much more disruptive kind of change than such micro-optimizations, which are usually considered to be "the same algorithm".
This is rather puzzling, given that he starts off with, "Premature optimisation might be the root of all evil, but", quoting Knuth, and in context, Knuth was primarily talking about precisely such micro-optimizations: https://dl.acm.org/doi/10.1145/356635.356640 p. 267 (p. 7/41)
Specifically, the optimizations Knuth was exhibiting in that section of the paper are:
- speeding up a sequential array search by copying the search key to an empty spot at the end of the array, so that the loop only requires one termination test (are keys equal?) rather than two (are keys equal? is index in bounds?), speeding it up on typical machines of the time by 50%;
- unrolling that version of the loop 2× in a way that required gotos, speeding it up by a further 12%.
None of the optimization techniques being applied there fit into any of Tratt’s categories!
Now, I can't claim to be any sort of optimization expert, but I think these kinds of micro-optimizations are still important, though, as Knuth pointed out at the time, not for all code, and often the drawbacks are not worth it. Sometimes compilers will save you: the C compiler might unroll a loop for you, for example. But often it won't, and it certainly isn't going to pull off something like the sentinel trick, because it doesn't know you aren't reading the array from another thread.
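For concreteness, here is roughly what those two Knuth optimizations look like (a C# stand-in for his examples, assuming the array owns a spare slot past usedLength and no other thread touches it):

    // Sequential search with a sentinel: one test per step instead of two.
    static int SentinelSearch(int[] a, int usedLength, int key)
    {
        a[usedLength] = key;              // plant the sentinel in the spare slot
        int i = 0;
        while (a[i] != key) i++;          // no bounds test: the sentinel guarantees a hit
        return i < usedLength ? i : -1;   // stopped on the sentinel => not found
    }

    // The same loop unrolled 2x: half as many back-branches. Knuth needed
    // gotos; the two tests per pass play that role here.
    static int SentinelSearchUnrolled(int[] a, int usedLength, int key)
    {
        a[usedLength] = key;
        int i = 0;
        while (true)
        {
            if (a[i] == key) break;
            if (a[i + 1] == key) { i++; break; }
            i += 2;
        }
        return i < usedLength ? i : -1;
    }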
This sort of depends on what context you're working in; arguably if your hotspots are in Python, even (as Tratt says) in PyPy, you have bigger opportunities for optimization available than manually unrolling a loop to reduce loop termination tests. But I've found that there's often something available in that sort of grungy tweaking; I've frequently been able to double the speed of Numpy code, for example, by specifying the output array for each Numpy operation, thus reducing the cache footprint. The code looks like shit afterwards, sadly.
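The same buffer-reuse idea, sketched in C# rather than Numpy (illustrative only, not the actual code): instead of allocating a fresh array for every intermediate result, the caller supplies one preallocated output buffer that stays hot in cache.

    // Hypothetical example: the output buffer is allocated once by the caller
    // and reused across calls, so each operation writes into memory that is
    // already in cache instead of touching a fresh allocation.
    static void ScaleAndOffset(double[] input, double scale, double offset,
                               double[] output)
    {
        for (int i = 0; i < input.Length; i++)
            output[i] = input[i] * scale + offset;   // no per-call allocation
    }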
hinkley
A bunch of these dates are way wrong because the author is stuck in an x86 consumer-grade mindset. Even if we didn't have multicore on our desktops, a lot of us had shipped on multicore long before 2014, which is where he marks the era as beginning.
Multiprocessor was already an established solution: I worked on 4-core machines around 2008, as would anyone who worked on server software instead of just desktop.
YZF
We were also doing multi-threading before we had multiple cores. I remember some things breaking once we actually started seeing the first dual-core machines, probably Pentium D, circa 2005.
Massively parallel machines and parallelization predates that by a lot: https://en.wikipedia.org/wiki/Connection_Machine
kragen
Yeah, I was working on a Windows NT image processing program in 01998. My desktop had two processors, and the machines in the dev lab and in the field had four. Mine were Pentium Pro. Some friends of mine had accounts on pioneer.unm.edu, a Sequent Symmetry with I think 8 80386 processors on a single memory bus, up until it was lost or stolen in about 01994. Then I got an account on the MHPCC SP2 UNM had the contract to administer, with I think 384 RS/6000 POWER CPUs, but without shared memory (not even NUMA).
hinkley
My first Unix account was on a Sequent. I tried to put together a dual processor box in 1995 but they were having supply issues and it didn't work out.
I think it is important to differentiate multi-core from multiprocessor, but not so much in the context of that article.
shoo
Another kind of optimisation is to see if you are solving an unnecessarily general problem, which could be replaced by a narrower problem, specific to your actual situation and workload, that might be easier to solve.
see also: Mike Acton's 2014 CppCon talk "Data-Oriented Design and C++": https://www.youtube.com/watch?v=rX0ItVEVjHc
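A hypothetical example of that narrowing: "sort these records" is a general problem, but if the key is known to be a single byte, it collapses into a counting sort, one pass plus a tiny histogram, with no comparisons at all.

    // Counting sort for byte keys: exploits the narrowness of the real
    // problem (a 256-value key space) that a general comparison sort ignores.
    static byte[] SortByteKeys(byte[] keys)
    {
        var counts = new int[256];
        foreach (var k in keys)
            counts[k]++;                      // histogram pass

        var sorted = new byte[keys.Length];
        int pos = 0;
        for (int k = 0; k < 256; k++)         // emit each key value in order
            for (int c = 0; c < counts[k]; c++)
                sorted[pos++] = (byte)k;
        return sorted;
    }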
ninetyninenine
The title implies that the 5th kind of thing is revolutionary and unknown. I mean, yeah? You parallelize a task that can be parallelized and it's faster...
I thought it was going to be something that was unexpected. But alas that's happening less and less on HN.
infogulch
0: Make your requirements less dumb.
ninetyninenine
Let's be real, he's overblowing the novelty of parallelism. We know what parallelism is. The real "5th optimization" is quantum computing.
Agingcoder
There’s also ‘don’t do it’ (just turn the whole thing off because it’s actually not needed) and ‘use a better middleman’ (OS, compiler, etc., while not going lower level). In my experience these are very cheap and usually not looked at, and most devs jump at the other 5.