Optimizations That Aren't

4 comments · July 19, 2025

kccqzy

> Measure the performance of the target code in a specific situation

A difficult part of optimization is making the code work well in multiple specific situations. This often happens in library code, where different users call your code with very different input sizes. Sometimes a dumb algorithm works better; sometimes a fancier algorithm with better big-O but larger constant factors wins. In practice, people measure both across input sizes and dynamically choose the algorithm based on the size. This has the pitfall that the heuristic doesn't keep up with hardware. It also becomes intractable if the performance characteristics depend on multiple factors, because then you're trying to encode the minimum of a multi-dimensional space. The work involved in this kind of optimization is just exhausting.
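
A minimal sketch of that kind of size-based selection (purely illustrative, not from the comment): a hybrid sort that switches between an O(n^2) algorithm with a tiny constant factor and an O(n log n) one, using a threshold constant that is exactly the sort of heuristic that can go stale as hardware changes.

    # Size-based algorithm selection: the threshold was "tuned" on one
    # machine and may not hold on another -- the pitfall described above.
    INSERTION_SORT_THRESHOLD = 32

    def insertion_sort(items):
        # O(n^2) but a tiny constant factor; tends to win on small inputs.
        for i in range(1, len(items)):
            key, j = items[i], i - 1
            while j >= 0 and items[j] > key:
                items[j + 1] = items[j]
                j -= 1
            items[j + 1] = key
        return items

    def merge_sort(items):
        # O(n log n) but with allocation and copying overhead.
        if len(items) <= 1:
            return items
        mid = len(items) // 2
        left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]

    def hybrid_sort(items):
        # Selection on a single factor (input size). With more factors the
        # crossover becomes a surface, not a single number.
        items = list(items)
        if len(items) <= INSERTION_SORT_THRESHOLD:
            return insertion_sort(items)
        return merge_sort(items)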

addaon

The other approach here is to provide access to the multiple implementations, document the main factors their performance is sensitive to, and let the caller do their own benchmarking to select the right one for the specific situations they care about. It's a bit of kicking the can down the road, but it also lets your customers (at least the ones who care) get the best results possible.
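
A minimal sketch of that approach, with trivial stand-in implementations and an illustrative benchmarking helper (the names here are hypothetical, not a real library API):

    import bisect
    import timeit

    def scan_contains(items, target):
        # Linear scan: no setup cost; fine for small or one-off lookups.
        return target in items

    def sorted_contains(items, target):
        # Sort plus binary search: higher setup cost, different sensitivities.
        s = sorted(items)
        i = bisect.bisect_left(s, target)
        return i < len(s) and s[i] == target

    def pick_faster(candidates, representative_args, repeats=5, number=200):
        # Let the caller time each exposed implementation on data shaped
        # like their real workload and keep whichever wins for them.
        timings = {
            name: min(timeit.repeat(lambda f=fn: f(*representative_args),
                                    number=number, repeat=repeats))
            for name, fn in candidates.items()
        }
        return min(timings, key=timings.get), timings

    best, times = pick_faster(
        {"scan": scan_contains, "sorted": sorted_contains},
        representative_args=(list(range(10_000)), 9_999),
    )
    print(best, times)

The library documents what each variant is sensitive to, and the crossover is decided by the caller's own data rather than by a baked-in heuristic.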

pnt12

I did some work in this area, concerning data pipelines, and it was a fun experience.

It's really satisfying to optimize (or do any kind of refactoring on) well-tested code. Change the code, run the tests, fix if they fail, keep it if they pass. Sometimes the code was not well tested, but it was slow, so there was double the reason to test and improve.

Having deterministic data for comparison is also valuable from a different perspective: a slower feedback loop, but usually more variety, with edge cases you didn't think of. Transforming thousands of data points and getting zero diffs compared to the original results is quite the sanity check!
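
A minimal sketch of that zero-diff check, assuming a hypothetical JSON-lines file of previously captured outputs and a stand-in transformation:

    import json

    def new_pipeline(record):
        # Stand-in for the refactored/optimized transformation.
        return {"id": record["id"], "value": record["value"] * 2}

    def diff_against_golden(input_records, golden_path):
        # Compare new output to previously captured results, row by row.
        diffs = []
        with open(golden_path) as f:
            for record, line in zip(input_records, f):
                expected = json.loads(line)
                actual = new_pipeline(record)
                if actual != expected:
                    diffs.append((record.get("id"), expected, actual))
        return diffs  # an empty list is the "zero diffs" sanity check

    # diffs = diff_against_golden(records, "golden_output.jsonl")
    # assert not diffs, f"{len(diffs)} records changed"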

Measuring can be difficult but really rewarding. I was doing this very technical work, but constantly writing reports on the outcomes (tables and later plots) and got great feedback from managers/clients, not only about the good results (when they happened, not always!) but also about the transparency and critical analysis.

We didn't really work with acceptance levels, though. It was usually "this is slow now, and we expect more data later, so it must be faster". It makes sense to define concrete acceptance criteria; it's just not always obvious. We'd go more by priorities: explore the slow parts, come up with hypotheses, and chase the most promising ones depending on risk/reward. Easy fixes for quick wins, long stretches for potential big gains - but try to prototype first to validate before going on long efforts that may be fruitless.

taeric

Point 4 really resonates with me. And it often lends itself to the idea of a budget, both in terms of speed and memory. How much memory do you have at a given spot of the application? How much time? Can you meaningfully make use of any savings?

Sometimes you will find slack in unexpected places as well: places that have extra time compared to what they use, or, more commonly, things that could have used more memory. It is amazing what you can do with extra memory. (Indeed, I think the majority of algorithmic advances that people love to talk about come from using extra memory?)
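
One concrete, well-worn instance of spending memory to buy time is memoization; a minimal sketch, with a purely illustrative function and cache budget:

    from functools import lru_cache

    @lru_cache(maxsize=4096)  # the memory budget: up to 4096 cached results
    def expensive_transform(x: int) -> int:
        # Stand-in for a costly pure computation.
        total = 0
        for i in range(1, 10_000):
            total += (x * i) % 97
        return total

    expensive_transform(42)   # computed
    expensive_transform(42)   # served from the cache
    print(expensive_transform.cache_info())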