
How We Found 7 TiB of Memory Just Sitting Around

Aeolun

I read this and I have to wonder: did anyone ever think it was reasonable that a cluster that apparently needed only 120 GB of memory was consuming 1.2 TB just for logging (or whatever Vector does)?

shanemhansen

The unreasonable effectiveness of profiling and digging deep strikes again.

hinkley

The biggest tool in the performance toolbox is stubbornness. Without it all the mechanical sympathy in the world will go unexploited.

There’s about a factor of 3 improvement that can be made to most code after the profiler has given up. That probably means there are better profilers that could be written, but in 20 years of having them I’ve only seen two that tried. Sadly, I think flame graphs made profiling more accessible to the unmotivated but didn’t actually improve overall results.

Negitivefrags

I think the biggest tool is higher expectations. Most programmers really haven't come to grips with the idea that computers are fast.

If you see a database query that takes 1 hour to run and only touches a few GB of data, you should be thinking: "Well, NVMe bandwidth is multiple gigabytes per second, so why can't it run in 1 second or less?"
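As a rough illustration, here is that back-of-the-envelope bound as a TypeScript sketch. The numbers are assumptions, not figures from the thread: 4 GB of data and ~3 GB/s of sequential NVMe read bandwidth.

```typescript
// Back-of-the-envelope: how long should a brute-force scan of the data take?
// Assumed inputs (not from the thread): 4 GB of data, ~3 GB/s NVMe reads.
const dataBytes = 4 * 1024 ** 3;        // "a few GB" of data
const nvmeBytesPerSec = 3 * 1024 ** 3;  // sequential read bandwidth

const scanSeconds = dataBytes / nvmeBytesPerSec;
console.log(`full scan: ~${scanSeconds.toFixed(2)} s`); // ~1.33 s

// A 1-hour query is therefore thousands of times slower than a dumb scan:
const querySeconds = 3600;
console.log(`slowdown vs. scan: ~${(querySeconds / scanSeconds).toFixed(0)}x`);
```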

The idea that anyone would accept a request to a website taking longer than 30 ms (the time it takes for a game to render its entire world, including both the CPU and GPU parts, at 60 fps) is insane, and nobody should really accept it, but we commonly do.

azornathogron

Pedantic nit: at 60 fps the per-frame time is 16.66... ms, not 30 ms. Having said that, a lot of games run at 30 fps, or run different parts of their logic at different frequencies, or do other tricks that mean there isn't exactly one FPS rate the thing is running at.

hinkley

Lowered expectations come in part from people giving up on theirs. Accepting versus pushing back.

javier2

It's also about cost. My gaming computer has 8 cores + 1 expensive GPU + 32 GB RAM for me alone. We don't have that per customer.

zahlman

> The biggest tool in the performance toolbox is stubbornness. Without it all the mechanical sympathy in the world will go unexploited.

The sympathy is also needed. Problems aren't found when people don't care, or consider the current performance acceptable.

> There’s about a factor of 3 improvement that can be made to most code after the profiler has given up. That probably means there are better profilers than could be written, but in 20 years of having them I’ve only seen 2 that tried.

It's hard for profilers to identify slowdowns that are due to the architecture. Making the function do less work to get its result feels different from determining that the function's result is unnecessary.
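A contrived sketch of that blind spot (hypothetical function names, TypeScript): a profiler will point at the hot function and help you shave cycles off it, but only an architecture-level view reveals that its result is never used.

```typescript
// The profiler will happily report expensiveChecksum() as the hot spot
// and invite you to optimize the inner loop. What it can't tell you is
// that the caller never uses the result at all.
function expensiveChecksum(data: Uint8Array): number {
  let sum = 0;
  for (const byte of data) sum = (sum * 31 + byte) >>> 0;
  return sum;
}

function handleRequest(body: Uint8Array): string {
  const checksum = expensiveChecksum(body); // hot in every flame graph
  void checksum;                            // ...but nobody reads it
  return "ok";                              // the real fix: delete the call
}
```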

hinkley

Architecture, cache eviction, memory bandwidth, thermal throttling.

All of which have gotten perhaps an order of magnitude worse in the time since I started on this theory.

jesse__

Broadly agree.

I'm curious, what're the profilers you know of that tried to be better? I have a little homebrew game engine with an integrated profiler that I'm always looking for ideas to make more effective.

hinkley

Clinic.js tried and lost steam. I have a recollection of a profiler called JProfiler that represented space and time as a graph, but also a recollection that they went under. There is a company selling a product of that name that has been around since that time, but it doesn't quite look how I recalled, so I don't know if I was mistaken about their demise or I've swapped product names in my brain. It was 20 years ago, which is a long time for mush to happen.

The common element between the attempts is new visualizations. And like drawing a projection of an object in a mechanical engineering drawing, there is no one projection that contains the entire description of the problem. You need to present several and let the brain synthesize the data missing from each individual projection into an accurate model.

nitinreddy88

The other way to look at it is to ask why adding a namespace label causes so much memory footprint in Kubernetes. Shouldn't fixing that (which could be a much bigger design change) benefit the whole Kube community?

bstack

Author here: yeah, that's a good point. tbh I was mostly unfamiliar with Vector, so I took the shortest path to the goal, but that could be an interesting followup. It does seem like there are a lot of bytes per namespace!

hinkley

Keys require O(log n) space per key, or O(n log n) for the entire data set, simply to avoid key collisions. But human-friendly key spaces grow much, much faster, and I don't think many people have looked too hard at that.
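A quick illustration of that floor (a TypeScript sketch): distinguishing n keys takes at least ceil(log2 n) bits per key, which is far less than any human-friendly name actually uses.

```typescript
// Minimum bits needed to give n keys distinct identifiers: ceil(log2(n))
// per key, so the whole key space costs on the order of n*log(n) bits
// just to avoid collisions. Human-friendly names blow well past this.
function minBitsPerKey(n: number): number {
  return Math.ceil(Math.log2(n));
}

for (const n of [1_000, 1_000_000, 1_000_000_000]) {
  const bits = minBitsPerKey(n);
  console.log(`${n} keys: >= ${bits} bits (~${Math.ceil(bits / 8)} bytes) each`);
}
// 1000 keys need only ~2 bytes each; a typical metric name is 30+ bytes.
```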

There were recent changes to the NodeJS Prometheus client that eliminate tag names from the keys used for storing the tag cardinality for metrics. The memory savings weren't reported, but the CPU savings for recording data points were over 1/3, and about twice that when applied to the aggregation logic.
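A hypothetical sketch of that kind of change (illustrative TypeScript, not the actual prom-client code): if the tag names for a metric are fixed at registration time, the key built on the hot path only needs the tag values.

```typescript
// Sketch of the optimization described above, with invented names:
// tag *names* are agreed once per metric, so per-datapoint keys carry
// only the tag *values*.
type Tags = Record<string, string>;

// Naive key: repeats every tag name on every recorded data point.
function keyWithNames(tags: Tags): string {
  return Object.keys(tags).sort().map((k) => `${k}=${tags[k]}`).join(",");
}

// Leaner key: name order is fixed per metric, so only the values are
// concatenated (and hashed/compared) when recording a data point.
function keyValuesOnly(tagOrder: string[], tags: Tags): string {
  return tagOrder.map((k) => tags[k]).join(",");
}

const tags = { method: "GET", route: "/users", status: "200" };
console.log(keyWithNames(tags));                                 // method=GET,route=/users,status=200
console.log(keyValuesOnly(["method", "route", "status"], tags)); // GET,/users,200
```

Shorter keys also mean less work per lookup, which is consistent with the CPU savings showing up on the recording path.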

Lookups are rarely O(1), even in hash tables: just hashing the key costs time proportional to its length.

I wonder if there’s a general solution for keeping names concise without triggering transposition or reading comprehension errors. And what the space complexity is of such an algorithm.