Show HN: I audited 500 K8s pods. Java wastes ~48% RAM, Go ~18%
38 comments
· December 13, 2025
unsnap_biceps
And it's possible to serve different content if the curl output is being piped to a shell vs. being piped to a file or stdout.
wozzio
The curl | bash is just for convenience; the README explicitly advises "download and inspect wozz.sh first" if you aren't comfortable piping to a shell.
As for the newness: I just open-sourced this from my personal scripts collection this week, so yes, the org and account are new. It runs entirely locally using your active kubeconfig; it doesn't collect credentials or send secrets anywhere. You can cat the script to verify that it's just a wrapper around kubectl top and kubectl get.
karianna
Usual caveats of “you should run load to see if it’s truly wasted”, but we do know that Java defaults are not ideal out of the box and so we analysed a ton of workloads on Azure and came up with this: https://learn.microsoft.com/en-us/java/jaz - better defaults out of the box for the JVM in containers at runtime.
wozzio
That doc is gold, thanks for linking.
Yeah, defaults are the enemy here. Most of the waste I'm seeing in the data comes from generic Spring Boot apps running with out-of-the-box settings, where the JVM assumes it owns the entire node.
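(For reference, not from the linked doc: the standard HotSpot container flags that size the heap from the cgroup limit instead of the node's RAM. app.jar and the percentages are placeholders.)
# Sketch: cap the heap relative to the container's memory limit rather than total node RAM.
java -XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0 -jar app.jar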
ice3
No source, useless outside of Azure.
jeroenhd
You can just install it outside of Azure: https://learn.microsoft.com/en-us/java/jaz/overview#install-...
Note: the tool calls out to Microsoft but the docs don't say what data is shared ("telemetry"), better make sure your firewalls are functioning before running this tool.
Not sure what the source license story is.
nyrikki
IMHO there are some serious problems here: this won't apply to many situations, it is not really "waste" in the way claimed, and it will probably result in greater spend.
> Memory waste: request - actual usage [0]
Memory "requests" are hints to the kube-scheduler for placement, not a target for expected usage.
> # Memory over-provisioning: limit > 2x request [1]
Memory limits are for enforcement, typically deciding when to call the OOM killer.
Neither placement hints nor OOM-kill limits should have anything to do with normal operating parameters.
> The memory request is mainly used during (Kubernetes) Pod scheduling. On a node that uses cgroups v2, the container runtime might use the memory request as a hint to set memory.min and memory.low [2]
By choosing to label the delta between these two as "waste", you will absolutely suffer from Goodhart's law: you will teach your dev team not just to request memory, but to allocate it and never free it, so that they fit inside this invalid metric's assumptions.
It is going to work against the more reasonable goals of having developers set their limits as low as possible without negative effects, protecting the node and pod from memory leaks, and still gaining the advantages of over-provisioning, which is where the big gains are to be made.
[0] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[1] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[2] https://kubernetes.io/docs/concepts/configuration/manage-res...
wozzio
You are technically right that requests are scheduling hints, but in a cluster-autoscaler world, requests = bill.
If I request 8GB for a pod that uses 1GB, the autoscaler spins up nodes to accommodate that 8GB reservation. That 7GB gap is capacity the company is paying for but cannot use for other workloads.
Valid point on Goodhart's Law, though the goal shouldn't be "fill the RAM" but rather "lower the request to match the working set" so we can bin-pack tighter.
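(A minimal sketch of what that right-sizing looks like in practice; the deployment name and numbers are placeholders, kubectl set resources is the standard subcommand.)
# Shrink the request toward the observed working set while keeping a limit as the safety rail.
kubectl set resources deployment/my-app --requests=memory=1200Mi --limits=memory=2Gi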
perrygeo
I'm not sure I would frame unused memory as "waste". At least not necessarily. If the system is operating below max capacity, that memory isn't wasted - it's a reserve capacity to handle surges. The behavior of the system under load (when it matters) might very well depend on that "wasted" memory.
You want to compare the _maximum_ memory under load - the high-water mark. If you combine Wozz with a benchmark to drive heavy traffic, you could more confidently say any unused memory was truly wasted.
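(One way to read the high-water mark directly instead of an instantaneous sample, assuming cgroup v2 and a kernel recent enough to expose memory.peak; the pod name is a placeholder.)
# Peak memory usage of the container's cgroup since it started:
kubectl exec my-pod -- cat /sys/fs/cgroup/memory.peak
# Instantaneous usage, for comparison:
kubectl exec my-pod -- cat /sys/fs/cgroup/memory.current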
arjie
I often end up over-provisioning because it's got a cliff effect. If I under-provision, I end up losing the entire program and its work. If I over-provision, I see a smoother increase in running cost. An amusing note:
curl -o wozz-audit.sh https://wozz.io/audit.sh
cat wozz-audit.sh
bash wozz-audit.sh
That definitely doesn't work on `curl 8.7.1 (x86_64-apple-darwin25.0) libcurl/8.7.1 (SecureTransport)` because you return a 307 with 'Redirecting...' in the body, so `wozz-audit.sh` is just the string 'Redirecting...'. I have to pass curl the `-L` flag to follow the redirect. Then everything works.
xav0989
Curl always requires -L to follow redirects.
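(For reference, arjie's sequence works once the redirect is followed:)
# -L makes curl follow the 307 to the actual script.
curl -L -o wozz-audit.sh https://wozz.io/audit.sh
cat wozz-audit.sh
bash wozz-audit.sh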
wozzio
I've been consulting on EKS/GKE cost optimization for a few mid-sized companies and kept seeing the same pattern: massive over-provisioning of memory just to be safe.
I wrote a simple CLI tool (bash wrapper around kubectl) to automate diffing kubectl top metrics against the declared requests in the deployment YAML.
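Roughly, the core diff it automates looks like this (simplified sketch, not the actual script; takes the first container per pod and assumes requests and kubectl top both report in Mi):
# For each pod, print the declared memory request next to live usage from metrics-server.
kubectl get pods --no-headers \
  -o custom-columns='NAME:.metadata.name,REQ:.spec.containers[0].resources.requests.memory' |
while read -r name req; do
  used=$(kubectl top pod "$name" --no-headers | awk '{print $3}')
  echo "$name request=$req used=$used"
done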
I ran it across ~500 pods in production. The "waste" (allocated vs. used) average by language was interesting:
Python: ~60% waste (Mostly sized for startup spikes, then idles empty).
Java: ~48% waste (Devs seem terrified to give the JVM less than 4Gi).
Go: ~18% waste.
The tool is called Wozz. It runs locally, installs no agents, and just uses your current kubecontext to find the gap between what you pay for (Requests) and what you use (Usage).
It's open source. Feedback welcome.
(Note: The install is curl | bash for convenience, but the script is readable if you want to audit it first).
cogman10
Actually, this sounds like your Java devs have misconfigured containers.
Java will happily eat all the memory you throw at it. The fact that it isn't means they are probably relying on the default JVM settings. Those are too conservative inside a container, especially if they are running on older JVM versions.
What you'll find is the JVM will constantly have "waste" due to its default configuration. The question is whether it ends up throwing out-of-memory exceptions due to the lower memory limit.
If your devs haven't looked at the heap allocation of a running application, there's no way to know if this is too much or too little memory.
Go/python much more readily give memory back to the OS.
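(One quick way to do that heap check without installing anything; the pod name is a placeholder, and this assumes the JVM runs as PID 1 and the image ships a JDK with jcmd.)
# Current heap capacity vs. usage, straight from the running JVM:
kubectl exec my-java-pod -- jcmd 1 GC.heap_info
# What the JVM ergonomics picked as the max heap under this container's limits:
kubectl exec my-java-pod -- java -XX:+PrintFlagsFinal -version | grep -i maxheapsize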
wozzio
The JVM expands to fill the container, but the scheduler still counts that 8GB request as used when packing the node. Even if the app only needs 2GB of working set, we are blocked from scheduling other pods on that wasted 6GB buffer.
karolinepauls
> Python: ~60% waste (Mostly sized for startup spikes, then idles empty).
I understand we're talking about CPU in case of Python and memory for Java and Go. While anxious overprovisioning of memory is understandable, doing the same for CPU probably means lack of understanding of the difference between CPU limits and CPU requests.
Since I've been out of DevOps for a few years, is there ever a reason not to give each container the ability to spike up to 100% of 1 core? Scheduling of mass container startup should be a solved problem by now.
cogman10
I don't think there is. You should set both, and the limit doesn't need to match the request for CPU.
Your limit should roughly be "what this application would use if it goes full bore" and your request should be "what it uses at steady state".
At least at my company, the cluster barely uses any CPU even during the busiest hours. We are fairly over-provisioned there because a lot of devs are keeping limit and request the same.
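(Applying that rule of thumb looks something like this; the deployment name and numbers are illustrative only. Memory request and limit are kept equal here, in line with the OOMKill caution elsewhere in the thread.)
# Steady-state CPU as the request, full-bore CPU as the limit.
kubectl set resources deployment/my-api \
  --requests=cpu=250m,memory=512Mi \
  --limits=cpu=1,memory=512Mi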
wozzio
CPU bursting is safe: you just get throttled. Memory bursting is dangerous: you get OOMKilled.
That's why the Python numbers look so bad here: devs set the request high enough to cover that initial model-loading spike so they don't crash during a rollout, even if they idle at 10% usage afterwards.
Narishma
What makes you think they're talking about CPU? It reads to me like it's memory.
Marazan
Interesting. At the company I work for, Java pods are (historically) over-provisioned with CPU but quite tightly provisioned with memory.
wozzio
That is the opposite of what I usually see. Are you trading CPU for RAM by running a more aggressive GC like ZGC or Shenandoah? Usually, people starve the CPU to buy more RAM.
dboreham
Typically they're overprovisioning through prior experience. In the past something fell over because it didn't have enough memory. So they gave it more. That practice stuck in the brain-model. Perhaps it's no longer valid, but who wants to bring down the service doing Chernobyl experiments?
PaulKeeble
You run these tools, find the maximums for weeks of traffic, set everything down to minimise cost, and all is well until the event. The specific event doesn't really matter: it causes an increase in traffic processing time, suddenly every service needs more memory to hold its in-flight transactions, and instead they fail with out-of-memory errors and disappear. Now all your pods are in restart loops, unable to cope, and you have an outage.
The company wasting 20% extra memory, on the other hand, is still selling and copes with the slower transaction speed just fine.
Not sure over-provisioning memory is really just waste when we use dynamically memory-managed languages, which is all modern languages outside real-time safety-critical environments.
gopher_space
> Perhaps it's no longer valid, but who wants to bring down the service
I'm thinking more like getting a junior to do efficiency passes on a year's worth of data.
wozzio
Exactly, we call it "sleep insurance". It is rational for the on-call engineer to pad the numbers, but it's irrational for the finance team to pay for it.
pestatije
Is this an instantaneous measure, or does it cover the whole duration of the process?
stefan_
4GiB? Oh no, we gave it Raspberry Pi memory?
This is truly one of the dumbest outcomes of the whole "resume driven cloud deployment" era. Clouds "market segmenting" with laughable memory, resume engineers wanting their line item but not the recurring cost, engineers wasting weeks investigating and working around out-of-memory issues that are purely the result of some guy provisioning services with 2003 levels of memory. It's all a giant waste of time.
rvz
> I've been consulting on EKS/GKE cost optimization for a few mid-sized companies and kept seeing the same pattern: massive over-provisioning of memory just to be safe.
Correct.
Developers keep over-provisioning because they need enough memory for the app to continue running as demand scales up. Since these languages have their own runtimes and GCs to manage memory, the runtime already pre-allocates lots of RAM before running the app, adding to the bloat.
Part of the problem is not only technical (the language may be bloated and inefficient) but also psychological, as developers are scared of their app getting an out-of-memory exception in production.
As you can see, the languages with the most waste are the ones that are inefficient in both runtime (speed) and space (memory), take up the most memory, are slower (Python and Java), and cost a lot of money to keep maintaining.
I got downvoted over questioning the microservice cargo cult [0] with Java being the darling of that cult. If you imagine a K8s cluster with any of these runtimes, you can see which one will cost the most as you scale up with demand + provisioning.
Languages like Go and Rust are the clear winners if you want to save lots of money and are looking for efficiency.
seabrookmx
Neat. Might have to try this on .NET, especially since v10 just got a new (default) GC that claims to use a lot less memory at idle.
wozzio
That new .NET behavior is the goal: smart runtimes that yield memory back to the OS so we don't have to play "guess the request" in YAML.
Unfortunately, most legacy Java/Python workloads I see in the wild are doing the exact opposite: hoarding RAM just in case. Until the runtimes get smarter, we're stuck fixing the configs.
BatteryMountain
It's not just idle; it is a bit more aggressive about cleaning up soon after objects are de-referenced, meaning the OS gets the memory back sooner. Very useful in situations where you have many small containers/VMs running .NET stuff, or in large applications where memory pressure is an issue and your usage pattern can benefit from memory being released earlier.
In the old days you could tune IIS + the garbage collector manually to get similar behaviour, but it was usually not worth it. Time was better spent optimizing other things in the pipeline and living with the GC freezes. I suspect GC hiccups should now be much smaller/shorter too, as the overall load will be lower with the new GC.
wozzio
That aggressive release behavior is exactly what we need more of. Most runtimes (looking at you, legacy Java) just hoard the heap forever because they assume they are the only tenant on the server.
linuxftw
Let's do the math:
A 64GB instance in Azure is $156.28/month reserved. That's $2.44/GB/month.
Let's say you use an extra 4GB RAM for a safety margin or laziness, and you have 50 replicas. That's $488.375 per month 'wasted'. Or $5860/year.
You'll never recoup the money it takes to get those replicas perfectly sized. Just give every app 4GB more RAM than you think it needs, and move on with your life.
wozzio
If you're a small startup, burning $5k/year in "lazy tax" is smarter than distracting your engineers; the math flips when you hit scale.
For some, that safety margin isn't small. At a certain volume, the waste exceeds the cost of a full-time engineer, so optimizing becomes profit.
dev_l1x_be
How much is the k8s waste?
wozzio
In the clusters I've audited so far, typically 30% to 50% of the total requested capacity is allocatable waste: resources that are reserved/billed but never actually touched.
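(A rough way to eyeball that gap yourself on a single node, using standard kubectl output; the node name is a placeholder and kubectl top needs metrics-server.)
# What the scheduler thinks is spoken for on this node (requests/limits as % of allocatable):
kubectl describe node my-node | grep -A 8 'Allocated resources'
# What the node is actually using right now:
kubectl top node my-node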
dboreham
A tool that claims to solve a clearly unsolvable problem!
wozzio
It feels unsolvable because we've been trying to solve it with static spreadsheets and guesses. It's actually very solvable if you treat it as a control-loop problem: continuous adjustment rather than a set-it-and-forget-it config.
dantillberg
This has to be malware in poor disguise.
Curl-bash without hash-checking from a four-month-old domain with full k8s cluster access? All the identities connected are brand new, including the brand-new HN account that posted this. There are four commits in the repo, and three are back-dated to exactly 1 year ago.