OOMProf: Profiling on the Brink
4 comments
· August 24, 2025
xmichael909
OOMKiller... in most cases where it has killed things, I feel it would’ve been far better to just let the system slog along and spill onto disk instead of killing the process outright or, as the article says, killing the wrong process, like it always seems to do.
devin
I find OOM kills are frequently one of the most misunderstood ops issues. I've often seen them treated as "This application ran out of memory". I think "OOM Killed" is a bad way of describing what is happening to a user debugging an issue in the first place, never mind the rabbit hole of investigation that can follow it.
Bender
I've not seen OOM on any of my systems for a very long time, but I also set overcommit ratio to 0 and vm.min_free_kbytes to a higher number based on a formula. I have not allocated swap in a couple of decades, even in tiny VMs at VPS providers. If memory gets tight I move apps to a node with more memory and leave plenty free for inode/dentry/page cache. Unused RAM is never wasted.
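For anyone who wants to check what their own box is currently set to before changing anything, here is a minimal sketch that reads the relevant knobs from /proc/sys/vm. The comment does not say exactly which overcommit sysctl is meant, so reading both vm.overcommit_memory and vm.overcommit_ratio is an assumption, and the helper name `read_vm_settings` is made up for illustration. No formula for min_free_kbytes is given above, so none is implemented here.

```python
from pathlib import Path

# Sysctl names under /proc/sys/vm. Which overcommit knob the comment
# refers to is an assumption, so both are read.
VM_KEYS = ("overcommit_memory", "overcommit_ratio", "min_free_kbytes")


def read_vm_settings(root="/proc/sys/vm", keys=VM_KEYS):
    """Return {name: int or None}; None means the file was missing or unreadable."""
    settings = {}
    for key in keys:
        try:
            settings[key] = int((Path(root) / key).read_text().strip())
        except (OSError, ValueError):
            settings[key] = None
    return settings


if __name__ == "__main__":
    for name, value in read_vm_settings().items():
        print(f"vm.{name} = {value}")
```

The `root` parameter is only there so the function can be pointed at a different directory (for example a copy of the files taken from another machine); on a normal Linux system the default should be what you want.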
default linux vm settings are abysmal. ratios are all wrong, overcommit is... questionable, cache flushes happen too late. my worst experiences are when there's a shitton of free memory and my processes get OOM killed, and not even due to fragmentation.