Atop 2.11 heap problems

81 comments

March 29, 2025

ianbutler

Hey guys, we commented on another thread a few days ago about our tool Bismuth finding the bug (along with a SHA of our reproducer script as proof): https://news.ycombinator.com/item?id=43489944

After disclosing and corresponding with Gerlof, and judging from his post above, it looks like we did in fact nail it. I've just shared our write-up on how we got it.

HN post detailing how we got it: https://news.ycombinator.com/item?id=43519522

Edit: Here's our reproducer and we've added it to the post too: https://gist.github.com/kallsyms/3acdf857ccc5c9fbaae7ed823be...

hannob

> HN post detailing how we got it: https://news.ycombinator.com/item?id=43519522

I don't see any details there. Is there some link missing here, or is it the wrong link?

I'd be interested to read how your tool found it.

stavros

It's just "we asked our LLM and it found the bug", as I understand it.

saagarjha

What is that a hash of?

ianbutler

As noted, our reproducer script

saagarjha

Right, but where’s the script?

geerlingguy

This doesn't seem nearly as nefarious as the post from earlier this week indicated... I had expected a full supply chain compromise or something that bad based on the earlier post.

barotalomey

Yeah, my first thought was that this is an unrelated find, turned up by the extra eyeballs the recent focus brought.

f33d5173

Yeah, being taciturn was really the worst thing you could do.

echoangle

Related:

"You might want to stop running atop" - https://news.ycombinator.com/item?id=43477057

"Problems with the heap" - https://news.ycombinator.com/item?id=43485980

dang

Thanks! Macroexpanded:

Problems with the heap - https://news.ycombinator.com/item?id=43485980 - March 2025 (93 comments)

You might want to stop running atop - https://news.ycombinator.com/item?id=43477057 - March 2025 (139 comments)

cullenking

I was bitten by atop a few years back and swore it off. I would get perfectly periodic 10m hangs on MySQL. Apparently they changed the default runtime options such that it used an expensive metric-gathering technique with a 10m cron job that would hang any large-memory process on the system. It was one of those “no freaking way” revelations after 3 days of troubleshooting everything.

Interesting reading through the related submission comments and seeing other hard-to-troubleshoot bugs. I don’t think the atop devs are to blame; my guess is that what you have to do to make a tool like atop work means hooking into lots of places that can have unintended consequences.

unsnap_biceps

It's unfortunate that Unix sockets aren't being used for local connections like this.

charcircuit

It's more unfortunate a proper RPC library is not being used. People rolling their own buggy parsers in C is an endless source of bugs.

ahoka

The whole code is horrible: https://github.com/Atoptool/atop/commit/542b7f7ac52926ca2721...

Inconsistent usage of braces, no clear memory ownership or life-cycles, zero tests.

the-lazy-guy

Can you please provide an example of good C code?

I agree that the absence of tests isn't great, and it is very common in C-based projects. But the rest of your comment reads like "ooh, it's C, disgusting!". I hope I'm wrong.

timcobb

> People rolling their own buggy parsers in C

I'd like to believe this isn't common anymore for new projects?

worthless-trash

I don't want to ruin your weekend.

ajross

Meh. This isn't a technology choice problem. Routine unix sockets are just some file in /tmp which an attacker could likewise open by racing against the daemon in the same way.

It's true you could use a privileged spot in the filesystem and set things up to use that by writing some simple extra software, but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

Bottom line is that you need to validate your input from outside the process if you're running in a privileged context[1], and atop didn't.

[1] It's not mentioned in the linked email, but I assume the core problem here (and the reason it got a CVE number) is that the atop binary is setuid?

adrianmonk

> Routine unix sockets are just some file in /tmp which an attacker could likewise open by racing against the daemon in the same way.

So put the socket in /run instead of /tmp?

I'm no expert, but this appears to be where they belong, and it appears to solve the problem. From https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.htm... : "System programs that maintain transient UNIX-domain sockets must place them in this directory or an appropriate subdirectory as outlined above." ... "/run should not be writable for unprivileged users; it is a major security problem if any user can write in this directory."

ajross

Putting them in /run if you're not already root requires a little extra software be written though. Locking down a TCP socket isn't much harder. I'm not saying "don't use Unix domain sockets", I'm saying that treating this bug as the result of technology choice is bad security analysis.

fpoling

These days Unix sockets for system daemons should be placed under /run with permissions that only a particular daemon can access for binding. With systemd service and socket units it is trivial to do.

3np

> but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

How, actually? With UNIX sockets it can be a matter of setting file ownership and mode (at worst, a chmod and a chown).

What's the equally simple way to restrict access to a locally listening tcp socket?
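For reference, the chmod/chown approach for a Unix socket mentioned above can be sketched in a few lines. This is only an illustration, not anything atop or atopgpud does: the helper name `bind_private_unix_socket`, the 0o600 mode, and the temporary directory are all assumptions made for the demo.

```python
import os
import socket
import stat
import tempfile


def bind_private_unix_socket(path, mode=0o600):
    """Bind a listening Unix socket whose file has the given permission bits."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Tighten the umask before bind() so the socket file is not briefly
    # accessible to other users between bind() and chmod().
    old_umask = os.umask(0o177)
    try:
        server.bind(path)
    except OSError:
        server.close()
        raise
    finally:
        os.umask(old_umask)
    os.chmod(path, mode)
    server.listen(1)
    return server


# Demo in a private temporary directory. A real daemon would more likely use
# a root-owned location such as a subdirectory of /run (assumption).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.sock")
srv = bind_private_unix_socket(demo_path)
perm_bits = stat.S_IMODE(os.stat(demo_path).st_mode)
print(oct(perm_bits))
srv.close()
os.unlink(demo_path)
os.rmdir(demo_dir)
```

With the file owned by the daemon's user and the mode restricted like this, only that user (or root) can connect, and the directory's permissions decide who is allowed to create the socket in the first place.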

johnmaguire

> but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

Can you educate me? I'm familiar with SO_PEERCRED that returns the user/group/pid on the other end. Would you then checksum the exe of the pid from /proc?

theamk

SO_PEERCRED is only for Unix domain sockets though; it's not going to work for TCP.

For TCP, your only easy option is to have port <1024 - but that requires root. If you want a dedicated user, then TCP requires hacks - like creating a cookie file in some protected location, like XAuthority does.

But if you have a protected location, why even bother with all this? Just create a UNIX socket there directly, after all the difference is only in connect call, read/write loop is the same. And as an extra bonus there is much better visibility, and zero chance of someone accidentally grabbing your magic number.

Unix sockets are really underappreciated.
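As a rough illustration of the SO_PEERCRED check discussed here, the following sketch reads the peer's pid, uid, and gid from a connected Unix socket. It is Linux-specific, the helper name `peer_credentials` is made up for this example, and it uses `socketpair()` so the "peer" is simply the current process.

```python
import os
import socket
import struct


def peer_credentials(conn):
    """Return (pid, uid, gid) of the peer on a connected Unix socket (Linux only)."""
    # The kernel fills in a struct ucred, which is three native ints.
    fmt = "3i"
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


# socketpair() returns two already-connected AF_UNIX sockets in this process,
# so the credentials reported are our own.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(a)
a.close()
b.close()
print(pid, uid, gid)
```

A server would call something like this on each accepted connection and drop peers whose uid is not the expected one. As the comment above notes, there is no equivalent for a TCP socket.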

ajross

You can check socket credentials, indeed. You can set up filtering rules to match on UID using nftables. You can do things like put a cookie somewhere else to exchange and authenticate the connection a-la xauth. You could use TLS and check the host key vs. a public key stored at install time. There are many ways to do this, none of which require more than a few dozen lines of code/config.

But really the simplest thing would just be to use a port <1024 so that only root can open it. That's literally what the feature was for. You can still be "attacked", but only by someone who already has local root.

eptcyka

It is. But even with unix sockets, the client should never blindly trust the bytes received and parse them defensively.
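To make "parse defensively" concrete, here is a minimal sketch. The wire format is invented for illustration (`<count>@<name>@<name>...`) and is not the real atopgpud protocol; the limits and the names `parse_reply` and `ReplyError` are likewise assumptions.

```python
MAX_REPLY_BYTES = 4096  # assumed cap, not atop's real limit
MAX_ITEMS = 64          # assumed cap on the number of names
MAX_NAME_LEN = 128      # assumed cap on each name


class ReplyError(ValueError):
    """Raised when a reply does not match the expected format."""


def parse_reply(raw):
    """Parse b'<count>@<name>@<name>...' and return the list of names."""
    if len(raw) > MAX_REPLY_BYTES:
        raise ReplyError("reply too large")
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ReplyError("reply is not ASCII") from exc
    head, *names = text.split("@")
    # Reject non-numeric or absurdly long counts before converting them.
    if not head.isdigit() or len(head) > 3:
        raise ReplyError("bad item count")
    count = int(head)
    # Never trust the announced count: it must agree with what actually arrived.
    if count > MAX_ITEMS or count != len(names):
        raise ReplyError("item count does not match payload")
    for name in names:
        if not name or len(name) > MAX_NAME_LEN or not name.isprintable():
            raise ReplyError("bad item name")
    return names


names = parse_reply(b"2@gpu0@gpu1")
print(names)
```

The point is that the announced count is cross-checked against the payload and every length is bounded before anything is stored, so a hostile peer gets an error rather than an out-of-bounds write.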

lelanthran

> Bottom line is that you need to validate your input from outside the process if you're running in a privileged context[1]

Why this "if" qualifier? You need to validate all input from outside the process. Whether the process is privileged or not is, frankly, not really relevant.

(I submitted a blog post a few days ago explaining "Parse, Don't Validate" in plain C, but it didn't get any traction).

ajross

> Why this "if" qualifier? You need to validate all input from outside the process.

Not all tools are designed to accept input from outside a security boundary. Obviously atop isn't one, but the world is filled with software that misbehaves on bad input. Ever DDoS your build system by misconfiguring something? Crash a running program by removing a cache directory (or unpacking a tarball on top of it)?

It's very rarely a bad idea to validate input. But it's for sure not always a requirement either.

And to be blunt, it's not really possible either. You write "insecure" parsers/interpreters/whatever probably every day, we all do. And you "know" when it's safe and when it's not, I'm sure. But my point is that if that knowledge isn't based on at least a little bit of rigor ("crossing a privilege boundary" in this case), you're probably going to do it wrong.

Galanwe

> The vulnerability is caused by the fact that atop always tries to connect to the TCP port of 'atopgpud' during initialization. When another local program has been started (instead of 'atopgpud') that listens to this TCP port, atop connects to that program. Such program is able then to send unexpected strings that may lead to parsing failures in atop. These failures result in heap problems and segmentation faults.

Okay, so if I have a shell and the rights to listen on a host, I can crash the "atop" of other users? That's it? I could also create a fork bomb, fill up the disk, use all CPU and memory, etc...

TonyTrapp

Not the same thing at all if atop runs as root and you are a user on that system that has no root access. With a well-prepared exploit you could achieve code execution as root. That's a bit more than a simple Denial of Service by filling up the disk.

bitbasher

I think the concern is for privilege escalation.

yjftsjthsd-h

Ah, there's the other shoe:)

> optional sources, that have to be activated explicitly.

So only locally exploitable, and you have to enable an optional feature? That's ... honestly better than I was worried that it might be

dgacmu

No. It's local, but atop always tries to connect, and the daemon to which it tries to connect is optional, which means that the default is attackable. An attacker can run their own program on the port and send bad strings that will cause an overflow.

yjftsjthsd-h

Oh, I see, thanks.

> Therefore, the default behavior of atop is now not to connect to the TCP port at all.

I missed that now it defaults to not connecting.

MattPalmer1086

The fix is to make it optional.

But yeah, I was anticipating something quite a bit worse.

immibis

> always tries to connect

xyst

Right, the post on “rachelbythebay” was hinting at something much worse.

brazzy

How so? It was pretty clear from her second post that it's a local privilege escalation. And that is what it is, and otherwise fairly easily exploitable.

natebc

Well, the first post opened with "You might want to stop running atop" and followed with "Right now, I think it's probably best if you uninstall atop. I don't mean just stopping it, but actually keep it from being executed."

Which does indeed hint at something much worse IMO.

To be clear: I value Rachel's opinion and contributions greatly. Maybe these days I'm just a little grouchy about panicky security people making us spend hours in the middle of the week uninstalling atop from hundreds of systems that wouldn't have been at risk from something like this.

mvdtnz

Did you stop reading at that sentence?

yjftsjthsd-h

Unlikely, since the use of a local TCP port came later than the quoted sentence. Granted, I did skim, but after having it clarified and rereading, I think that introduction is misleadingly phrased and would benefit from clearer delineation of the previous vulnerable behavior and the fixed behavior.

mvdtnz

So what was the point of Rachel's vagueposting? Was there any kind of NDA or a good reason to be so vague?

brazzy

Responsible disclosure?

stiild

I have a semi-related question. For someone whose main job is not maintaining or running full Linux servers but who would like information about processes and their RAM, CPU, etc., what would be a good tool that is easy to parse and has good defaults?

edoceo

The tool btop was suggested in the other thread to replace atop and htop.

0manrho

Seconding btop++, been running it as my main top for a few years now, and switched from htop. I didn't have a single complaint about htop, did what it said on the tin and did it well in my experience, but personally I prefer btop's ux/ui.

worthless-trash

If you are writing software to parse it, don't use third-party tooling. Read the kernel outputs directly (/proc, /sys, etc.).

While they carry no guarantee not to change, if they do change, any tool whose output you are parsing will also be broken.
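As a small sketch of what reading the kernel's output directly could look like, the code below parses the `Key:<tab>value` lines of a `/proc/<pid>/status` file and extracts the resident set size. The helper names and the sample text are made up for illustration; the `VmRSS` field with its `kB` unit is what Linux reports for user processes.

```python
SAMPLE_STATUS = (
    "Name:\tmysqld\n"
    "State:\tS (sleeping)\n"
    "Pid:\t4242\n"
    "VmRSS:\t  204800 kB\n"
)


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rss_kib(fields):
    """Return resident memory in KiB, or None if VmRSS is absent."""
    value = fields.get("VmRSS")
    if value is None:
        return None  # e.g. kernel threads do not report VmRSS
    parts = value.split()
    # Check the shape explicitly rather than assuming it never changes.
    if len(parts) != 2 or parts[1] != "kB" or not parts[0].isdigit():
        raise ValueError("unexpected VmRSS format: %r" % value)
    return int(parts[0])


fields = parse_status(SAMPLE_STATUS)
print(fields["Name"], rss_kib(fields))
```

On a Linux machine the same functions can be fed the contents of `/proc/self/status`, or of any other process's status file the caller is allowed to read.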

ezekiel68

I recommend... atop, now that it has been updated to address this issue.

candiddevmike

Node exporter is a good start, or you could look at Netdata

calvinmorrison

htop is a decent curses process manager that's a few miles better than top

Zardoz84

I recommend nmon

Havoc

That sounds less bad than expected

zitterbewegung

Is it just me or does this seem like a bad design where a TCP port is exposed to share information?

kevincox

Yes. Any local process can connect to a TCP port (unless special care is taken), so it should be a last-resort option. Additionally, the server either needs to be run as root to bind a privileged port, or any application can race to bind that port first. UNIX sockets are a much better option as they can be protected by filesystem permissions, including who can bind the socket and who can connect to it.

This can be mitigated by having authentication inside the socket, but now your authentication code is an attack surface and how are you going to share the secrets? On the filesystem? You are basically back to a UNIX socket with extra steps.

marginalia_nu

As long as you bind to localhost it's fine in theory. Though any network code still needs to be rigorously hardened.

echoangle

> As long as you bind to localhost it's fine in theory

But only if you assume that the data being transferred is public, right?

With the described method, any non-privileged user could access the data from the TCP socket, right?

marginalia_nu

Information in top isn't much of a secret though.

amiga386

So, as https://www.cve.org/CVERecord?id=CVE-2025-31160 says:

* CWE-617 Reachable Assertion

* affected from 0 through 2.11.0

... can we assume these will be updated to the actual vulnerability (CWE-940, CWE-120?), and vulnerable versions (2.4.0 through 2.11.0)? Or was the vaguepost about an entirely different vulnerability? Does anyone yet know what specific issue the vaguepost was alluding to?