Problems with the heap
100 comments
· March 26, 2025 · NobodyNada
progmetaldev
I thought I would ask: does this mean that memory-restricted environments can be a better target for attack than ones with a large amount of memory available? In my mind it seems like this would be the case, but I'm not sure if there is anything in place to protect these types of environments, like intentionally breaking up and segmenting memory so it's not possible to read much linearly. I admit that I haven't touched low-level code since the early 2000s, and before that only for a course requirement, so I apologize if you explained this in your linked article and I didn't understand.
NobodyNada
Yes, with the caveat that virtual memory restrictions matter a lot more than physical memory restrictions.
Heap exploitation is difficult because 1) glibc malloc is hardened to try to defeat many common exploit strategies, and 2) ASLR means that even if you have the ability to corrupt a pointer, you might not be able to control what the pointer points to.
Regarding #1, memory constraints don't really make a difference until you're so constrained on memory that you can't run glibc anymore, so you use an allocator that is optimized for low overhead/code size rather than performance and security.
For #2: ASLR works by placing every section of the process (heap, stack, program binary, and each library) at a randomized virtual address so that attackers can't forge pointers. ASLR is much more effective on a 64-bit system than on a 32-bit system, simply because there's so much virtual address space available to choose. If memory addresses are only 32 bits, it's feasible to just brute force guess the memory address of some data you're interested in; with 64 bits, it's not. And if your system doesn't have virtual memory at all (like a microcontroller), you probably don't have any kind of ASLR.
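To see ASLR in action, here's a tiny demo (a sketch assuming 64-bit Linux with ASLR enabled and the binary built as a PIE, the default on most modern distros); run it twice and every address should change between runs:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int stack_var;
        void *heap = malloc(16);
        printf("stack: %p\n", (void *)&stack_var);
        printf("heap : %p\n", heap);
        printf("libc : %p\n", (void *)stdin);  /* pointer into libc's data */
        printf("code : %p\n", (void *)&main);  /* program binary (PIE) */
        free(heap);
        return 0;
    }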
> intentionally breaking up and segmenting memory so it's not possible to read much linearly
This is typically done at the level of individual sections (stack, heap, program binary, library binaries) but not within sections; mainly for performance, memory overhead, and cache locality reasons. The entire heap is contiguous, so if you overwrite past the end of one heap allocation you can overwrite adjacent allocations, but you can't overwrite the stack (without more work). Breaking up the heap into smaller chunks wouldn't really help that much; it just means an attacker has to manipulate the heap layout so they can be sure the allocation they're targeting ends up in the chunk they're targeting.
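A minimal sketch of that adjacency (assuming a quiet glibc heap on 64-bit Linux, where two back-to-back small allocations usually end up touching):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *a = malloc(24);
        char *b = malloc(24);   /* typically lands right after 'a' */
        strcpy(b, "secret!");

        /* Overflow 'a': 24 bytes of its own space, 8 bytes of 'b's
           inline size field, then the first 8 bytes of 'b' itself. */
        memset(a, 'X', 40);

        printf("b is now: %.8s\n", b);   /* prints XXXXXXXX */
        return 0;
    }

The stack, by contrast, lives in its own randomized mapping, so this kind of linear overwrite can't reach it directly.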
Exploit hardening in the heap allocator is something of a last-ditch stopgap measure: "we've already lost, but let's see if we can minimize the damage to the user/maximize the difficulty to the attacker". There's certainly much more you could do to harden the heap, but remember that any security measures implemented in glibc are applied to every program, with no way to opt out (except bringing your own libc). So glibc is designed to maximize performance and compatibility; exploit mitigations are only included when they don't compromise these goals.
If you don't mind giving up some performance for the sake of security, you probably would already be using a garbage-collected language instead of C. (And now that Rust is sufficiently mature for most use cases, you probably should be using that if you're in a situation where you need both performance and security).
progmetaldev
Thank you for your reply, that makes quite a bit more sense to me!
LPisGood
It’s a lot easier to pull off an exploit if you know your chunk comes from the fast bins, if I recall correctly.
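For context, the reason is that small freed chunks are recycled LIFO with few integrity checks, so allocations become very predictable. A quick sketch (on glibc 2.26+ these actually land in the per-thread tcache before the fast bins, but it behaves similarly):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        void *a = malloc(24);   /* fast-bin-sized request */
        void *b = malloc(24);
        printf("a=%p b=%p\n", a, b);

        free(a);
        free(b);                /* bin now holds b, then a (LIFO) */

        void *c = malloc(24);   /* typically returns b's old address */
        void *d = malloc(24);   /* typically returns a's old address */
        printf("c=%p d=%p\n", c, d);
        return 0;
    }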
ianbutler
Hey guys, so I work on a tool called Bismuth along with my co-founder for finding and fixing bugs, and we think we've found this. At the very least we have a bug in atop which mimics what is being described.
We're going to throw this sha down right here: 1d53b12f3bc325dcfaff51a89011f01bffca951db9df363e6b5d6233f23248a5
And now we're going to go responsibly disclose what we have to the maintainers.
ianbutler
We did in fact find the bug:
ianbutler
We've reached out to the maintainer over e-mail.
Retr0id
Based on the bug you've found, do you think it's exploitable beyond DoS?
__turbobrew__
Thank you for doing this instead of just vagueposting and wasting everyone’s time.
vanderZwan
I'm inclined to think that we should put the blame for that on whoever used legal channels to force Rachel to shut up, although obviously the jury is still out until we know more.
supriyo-biswas
This is the only explanation that makes some sense, otherwise it would be just a dick move for someone to hint at the presence of an exploitable bug but then not say what exactly it is.
spyc
I'm personally reading "I can go into why another time." as "I don't have time", not as "I am not allowed to say".
spyc
Hi! Three things:
- There is no commit with a SHA1 like that in atop's Git history, and what you shared is too long for a SHA1; it looks more like a SHA256. Did you share the right checksum? The only other way I can read this is that it's the SHA256 checksum of one of the past atop release tarballs or artifacts. I have not yet checked those.
- I have tried finding your tool Bismuth, but all I find are things about KDE and cryptocurrencies. Please share a link to the Bismuth that you are working on.
- You technically said that you are working on Bismuth /and/ found something, not that you found the bug /through/ Bismuth. Please clarify if and how that was the case.
Thank you!
ianbutler
- That SHA is just a proof marker, so if it turns out we are correct we can prove we had it at that time
- Bismuth did indeed find the bug, our bug scanning feature in particular. Obviously we're going to sit on our hands until the maintainer gives the all clear but we'll write something up after this is all squared away
- https://www.bismuth.sh is our tool, we're still relatively new
throwawayben
pretty sure it's just a hash of some text they can reveal later, to prove that they had something at this point in time. not referring to any release or commit
ianbutler
This is exactly correct
spyc
I see, thanks!
spyc
Update: I found https://www.bismuth.sh/ at https://news.ycombinator.com/user?id=ianbutler .
dang
Recent and related:
You might want to stop running atop - https://news.ycombinator.com/item?id=43477057 - March 2025 (131 comments)
EdiX
I gave her the benefit of the doubt initially, she usually posts good posts, but this is not the way to do things. Vagueposting about a security vulnerability without properly disclosing it to the maintainers: (1) damages their reputation, (2) sends every blackhat on the hunt like a real-life worldwide CTF event, (3) leaves sysadmins in the dark unless they are following this specific random blog, and (4) since the details aren't known, even sysadmins who are aware of it can't determine whether they are really affected.
Something like this would be justified if the maintainers were unresponsive and it was a remotely exploitable bug. Now it turns out this is probably a minor thing (local privilege escalation if you happen to be running atop as a privileged user).
It seems to me like an irresponsible, egocentric way to handle things.
ynik
At least on Debian, installing the `atop` package will automatically install a background service running atop as root. (by default, logging some stats to /var/log/atop/ every ten minutes)
ptx
> a minor thing (local privilege escalation if you happen to be running atop as a privileged user)
I seem to be hearing this sentiment a lot lately. How is local privilege escalation a minor thing?
If it's such a minor thing, is the old advice to not run as root considered passé? Should we just run everything as root? Should we discard the entire Unix security model and chmod all files to 0777?
ayende
In most scenarios, you are no longer running with multiple users on the same machine. Either this is a server, which has an admin team, or a client machine, which _usually_ has a single user.
That isn't 100% true, and local privilege escalation matters, but it is a far cry from remote code execution or remote privilege escalation.
heavyset_go
User privilege separation is a foundation that allows many container implementations to work, and that sandboxed software like Tor relies on, as does Android, however unlikely it is that you're running atop on it.
If someone is running Tor to not end up in prison/dead, a break in their Tor sandbox leaves them open for anyone to own, for example.
eptcyka
Root privileges allow for a much wider attack surface for escaping out of a VM. Not using root everywhere still helps with defense in depth.
red1reaper
> Should we discard the entire Unix security model and chmod all files to 0777
It depends, but for most use cases... yes, actually.
heavyset_go
All of that etiquette sounds nice if you're being paid to do this work, but I don't think anyone is obligated to "properly" disclose a vulnerability they found on their own time/dime, nor do I think it's a moral imperative for anyone to do so.
nukem222
Eh, finger pointing does nobody any good, emphatically including this comment. Finger pointing towards someone who actually found a vulnerability is just bleak. I would not willingly associate with anyone who engaged in such behavior.
Maintaining software is hard, but this does not imply a right to be babied. People should simply lower their expectations of security to match reality. Vulnerabilities happen and only extremely rarely do they indicate personal flaws that should be held against the person who introduced it. But it's your job to fix them. Stop complaining.
gruez
>Finger pointing towards someone who actually found a vulnerability is just bleak. I would not willingly associate with anyone who engaged in such behavior.
Nobody is "finger pointing" at Rachel for the vulnerability. They're calling her out for how she communicated it. I feel that's totally justified. For instance, if someone found a critical RCE but the report was a barely coherent stream of consciousness, it's totally fine to call out the latter part. That's not "finger pointing".
>But it's your job to fix them. Stop complaining.
It's the developers' job to respond to bug reports in the form of vaguely written blog posts?
cenamus
Yeah, shame on the people irresponsibly publishing the vulnerability, but the people putting them in? Who cares
nukem222
[flagged]
amiga386
Fingerpointing is bad, but we have to have an honest conversation.
One person posted the vague post. They clearly did not expect the reaction it got, though they could have anticipated some of it; they are aware their blog is widely read. Their reaction is commendable: quickly posting a followup appealing for calm and sharing some details, to quell the problems caused by the intense vagueness.
What people from HN did, because of the vagueness, was assume this is a super-secret-squirrel mega-vulnerability and that Rachel is gagged by NDAs or the CIA or whatever... and they've gone off and harassed the developers of atop while trying to find the issue.
Imagine a person of note saying "the people at 29 Acacia Road are suspicious", then a mob breaks down the door and starts rifling through all the stuff there, muttering to themselves "hmm, this lamp looks suspicious... this fork looks suspicious"... absolute clowns, all of them.
For example, this asshole who went straight in there with bad-faith assumptions on the first thing they saw: https://github.com/Atoptool/atop/issues/330#issuecomment-275...
No, you dummies, it's not going to be in the latest commit, or easily greppable.
This is exactly why CVEs, coordinated disclosure, and general security reporting practices exist: so that every single issue doesn't result in mindless panic and speculation.
There's now even a CVE purely based on the vaguepost, assigned to a reporter who clearly knows fuck all about what the problem is: https://www.cve.org/CVERecord?id=CVE-2025-31160 - versions "0" through "2.11.0" vulnerable, eh? That would be all versions, and the reason the reporter chose that is because they don't know which versions are vulnerable, and they don't know what it's vulnerable to either. But somehow, "don't know", the absence of information, has become a concrete "versions 0 to 2.11.0 inclusive"... just spreading the panic.
I don't know why Rachel is vagueposting, but I can only hope she has reported this correctly, which is to:
1. Contact the security team of the distro you're using, e.g. if you're using atop on Debian, email security@debian.org with the details.
2. Allow them to help coordinate a response with the packager, the upstream maintainer(s) if appropriate, and other distros, if appropriate. They have done this hundreds of times before. If it's critically important, it can be fixed and published within days, and your worries about people being vulnerable because you know something they don't can be relieved, all the more quickly.
freeopinion
I commend you for writing what you think should be done and not just complaining about what was done. It is more helpful to express the correct procedure than to only label things as the wrong procedure.
whatnow37373
I never quite understood why computing is so different from literally all other branches of reality. Systems need to be secure, I get it. But if we have a bunch of folks dedicating their life to breaking your shit I don't get how that is in any way acceptable and why the weight of responsibility solely lies with people responsible for security.
We apparently have a society/world that normalizes breaking everyone's shit. That's not normal - IMO.
If I break into a factory or laboratory of some kind and just walk out again I have not found a "vulnerability" and I certainly won't be remunerated or awarded status or prestige in any way shape or form. I will be prosecuted. Everyone can break into stuff. It's not that stuff is unbreakable, it's that you just don't do that because the consequences are enormous (besides obvious issues with morality). Again, breaking stuff is the easy part.
I am certainly completely ignorant and should be drawn and quartered for it, but for me it is hard to put my finger where I'm so wrong.
I can see how the immaterial nature of software systems changes the nature of the defense, but I don't see how it immediately follows that breaking stuff that's not allowed to be broken by you is suddenly the norm and nothing can be done against that. We just have to shrug and accept our fate?
hyperpape
Leaving aside the ethics of vulnerability research in server-side software, you're neglecting the fact that atop runs on your own machine.
So it's not like breaking into a factory. It's like noticing that your dishwasher makes the deadbolts in your house stop working (yes, a weird analogy; there are ways software isn't like physical appliances).
Surely you have the right to explore the behavior of your own house's appliances and locks, and the manufacturer does not have the right to complain.
As for server side software, I think the argument is a simple consequentialist one. The system where vulnerability researchers find vulnerabilities and report them quietly (perhaps for a bounty, perhaps not) works better than the one where we leave it up to organized crime to find and exploit those issues. It generates more secure systems, and less harm to businesses and users.
dcminter
I find your view bizarre.
If I buy a physical product, take it home, and then publish the various issues I find with it, then... nobody has a problem with that.
I'm as sad as the next guy that the safe and trusting internet of academia is long gone, but the generally accepted view nowadays is that it's absolutely full to the gills with opportunistic criminals. Letting people know that their software is insecure so they don't get absolutely screwed by that ravening horde is a public service and should be appreciated as such.
Pen testing third party systems is a grey area. Pen testing publicly available software in your own environment and warning of the issues is not, particularly when the disclosure is done with care.
dagss
Well also in the real world, if you look at history, people DID exploit the neighbouring tribe with impunity if they could not defend themselves ("what idiots don't have a guard during night"), or built stone fortresses with 3 metre stone walls.
When living under those conditions, people probably did put the responsibility to be safe on the victim.
We have been able to remove this waste thanks to the introduction of the nation state, laws, the "monopoly on violence", police...
It is THOSE things that allows the factory in your analogy to not spend resources on a 3 metre stone wall and armed guards 24/7.
Now, on the internet, the police, at least relative to the physical world, almost completely lack the ability to either investigate or enforce anything. They may use what tools they can, but it does not get them far in the digital world compared to the physical.
If we want the internet to be like the real world in this respect, we would have to develop ways to let the police see a lot more and enforce a lot more, like they can in the physical world.
cturner
"If I break into a factory or laboratory of some kind and just walk out" This is a weak analogy. In the situation you describe, right-and-wrong is easily understood by the layman, there is a common legal framework, there is muscle to enforce the legal framework.
In the computing space - if someone breaks the rules, it is only a bunch of us that understand what rule was broken, and even then we are likely to argue over the details of it. The people doing the breaks are often anonymous. There is no shared legal framework, or enforcement, or courts. The consequences of a break are usually weak. Consider the lack of jail time for anyone involved with Superfish. Many of these people were located in the developed world.
The computing world often resembles the lawlessness of earlier eras - where only locally-run fortifications separated civilian farmers from barbarian horsemen. A breach in this wall leads to catastrophe. It needs to be unbreakable. People who maintain fortifications shoulder a heavy responsibility.
immibis
We can lock down the Internet so hard that every IP packet is associated with a physical address, then go and arrest people who allow bad packets to be sent from their address. This is what many governments are persistently trying to do. Is it a good idea?
frontfor
I second this. The pompous, holier-than-thou, I-know-better attitude of some members of the computer security community has always rubbed me the wrong way. This behaviour of complaining is a manifestation of the typical "putting down" and dismissing of someone who isn't part of the tribe.
john-h-k
as there seems to be some confusion, this is my interpretation:
atop is (for some reason) touching memory of processes it monitors.
atop is touching this in an insecure way. An executable can cause atop to corrupt its memory.
this has high potential (although not guaranteed) for allowing RCE within atop via a correctly crafted process that atop monitors.
atop is often run as root, and so this otherwise meaningless RCE becomes privilege escalation, which is bad
either this is correct, or Cunningham's law will bring out the correct interpretation
shanemhansen
This is a complete shot in the dark but wild speculation is fun. If atop had a buffer overflow when reading a process name (changeable at runtime using $0 in perl for example) this would be the kind of issue I expect.
Similarly, some other value that was expected to be null terminated but wasn't.
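To make that speculation concrete, the classic shape of such a bug would be something like this; purely hypothetical code, not atop's actual source:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct proc_info {
        char name[16];      /* "process names are short, right?" */
        long counters[8];   /* adjacent fields the overflow tramples */
    };

    static void record_name(struct proc_info *p, const char *src) {
        strcpy(p->name, src);   /* BUG: trusts src to fit and to be
                                   NUL-terminated */
        /* safer: snprintf(p->name, sizeof p->name, "%s", src); */
    }

    int main(void) {
        struct proc_info *p = calloc(1, sizeof *p);
        /* A process controls its own name (argv[0], $0 in perl, ...)
           and can make it far longer than expected. */
        record_name(p, "attacker_controlled_and_much_longer_than_sixteen_bytes");
        printf("%.15s\n", p->name);
        free(p);
        return 0;
    }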
twic
My guess is that it tries to decode the malloc metadata, which involves chasing pointers, and doesn't do enough sanity checking, so if a process accidentally or deliberately sets up corrupt metadata, atop will dereference an invalid pointer and explode itself.
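If that guess is right, the vulnerable pattern would be roughly this (a hypothetical sketch with made-up names, not atop's real code):

    #include <stddef.h>

    /* Metadata reconstructed from a *monitored* process's memory,
       so every field is effectively attacker-controlled. */
    struct remote_chunk {
        size_t size;
        struct remote_chunk *next;
    };

    static size_t sum_heap_usage(const struct remote_chunk *c) {
        size_t total = 0;
        while (c != NULL) {      /* BUG: no bounds or sanity checks */
            total += c->size;    /* wild read if 'c' was forged */
            c = c->next;         /* blindly follows an untrusted pointer */
        }
        return total;
    }

    int main(void) {
        struct remote_chunk b = { 32, NULL };
        struct remote_chunk a = { 16, &b };
        return (int)sum_heap_usage(&a);   /* fine with honest data; a
                                             forged 'next' crashes */
    }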
baltimore
Same author in March 2014 was having segfault issues with atop apparently: https://rachelbythebay.com/w/2014/03/02/sync/
pdpi
Rachel's been a reliable source of interesting issues like these for the better part of eternity now. Her blog's well worth reading.
oguz-ismail
[flagged]
Niten
A reliable source of unhelpful vagueposts, teasing things with more apparent interest in creating mystery than in providing information. Like the "Lost" of tech bloggers.
usefulcat
After reading that: atop should've used SQLite.
trismegisti
I would be very surprised if this isn't just an atop bug.
Can it be exploited? Considering the error messages, the possibility is high.
I don't like seeing it framed as "Problems with the heap". As someone who has played lots of CTFs and is quite proficient at exploiting such bugs, I would much rather see these things framed as "Problems with the glibc allocator".
If we just didn't use inline metadata, or verified its integrity (something like Scudo, but not broken), all of the security issues with the heap would just be gone. If you get to work with frameworks/languages that are more flexible when it comes to allocating memory (thinking about Zig and its amazing allocator abstraction here), you quickly realize that malloc and free are an insanely simplistic API for what is one of the most difficult and important problems in programming: memory allocation.
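For anyone wondering what "inline metadata" means here: glibc keeps each chunk's bookkeeping directly in front of (and, for free chunks, inside) the user data, roughly like this simplified version of struct malloc_chunk from malloc/malloc.c. That's exactly why a linear overflow from one allocation tramples the allocator's own state:

    #include <stddef.h>

    /* Simplified; the real glibc struct malloc_chunk has two more
       link fields used for large chunks. */
    struct malloc_chunk {
        size_t prev_size;   /* size of previous chunk, if it is free */
        size_t size;        /* this chunk's size plus 3 flag bits */
        /* allocated chunks: user data starts here */
        /* free chunks reuse the same space for free-list links: */
        struct malloc_chunk *fd;   /* forward pointer in a bin */
        struct malloc_chunk *bk;   /* backward pointer in a bin */
    };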
There is also no excuse for the error messages being that bad, because the reality is that most systems programmers will have to debug them at some point.
bee_rider
RachelByTheBay underestimated how much we hang on her every word over here, I think. Haha.
anitil
I believe she hasn't had the best behaviour from our members unfortunately
simoncion
I expect she's had the best behavior that the members in question are capable of.
neuroticnews25
Can you expand on this? I'm new here and I have been binge reading her posts recently.
anitil
I'm going from memory but I think when posts of hers get traction here she'll get comments that I think she refers to as 'The One' [0], and also comments that... you typically wouldn't get for a male poster.
wglb
I think it has gotten better of late.
ultrarunner
This reads like it's referencing something specific. Am I out of the loop or is this just about heap exploits in general?
bee_rider
There was a post yesterday
https://news.ycombinator.com/item?id=43477057
Which was reacted to rather strongly (atop is widespread and the blog author has a bit of a following here).
dan353hehe
A post from them yesterday made it to the front of HN.
https://news.ycombinator.com/item?id=43477057
Edited to be the HN post.
titaphraz
Responsible disclosure has been a thing for a long time. This is not professional behavior.
nsxwolf
I found this hard to follow, like it was in the middle of something.
mary-ext
A vulnerability can be high risk; a vague disclosure means that people can only assume the worst.
Maybe don't vaguepost about vuln disclosures.
Both of the error messages we're given indicate that the "top" chunk of the heap was corrupted, which is a special internal allocation used by glibc malloc to represent any unused capacity from the last time malloc decided to grow the heap.
That likely indicates a heap buffer overflow. If a call to malloc() doesn't find a freed chunk, it will split the top chunk in two and return a pointer to the bottom portion. If you then write past the end of the returned allocation, you clobber the metadata of the top chunk, and get errors like the ones in the article.

With the exploit mitigations built into modern Linux and glibc, it's a lot of work to go from here to arbitrary code execution; but it very well may be possible, depending on exactly how much control the attacker has over what atop does. The attacker can probably trigger the heap buffer overflow multiple times by spawning multiple processes, and if the length and contents of the heap buffer overwrite are attacker-controlled, they can probably play some games to overwrite any data stored in the heap. If that's true, the only thing preventing full arbitrary code execution is ASLR; there are many clever ways to get around that, but it's often quite difficult and may or may not be possible here.
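A minimal repro of that failure mode (a sketch assuming glibc 2.29 or newer, where the relevant top-chunk sanity check lives):

    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        /* On a quiet heap this is carved off the bottom of the top
           chunk, so the top chunk's size field sits just past it. */
        char *buf = malloc(24);

        /* Heap buffer overflow: clobber the top chunk's size field. */
        memset(buf + 24, 'A', 8);

        /* The next allocation makes malloc inspect the top chunk;
           glibc spots the absurd size and aborts with a
           "malloc(): corrupted top size" style error. */
        malloc(1000);
        return 0;
    }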
This year's LACTF had a challenge with essentially this exact setup. My solution writeup is a good example of what it takes to defeat the exploit mitigations and turn a heap buffer overflow into an RCE: https://jonathankeller.net/ctf/lamp/