
A deep dive into Linux's new mseal syscall

ykonstant

Interesting. The article mentions "spicy discussions" in the kernel mailing list. Is there any insider who can summarize objections and concerns? I tend to avoid reading the mailing list itself since it can get too spicy, and my headaches are already strong enough!

The mechanism itself seems reasonable, but I am surprised that something like this doesn't already exist in the kernel.

ziddoap

Not sure if there was much more to it than the thread linked to, but it was basically Linus being Linus. He said stuff that made sense in a pretty blunt fashion.

There were flags proposed that allowed the seal to be ignored.

>So you say "we can't munmap in this *one* place, but all others ignore the sealing".

The spice came later:

>And dammit, once something is sealed, it is SEALED. None of this crazy "one place honors the sealing, random other places do not".

And later, even spicier, Linus says that seals cannot be ignored and that is non-negotiable. Any further suggestions to ignore a seal via a flag would result in the person being added to Linus' ignore list. (He, of course, said this with some profanities and capitals sprinkled in.)

js2

Wasn't just Linus. Earlier, from Theo de Raadt:

> I don't think you understand the problem space well enough to come up with your own solution for it. I spent a year on this, and ship a complete system using it. You are asking such simplistic questions above it shocks me.

https://lwn.net/ml/linux-kernel/95482.1697587015@cvs.openbsd...

Via https://lwn.net/Articles/948129/

0xbadcafebee

Not a great perspective... "It took me a year [or more] to understand this. The fact that you don't understand it shocks me." Dude, not everybody's as smart or experienced as you. Here's an opportunity to be a mentor.

greenavocado

https://lwn.net/ml/linux-kernel/7071.1697661373@cvs.openbsd....

    From:   Theo de Raadt <deraadt-AT-openbsd.org>
    To:   Jeff Xu <jeffxu-AT-google.com>

    > On Wed, Oct 18, 2023 at 8:17 AM Matthew Wilcox <willy@infradead.org> wrote:
    > >
    > > Let's start with the purpose.  The point of mimmutable/mseal/whatever is
    > > to fix the mapping of an address range to its underlying object, be it
    > > a particular file mapping or anonymous memory.  After the call succeeds,
    > > it must not be possible to make any address in that virtual range point
    > > into any other object.
    > >
    > > The secondary purpose is to lock down permissions on that range.
    > > Possibly to fix them where they are, possibly to allow RW->RO transitions.
    > >
    > > With those purposes in mind, you should be able to deduce for any syscall
    > > or any madvise(), ... whether it should be allowed.
    > >
    > I got it.
    > 
    > IMO: The approaches mimmutable() and mseal() took are different, but
    > we all want to seal the memory from attackers and make the linux
    > application safer.

    I think you are building mseal for chrome, and chrome alone.

    I do not think this will work out for the rest of the application space
    because

    1) it is too complicated
    2) experience with mimmutable() says that applications don't do any of it
    themselves, it is all in execve(), libc initialization, and ld.so.
    You don't strike me as an execve, libc, or ld.so developer.

greenavocado

    From:   Matthew Wilcox <willy-AT-infradead.org>
    To:   Jeff Xu <jeffxu-AT-google.com>

    ...

    Yes, thank you for demonstrating that you have no idea what you need to
    block.

    > It is practical to keep syscall extentable, when the business logic is the same.

    I concur with Theo & Linus.  You don't know what you're doing.  I think
    the underlying idea of mimmutable() is good, but how you've split it up
    and how you've implemented it is terrible.

    ...

lathiat

ykonstant

Very nice, thanks!

Edit: I always find it funny that these articles on the mailing list tend to read like a sports announcer describing a boxing match!

westurner

- "Memory Sealing "Mseal" System Call Merged for Linux 6.10" (2024) https://news.ycombinator.com/item?id=40474510#40474551 :

> How should CPython support the mseal() syscall?

metadat

Will it be possible to override / disable the `mseal` syscall with the LD_PRELOAD trick?

eska

mseal digresses from prior memory protection schemes on Linux because it is a syscall tailored specifically for exploit mitigation against remote attackers seeking code execution rather than potentially local ones looking to exfiltrate sensitive secrets in-memory.

If a remote attacker can change the local environment then they must have already broken into your system.

Dwedit

Probably not LD_PRELOAD. It would need to be an imported function in order for LD_PRELOAD to have any effect. A raw syscall would not be interceptable that way.

Discussion about intercepting linux syscalls: https://stackoverflow.com/questions/69859/how-could-i-interc...

But building your own patched kernel that pretends that mseal works would be the simplest way to "disable" that feature. Programs that use mseal could still do sanity checks to see if mseal actually works or not. Then a compromised kernel would need secret ways to disable mseal after it has been applied, to stop the apps from checking for a non-functional mseal.
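The sanity check mentioned above could look roughly like the following. This is only a sketch: `mseal_is_enforced` is a made-up helper name, and the fallback syscall number 462 is an assumption used only when the installed headers do not define `SYS_mseal`. It seals one anonymous page and then checks that a later `mprotect` is refused with `EPERM`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fall back to 462 if the headers predate mseal. */
#ifndef SYS_mseal
#define SYS_mseal 462
#endif

/* Hypothetical helper.
 * Returns  1 if the kernel accepted the seal and enforces it,
 *          0 if the check could not be run (mmap failed, or mseal is
 *            unavailable or filtered on this system),
 *         -1 if mseal reported success but the seal was not honored. */
int mseal_is_enforced(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 0;

    /* Invoke the syscall directly rather than through a named wrapper. */
    if (syscall(SYS_mseal, p, page, 0UL) != 0) {
        munmap(p, page);
        return 0;
    }

    /* A sealed mapping must refuse permission changes with EPERM.
     * The page can no longer be unmapped, so it is intentionally leaked. */
    errno = 0;
    if (mprotect(p, page, PROT_READ) == -1 && errno == EPERM)
        return 1;
    return -1;
}
```

A program would call this once at startup and treat `-1` as a sign that something is pretending to seal memory. As noted, a kernel that fakes mseal convincingly would defeat this kind of check.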

jandrese

I'm not sure what protection you could expect on any system where the kernel has been replaced by the attacker. Sure they can bypass mseal, but they are also bypassing all other security on the box.

Dwedit

Two different considerations for when you'd want to deny memory to other processes:

1. Protecting against outside attackers

2. Digital Rights Management

Faking "mseal" is something you might intentionally do if you are trying to break DRM, and something you would not want to do if you are trying to defend against outside attackers.

chucky_z

You can override the mseal call wrapper but not the syscall itself.

This is an interesting thought, so I looked it up, and this is how (all?) preload syscall overrides work. You override the wrapper but not the syscall itself, so if you’re doing direct syscalls I don’t think that can be overridden. Technically you could override the syscall function itself maybe?

the8472

https://lwn.net/Articles/978010/ says there'll be a glibc tunable

cataphract

Depends on whether the program calls into libc or inlines the syscalls, I imagine. Though you could use other mechanisms like seccomp.
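For illustration, a seccomp filter that disables mseal regardless of how the program issues the syscall might be sketched like this. `deny_mseal` is a hypothetical helper, the fallback number 462 is an assumption for older headers, and the filter skips the architecture check that a real filter should include. In practice libseccomp would be the more usual route.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fall back to 462 if the headers predate mseal. */
#ifndef SYS_mseal
#define SYS_mseal 462
#endif

/* Hypothetical helper: installs a seccomp-BPF filter that makes mseal
 * fail with ENOSYS for this process and its future children.
 * Returns 0 on success, -1 if the filter could not be installed. */
int deny_mseal(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* If it is mseal, fall through; otherwise skip one instruction. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_mseal, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -1;
    return 0;
}
```

Because the filter runs in the kernel at syscall entry, it applies equally to libc wrappers and inlined syscalls, which is what LD_PRELOAD cannot do.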

throw0101a

mseal() and what comes after, October 20, 2023: https://lwn.net/Articles/948129/

mseal() gets closer, January 19, 2024: https://lwn.net/Articles/958438/

Memory sealing for the GNU C Library, June 12, 2024: https://lwn.net/Articles/978010/

unwind

Meta: the mseal() prototype in the article needs some editing; it is not syntactically correct as shown now. The first argument is shown as

    unsigned start addr

But should probably be

    unsigned long start_addr

hifromwork

Seems to be OK now:

    int mseal(unsigned long start, size_t len, unsigned long flags)