
The SeL4 Microkernel: An Introduction [pdf]

mmooss

SeL4 is old news - not a criticism, but has anyone added another formally proven layer or component? (Edit: I mean new components beyond the microkernel, not improvements to the microkernel.)

Also, I suspect some people - maybe some on HN :) - get emotional overload when they see the word 'proof' and their intellectual functions stop. It's not a panacea for the infinite problem of secure IT; it isn't a way to create or discover a perfect and flawless diamond of software. IIUC it means it's proven to meet specific requirements in specific conditions, and those requirements and conditions can be quite narrow; and it says nothing about other functions and conditions that are out of spec. Is that roughly correct?

What does it mean in practical terms? What does a security professional see when they see 'formally proven software'?

What are the specs that SeL4 meets (no, I haven't looked at the OP in a long time)? Isn't that the essential information here?

ryao

Being "formally proven" to be free of various flaws did not make seL4 immune to memory corruption flaws. Despite the formal proof, a memory corruption flaw was found a few years ago. The PRs for the commits fixing it and amending seL4's proof are public:

https://github.com/seL4/seL4/pull/243

https://github.com/seL4/l4v/pull/453

You will find a number of other memory related bugs in its issue tracker:

https://github.com/seL4/seL4/issues?q=is%3Aissue%20label%3Ab...

Interestingly, the PR fixing the "register clobbering" in memory was not labelled as a bug, so it is not listed when you filter by "bug".

I used to think seL4 was immune to these issues, given that its proof allegedly says it is, but upon seeing this, I have come to think that the proof is not as comprehensive as the wider community has been led to believe. That said, seL4 is still a very impressive piece of software.

Finally, to answer your question: the specification that seL4 meets is published on GitHub:

https://github.com/seL4/l4v

Veserv

I do not see how that is a kernel memory corruption. I do not understand the code in context, but it appears to just be a case where the kernel incorrectly clobbered a userspace register, which could result in incorrect userspace code execution. An error in that class would not inherently violate the system safety properties of the kernel itself.

If so, that would only fall under a functional specification problem in the userspace API making it hard to use the userspace API in the intended/desired way. That would make it hard to use the kernel and verify your system using the kernel, but it would not constitute a kernel memory corruption which is what safety specifications and proofs of the kernel itself would be concerned with.

ryao

I said that a memory corruption flaw was found in seL4. This is a fact. If you read the patch, you will see that a message that had just been created was being overwritten (which is explained in a code comment). This occurs inside the seL4 kernel code, and the proof previously permitted this behavior (according to one of the developers’ comments on the PR). The proof had to be amended to disallow this (which is the second PR). You should see this in the links.

froh

that's not how that works.

writing into memory that doesn't belong to you is memory corruption. in this case: kernel (induced) memory corruption.

and it's really a bad thing:

functional safety technically boils down to the job of the kernel to provide "spatial and temporal freedom from interference", meaning all subsystems are isolated from each other. no "process" can write to another process, no process can starve another process. in this perspective, the kernel is another (special and privileged) process on the system.

a kernel that randomly writes to user space processes is inherently unsuitable for functional safety.

it's a huge no no. a failure in the goals and purpose of the kernel.

and finding this here specifically heavily shook confidence in formal specifications and formal proofs of compliance to these specs.

senderista

The PRs make it clear that this was a bug in the spec, and the implementation behaved according to spec.

ryao

That resulted in a memory corruption flaw in the C code. If you build a bridge to a specification and the bridge collapses, does saying that the specification was wrong in any way lessen the reality that the bridge failed?

isubasinghe

I believe seL4 is verified only for certain versions and on certain architectures. No bugs have been found at the C source code level for those builds of seL4.

ryao

This issue appears to have affected all architectures. It was present in the specification against which seL4 was verified. You can say that there are no bugs by virtue of the code following the specification, but if the specification was wrong, and in this case it was, then were there really no bugs?

vacuity

Even if it's not exactly seL4, there's good value in taking inspiration for design elements. It would still be a lot more robust than commodity operating systems.

amw-zero

We should look at formal verification like everything else: in terms of statistical effectiveness. It's not really important whether or not there are _no_ bugs. What's important is how many bugs there are for each unit of "effort."

For example, if this were hypothetically the only bug that was ever found, then that would be a pretty damn good argument that formal verification is effective. Because the bug rate of other non-verified operating systems is much higher.

ptman

seL4 in Rust?

saithound

Has anyone added another formally proven layer or component? Yes, they're being added all the time. Recently added features:

- Support for a bunch of new architectures, including RISC-V.

- Mixed criticality scheduling, which provides capability-based access to CPU time, mechanisms to limit the upper bound of execution of a thread, ensuring that high-criticality tasks have priority and access to necessary resources, and allowing "passive servers" which run on scheduling time donated by the caller.

- Microkit, a new verified abstraction layer which made it much easier to build actual systems using seL4.

- Very recently, the Device Driver Framework: a device-driver template, control and data plane implementation, and tools for writing device drivers and providing device virtualisation for high-performance I/O on seL4.

Formal verification can guarantee that specific requirements hold in specific conditions. It's true that in general such requirements and conditions can be quite narrow. But seL4 itself has a whole lot of proofs about it, covering a wide range of properties that one would want in a kernel, and these guarantees hold under very weak assumptions. Even the correctness of the C compiler is not assumed, they have a separate tool which looks at the compiler output and proves that the compiled binary behaves correctly with respect to the required C semantics.

The requirements that seL4 meets include the following: the binary code of the seL4 kernel correctly implements the behaviour described in the abstract specification and nothing more. There are no buffer overflows, no memory leaks, no pointer errors or null pointer dereferences, no undefined behavior in the C code, no termination of the kernel apart from the explicit ways enumerated in the spec, etc. The specification and therefore the seL4 binary satisfy the security properties of integrity and confidentiality: the former means that there is absolutely no way for a process to change data that the process has no explicit permission to change, and the latter means that a process cannot, in any way, read data it has no explicit permission to read. It even shows that a process without permission cannot indirectly infer the data through certain side channels. Beyond security, the expected worst-case execution-time guarantees and scheduling properties are also met.
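To make the integrity property concrete, here is a minimal sketch in Rust (hypothetical names, not the actual seL4 API or its proof statement): every state-changing operation is gated by an explicit capability check, so a process holding no write capability simply has no code path that mutates the object.

  // Hypothetical sketch, not the seL4 API: integrity means every mutation of
  // kernel-managed state is gated by an explicitly granted capability.
  #[derive(PartialEq, Clone, Copy)]
  enum Right { Read, Write }

  struct CapSpace { grants: Vec<(usize, Right)> } // (object id, right) pairs held

  impl CapSpace {
      fn allows(&self, obj: usize, r: Right) -> bool {
          self.grants.iter().any(|&(o, g)| o == obj && g == r)
      }
  }

  // A write to object `obj` succeeds only with an explicit Write grant;
  // no other code path mutates the object.
  fn write_object(caps: &CapSpace, obj: usize, _val: u8) -> Result<(), &'static str> {
      if caps.allows(obj, Right::Write) { Ok(()) } else { Err("no write capability") }
  }

The confidentiality proof is the dual statement for reads; the hard part, which seL4's proofs actually tackle, is showing that the real C code admits no path around such checks.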

trissylegs

The developers of seL4 have been in funding hell for years. Most of their work was DARPA research for remotely controlled drones. The US military would very much like drones that can't be hacked.

Their current work is on LionsOS, which is aimed more towards broader adoption: https://lionsos.org/

snvzz

>The developers of seL4 have been in funding hell for years.

The actual story re: happenings in CSIRO is detailed in Gernot's blog[0], which is also quite interesting in its more technical posts.

0. https://microkerneldude.org/

dloss

For example: No buffer overflows, null pointer exceptions, use-after-free, etc. On ARM and RISCV64 not even the C compiler has to be trusted, because functional correctness has been proven for the binary. And there are more proofs besides functional correctness. https://docs.sel4.systems/projects/sel4/frequently-asked-que...

im_down_w_otp

https://github.com/auxoncorp/ferros

Lots of type-level programming for tracking resources, hardware access, and capabilities at compile-time and trying to bring some of the underlying kernel guarantees up to the compiler because finding out about and debugging issues at runtime was just the absolute worst.
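As a rough illustration of the idea in Rust (hypothetical types, not the actual ferros API), rights can be encoded as typestate so that misuse is rejected at compile time instead of surfacing as a runtime fault:

  use std::marker::PhantomData;

  // Hypothetical illustration, not the ferros API: marker types encode a
  // capability's rights, so the compiler rejects unauthorized operations.
  struct Read;
  struct Write;

  struct Cap<Rights> { slot: usize, _rights: PhantomData<Rights> }

  impl Cap<Write> {
      // Downgrading write to read is allowed; there is no path back up.
      fn downgrade(self) -> Cap<Read> {
          Cap { slot: self.slot, _rights: PhantomData }
      }
  }

  fn read_page(_c: &Cap<Read>) {}
  fn write_page(_c: &mut Cap<Write>) {}

  fn main() {
      let mut w = Cap::<Write> { slot: 42, _rights: PhantomData };
      write_page(&mut w);
      let r = w.downgrade();
      read_page(&r);
      // write_page(&mut r); // compile error: Cap<Read> carries no write right
  }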

snvzz

The document was updated in January.

A lot has happened in the last few years, such as the founding of the seL4 Foundation, the maturation of mixed criticality scheduling, the celebration of multiple seL4 summits, and deployment in practical use by multiple companies and countries, in several scenarios.

axman6

Notably, Apple joined the seL4 Foundation, as they use it in several of their products: https://sel4.systems/Foundation/Membership/ (Not sure if they've stated publicly which, but it's been pretty well known for a while now).

nickpsecurity

In high-assurance, formal methods were one of many techniques used for the system. It was reduced complexity, precisely-specified behavior, likewise for the security policy, proofs that the design/code fulfilled it, exhaustive testing, leak analysis, pen testing, and the ability to recheck all of this.

That's so hard that, commercially, it was mostly done only for kernels. From there, your real-world use will involve user-level components, protocols with interactions, and so on. Add one thing to the proven TCB and maybe your proof disappears. So, every trusted component has to be rigorously designed. Then, all the interactions, for a holistic argument.

Most don't do that. For example, seL4 hosting a web browser won't help you if what you need to secure is a web application. IBOS and a WAF on seL4, all proven, might cover that. Harder argument. Most of these projects never go beyond the kernel or TLS stack or whatever.

You can do stuff with that. Look up Nizza Secure Systems Architecture for a paper with many examples. Yet, they usually don't cut it. They're not compatible with popular apps either. So, most people just harden Linux or BSD or Windows.

mmooss

> IBOS and a WAF on seL4, all proven

If you mean web application firewall, that seems too complex to be proven? With infinite possible inputs, inspecting it would seem to require too much code to prove - unless it merely verifies traffic matches a short whitelist.

Why talk about high assurance in the past tense? Is the field moribund? (I don't mean to point out a typo, if that's what it is.)

nickpsecurity

It could potentially be proven. We think it's unlikely due to cost or complexity. So, the field's default position was that most things would be low to medium assurance. Only select components would be high assurance. I hope that answers your broader question about adoption of seL4-like techniques for larger systems.

Far as past tense, I left high-assurance security when I found out Jesus died for my sins and rose again. Had to learn an entirely new life with a focus on human beings instead of tech. I'm recently considering going back into security since I spend up to 12 hours a day at work anyway (two jobs).

I should also note another reason I use the past tense. Many people I learned from have died, retired, or switched focus. The reason for the latter is that there's no real market for high-assurance security. Most won't spend the money or make the tradeoffs. So, security-enhancing popular platforms while assuming they'll get hacked (and responding correctly) seemed like a better use of time. I mean, I could probably count all the high-security vendors on my hands...

HeavyRain266

I’m a fan of microkernel hosts for guest monoliths, thus our servers are running seL4 as a safety layer and backup for FreeBSD VMs, with jails for a renderfarm, BEAM clusters and Jenkins.

All I’m missing is an ARM port of DragonflyBSD for its threading and in-process kernels (hybrid kernel design). My dream is to run it on a 128-core Ampere Altra to run OpenMoonRay more effectively.

ximus

Sounds like your setup would make for an interesting read in longer form

nickpsecurity

Could you elaborate on how your servers use seL4? And are these production, commercial servers?

HeavyRain266

Yes, commercial production servers. Similar to some other systems and certain consoles, the microkernel is used as a hypervisor that protects the bare metal hardware from potential hazards when a guest monolith like the mentioned FreeBSD becomes compromised. Guest components run our software and services inside jails on two 128-core, two 64-core and one 32-core Ampere processors, where the first two machines run heavy workloads like offline rendering and Erlang VM clusters, while the others provide a few VPS servers and internal services.

indolering

I thought support for multi-core was still lagging? Or are you just running the unverified version?

nickpsecurity

Thanks. That's similar to prior uses. Makes sense.

I think some readers' next question might be: are you all hiring?

coldblues

https://genode.org/index

An operating system with seL4 support

Rochus

Are there notable uses of Genode?

TheAmazingRace

I gave a presentation on SeL4 for my local OWASP chapter. I'll have to see if I can dig it up.

The project truly is a nice piece of kit, but I would hesitate to consider it as a replacement for Linux, especially for general purpose computing. Though that isn't to say microkernels are terrible for it in general. RedoxOS seems to be making some inroads lately and it uses a microkernel written in Rust.

vacuity

The question is always "how big of a replacement are we talking?" Redox seems to be committed to interoperating well with POSIX, which naturally informs its design decisions. And there's a huge difference between having technical capabilities and succeeding. Although if Redox does succeed, it would already be a good step. seL4 is even more extreme in these qualities: its technical advantages are superb, but so far (and I think this will continue) it does not have what it takes, whatever that encompasses, to be the Next Big Thing. Ignoring political considerations, I think microkernels will be successful, and rightfully so.

snvzz

>I would hesitate to consider it as a replacement for Linux

It would depend on the scenario. Of course, Linux is easier to work with but OTOH there are requirements which only seL4 can satisfy.

There's a lot required on top of seL4 for it to be actually useful. Fortunately, there has been a lot of Open Source work there as well. This is where seL4 is in a much better position than just a few years ago.

For static scenarios, there's LionsOS[0], quite usable already.

For dynamic scenarios, there's the Provably Secure, General-Purpose Operating System[1], which is still early stages.

Both can be found on the Projects page[2] at Trustworthy Systems, which is linked from the seL4 website.

0. https://trustworthy.systems/projects/LionsOS/

1. https://trustworthy.systems/projects/smos/

2. https://trustworthy.systems/projects/

TheAmazingRace

Thank you kindly for providing these links. I do hope that microkernels take off in a big way for general purpose computing someday.

hackpelican

Does the OS that lies on top of this kernel need to be formally verified as well for the security guarantees to hold?

rubenbe

No, that's the advantage: the kernel/processes on top don't need to be trusted, since the kernel guarantees the isolation. So you can have a Linux kernel running next to some high-security process with the guarantee that they will be isolated (with the exception of allowed IPC).

nabla9

No.

But there are limitations: DMA must be off, and only formally verified drivers can be relied upon.

It's also important to note that the seL4 multicore kernel is not yet verified.

snvzz

An IOMMU can help significantly with the driver problem, preventing a properly initialized driver from misbehaving and compromising the system.
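A minimal sketch of the mechanism (hypothetical types, not any particular IOMMU's programming interface): the kernel installs translations only for frames explicitly granted to a device, so stray DMA faults instead of reaching physical memory.

  // Hypothetical sketch of the IOMMU idea: only explicitly granted frames are
  // reachable from the device, so a misbehaving driver cannot aim DMA at
  // arbitrary physical memory.
  struct Iommu { mappings: Vec<(u64, u64)> } // (device-visible addr, physical frame)

  impl Iommu {
      fn grant(&mut self, dev_addr: u64, frame: u64) {
          self.mappings.push((dev_addr, frame));
      }

      // Every DMA access is translated; unmapped addresses fault.
      fn translate(&self, dev_addr: u64) -> Option<u64> {
          self.mappings.iter().find(|&&(d, _)| d == dev_addr).map(|&(_, f)| f)
      }
  }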

vacuity

I want the magical IOMMUs that are maturely secure like MMUs are now. For now, I think various efforts in verifying/restricting/generating drivers are far better, although they fall particularly flat for proprietary drivers.

Koshkin

In absolute terms, sure. At the practical level, you can find a partial answer in section 7.2 of the paper.

snvzz

The guarantees offered by the kernel cannot be subverted by unprivileged processes running on it.

Of course, the kernel is not very useful on its own, thus the design of drivers, filesystem servers and other services running on top of the kernel is still relevant.

Note that, unlike most other systems (including Linux) which are flawed at a fundamental level, seL4 actually enables the construction of a secure and reliable system.

marcosdumay

Well, as long as the hardware underneath it also enables the construction of a secure system.

I don't think we have any such option right now.

snvzz

An example of this is timing side channels.

Originating in an effort within seL4[0], there's ongoing work[1] in RISC-V to resolve this.

0. https://sel4.systems//Foundation/Summit/2022/slides/d1_11_fe...

1. https://lf-riscv.atlassian.net/browse/RVS-3569

alphazard

Also worth checking out Drew DeVault's Helios Microkernel, which is based on SeL4.

https://ares-os.org/docs/helios/

kevinherron

I think there's a meaningful difference between "based on" and "inspired by", Helios being the latter.

hannob

L4 was popular at my university (Karlsruhe). While I never really looked into it in any detail, it always appeared to me like a project that is primarily interested in testing some theoretical ideas, but not in building anything that would be practically useful.

That was 20 years ago. As far as I can tell, this has not changed. (Quick googling tells me there appear to be some efforts to build an OS on it, but they all look more like proof of concepts, not like something with real-world use.)

yjftsjthsd-h

https://en.wikipedia.org/wiki/L4_microkernel_family seems to describe it having been used all over the place, though mostly in embedded contexts.

> OKL4 shipments exceeded 1.5 billion in early 2012,[4] mostly on Qualcomm wireless modem chips. Other deployments include automotive infotainment systems.[13]

> Apple A series processors beginning with the A7 contain a Secure Enclave coprocessor running an L4 operating system[14] called sepOS (Secure Enclave Processor OS) based on the L4-embedded kernel developed at NICTA in 2006.[15] As a result, L4 ships on all modern Apple devices including Macs with Apple silicon.

ryao

I feel like the two of you are talking past each other. The other poster is talking about time-sharing systems like UNIX, while you are pointing at embedded applications that are OS-optional and where any semblance of an OS is really minimal. People deploying L4 there is not very surprising and does not apply to his remarks. Seeing this is like watching one person discuss an absence of apples and seeing another reply that there are oranges.

hannob

Ok, admittedly, that's more than I expected. And hey, if they found some use cases in the embedded world, fine.

Nevertheless, it's not exactly how it was pitched. This microkernel thing always had a "Linux is doing it wrong, we're doing it better" story behind it. They wanted to be a replacement for a general-purpose OS. That certainly hasn't happened.

ryao

OSF/1 is the closest that microkernels ever came to being used in general purpose operating systems:

https://en.wikipedia.org/wiki/OSF/1

The only still living variant is AIX. Some of the code is in the XNU kernel, although that is a monolithic kernel rather than a microkernel. Apple converted it to a monolithic design for performance reasons.

That said, this article suggests that AIX is not using a microkernel either:

https://tedium.co/2019/02/28/ibm-workplace-os-taligent-histo...

xxmarkuski

Jochen Liedtke became a professor in Karlsruhe in 1999; sadly, he passed away only shortly after, in 2001. I don't know if his successor Bellosa still does research on L4. There was the L4Ka project, which appears to be completed. L4 is not part of the curriculum in his bachelor-level OS lecture.

Rittinghaus, an alumnus of Bellosa's group, is involved with Unikraft [0], which was featured a couple of times on HN and uses unikernel technology.

[0] https://unikraft.org/

__bjoernd

After Liedtke's passing, L4 research was continued in groups at UNSW (Gernot Heiser, formal verification -> SeL4) and TU Dresden (Hermann Haertig, Fiasco/L4Re, focusing on real-time and secure systems).

Genode (already mentioned in another comment) [1] came out of the TU Dresden group with some nice ideas around managing compartmentalized systems. Kernkonzept [2] is a startup commercializing the L4Re microkernel.

[1] https://genode.org/ [2] https://www.kernkonzept.com/

dloss

Well, an L4 variant is used in iPhones: "The Secure Enclave Processor runs an Apple-customized version of the L4 microkernel." https://support.apple.com/de-at/guide/security/sec59b0b31ff/...

froh

the L4Re derivative (open source) is running in every single id.X Volkswagen car in the central "icas1" ECU, carrying Linux and other guests.

https://www.kernkonzept.com/kk_events/elektrobit-advances-au...

afaics the L4Re kernel also is part of Elektrobit Safe Linux.

senko

I love the work (and the direction) the Karlsruhe team did on L4Ka, especially with Pistachio - the design was clean, simple, easy to grasp.

I did a Pistachio-based OS for my diploma thesis (not at Karlsruhe). I always thought that if I'd been studying there, I'd probably go into OS research.

nabla9

L4 and L4Ka, developed in Karlsruhe, are not the same as seL4.

zzo38computer

I also had ideas for an operating system design, and the capabilities I had considered use the same features of interposition and delegation as seL4 does. There are other advantages beyond just what they list there; for example, you can apply a filter to audio, or you can use proxy capabilities for implementing network transparency.

The real-time stuff is something I had considered allowing as an optional implementation; my idea was for a specification rather than only a single implementation.

Another feature I wanted, though, is that all programs operate deterministically except for I/O. Without I/O, you can't determine the date/time or how long a program takes, can't check for processor features (if you use something that the hardware does not support, though, the operating system may emulate it), etc.

I had thought to use a mixture of hardware support and software support to implement it, though. (There is a note in the document about attacks with capability implemented in hardware, but I do not have the referenced document and I don't know if that attack applies to what I am thinking of doing.)

russellbeattie

I think at this point the argument for or against microkernels is becoming moot. The only way to provide fast, efficient and safe access to privileged services is via hardware mitigations. Software can only do so much.

It's like the difference between an 80286 and an 80386: the latter added the hardware support for true multitasking that the former lacked. Since then there's been an ever-increasing number of hardware-level protection mechanisms added, like those which enabled hypervisors.

Apple in particular has been adding a bunch of stuff to their SOCs to protect the kernel, drivers and components at the chip level, as well as enforce privileges on running threads and when using pointers. [1] This doesn't mean the OS is impenetrable, but it's a lot more effective than a software only strategy of managing privileges.

Seems to me that utilizing this stuff (or similar), the architecture of the kernel really isn't that important any more.

Am I off base?

1. https://support.apple.com/guide/security/operating-system-in...

indolering

The utility of microkernels is orthogonal to hardware/software co-design.

From a practical engineering perspective, monolithic kernels were faster/easier/had more resources behind them, and their security is as robust as can be had with C (i.e. best effort and full of bugs). Lots of hardware has been introduced to try and mitigate that mess. But you theoretically wouldn't need a security co-processor with SeL4 because you have a very high degree of confidence that processes are isolated from each other and there are no root level exploits. So yes, hardware/software codesign is important!

That being said, the team behind SeL4 has had to spend a lot of engineering resources on eliminating side channels from the hardware. Hardware is also faulty as the real world doesn't care about your physics simulation.

The benefits of a microkernel here is that it is small enough for formal verification to be tractable. The proof itself is 10x the size of the kernel. Context switching on SeL4 is an order of magnitude faster than Linux, so the performance impact should be negligible. But if we magically could verify the millions of lines of a monolithic kernel, then it would still be faster to not do context switching. Indeed, the SeL4 team tried to move the scheduler to userspace but it cost too much in terms of performance. So it was kept in and added to the proof burden.

Narishma

> It's like the difference between a 80286 and a 80386: The latter added the hardware support for true multitasking that the former lacked.

I'm not sure this is a good comparison. The 286 did support true multitasking in protected mode, and it was used in many non-DOS operating systems. What the 386 added (among other things) is the virtual 8086 mode which allowed it to multitask existing real mode DOS applications that accessed hardware directly.

vacuity

That doesn't seem right. Even with strong hardware protections, how is Linux's TCB comparable to a microkernel's? Unless it just recreates the exact same protection domains, there will be more vulnerability in Linux. Rather, the thing about hardware is primarily that it makes things more efficient. For example, microkernels nowadays are already quite robust because they make good use of certain hardware: the MMU. Then the small TCB of a microkernel gives the kernel credibility, so the kernel and hardware together make for a solid infrastructure. So really it's a matter of how much "cheating" we're allowed to do with hardware, but microkernels overall utilize the protection better. Or see exokernels.

MisterTea

> Am I off base?

Yes. There is still plenty of work to be done in the OS research space. There have to be software interfaces and APIs to all this new hardware.

I also think there is a lot to be learned from using micro/hybrid systems, which are much more composable. E.g., Plan 9 is a great example of a hybrid system which makes use of a singular protocol, 9P, to serve all objects in the system to userspace. It is hybrid because some parts, such as IP and TLS, are in-kernel to avoid the overhead of system calls. Another interesting design aspect is how the in-kernel drivers are minimal, mostly acting as a 9P interface to the hardware logic. This way you A. turn a machine object like a pointer or record into a seekable file, B. secure that file using standard unix permissions, and C. can easily distribute the components across machines via a network. Now you can securely push the driver logic into a userspace program. And 9P is network- and architecture-transparent, meaning you can work across multiple machines running on Arm, x86, MIPS, etc. All this comes out of the box.
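The shape of that driver model, as a sketch in Rust (hypothetical types like ServedFile and CtlFile, not real 9P wire code): the driver only implements file operations, and permissions, seekability and network transparency then come from the protocol layer rather than from the driver.

  // Hypothetical sketch of the Plan 9 idea, not real 9P wire code: a driver
  // presents hardware as files; everything else talks to it via read/write.
  trait ServedFile {
      fn read(&mut self, offset: u64, buf: &mut [u8]) -> usize;
      fn write(&mut self, offset: u64, data: &[u8]) -> usize;
  }

  // A device register exposed as a small seekable file.
  struct CtlFile { reg: u32 }

  impl ServedFile for CtlFile {
      fn read(&mut self, _offset: u64, buf: &mut [u8]) -> usize {
          let bytes = self.reg.to_le_bytes();
          let n = buf.len().min(bytes.len());
          buf[..n].copy_from_slice(&bytes[..n]);
          n
      }

      fn write(&mut self, _offset: u64, data: &[u8]) -> usize {
          let mut bytes = [0u8; 4];
          let n = data.len().min(4);
          bytes[..n].copy_from_slice(&data[..n]);
          self.reg = u32::from_le_bytes(bytes);
          n
      }
  }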

When I go back to Linux/Unix or Windows from Plan 9, it's a sad and frustrating time. They're about as flexible as igneous rock. All the features are ad-hoc, bolted on in mutually incompatible ways using a myriad of protocols that all do the same thing - serve files/objects. Ugh.

aurelien

The point about security is that it seems to present the same failure mode as KVM does for the Linux kernel: if the hypervisor is in ring 0, you have the risk of VM escape from one VM to another, or to the host itself.

How do you mitigate that risk?

snvzz

In seL4's virtualization support, VM exceptions are turned into messages and handled by a VMM, a task running in unprivileged mode.

The VMM has no more capabilities than the VM itself, thus a VM escape would be, outside of academia, of no value.

Refer to pages 8 to 10 in the OP PDF.
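Schematically, the VMM is just an ordinary event loop over kernel-delivered fault messages. A sketch in Rust (hypothetical types, not the real seL4 API):

  // Hypothetical sketch, not the seL4 API: VM exits arrive as messages, an
  // unprivileged VMM task emulates them, and the reply resumes the guest.
  enum VmExit { MmioRead { addr: u64 }, MmioWrite { addr: u64, val: u64 }, Halt }

  fn vmm_loop(recv: impl Fn() -> VmExit, reply: impl Fn(u64)) {
      loop {
          match recv() {
              VmExit::MmioRead { addr } => reply(emulate_read(addr)),
              VmExit::MmioWrite { addr, val } => { emulate_write(addr, val); reply(0); }
              // Escaping the guest only reaches this VMM, which holds no more
              // capabilities than the guest itself.
              VmExit::Halt => break,
          }
      }
  }

  fn emulate_read(_addr: u64) -> u64 { 0 }
  fn emulate_write(_addr: u64, _val: u64) {}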

nimish

SeL4 is proof that microkernels are safe, efficient and scalable, yet we are stuck with big honking Linux kernels in 2025. That said, more and more drivers are moving to usermode anyway, so it's a wash in the end.

vacuity

I will caution that IPC microbenchmarks should not be taken as confirmation that the "academic microkernel" is viable: OS services all in userspace and with fine granularity as is appropriate. Often microkernel-like designs like VMMs/hypervisors and exokernels make use of unikernels/library OSes on top, which reduces the burden of fast IPC somewhat. Or developers intentionally lump protection domains to reduce IPC burden. Of particular note: even seL4 did not evict its scheduler to userspace. Since the scheduler is the basis behind time management, it's quite a blow to performance if the scheduler eats up time constantly. My own thoughts there are that, with care, a userspace scheduler can efficiently communicate with the kernel with a shared memory scheme, but that is not ideal. But for desktop and mobile, a microkernel design would be delightful and the performance impact is negligible. We need far more investment on QoS there.

Edit: That being said, we should be building microkernel-based OSes, and if for some cases performance really is a restricting factor, they will be exceptions. The security, robustness, flexibility, etc. of microkernels is not to be understated.
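One shape such a shared-memory scheme could take, as a sketch under my own assumptions (not any existing kernel's interface): the userspace scheduler publishes its decision into a region both sides map, and the kernel consumes it at switch time without a round-trip IPC.

  use std::sync::atomic::{AtomicU64, Ordering};

  // Hypothetical sketch of a shared-memory scheduling scheme; thread id 0
  // means "no decision published".
  struct SchedRegion { next_thread: AtomicU64 }

  // Userspace scheduler: publish the next thread to run.
  fn user_schedule(region: &SchedRegion, thread_id: u64) {
      region.next_thread.store(thread_id, Ordering::Release);
  }

  // Kernel side, conceptually: consume the published decision at switch time,
  // falling back to a default policy when userspace has not decided in time.
  fn kernel_pick(region: &SchedRegion, fallback: u64) -> u64 {
      match region.next_thread.swap(0, Ordering::Acquire) {
          0 => fallback,
          t => t,
      }
  }

The fallback path is what keeps this safe: the kernel never blocks waiting on userspace, so a buggy or slow scheduler degrades policy, not liveness.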

indolering

They verified that the scheduler doesn't interfere with the integrity, confidentiality, and authenticity requirements of the kernel so it's a moot point.

vacuity

Rather, although I believe the seL4 scheduler is sufficiently general, I want a userspace scheduler to minimize policy. The seL4 team recognizes that a kernelspace scheduler violates Liedtke's minimality principle for microkernels, since the only motivating reason is performance. If an efficient userspace scheduler implementation exists, the minimality principle dictates no kernelspace scheduler. Otherwise there's pointless performance overhead and possibly policy inflexibility.

netbsdusers

Drivers in userspace is not particularly microkernelly - most of the major monolithic kernels have supported this to some degree or another for years (it is easy in principle, just transmit I/O requests to userspace servers over some channel) while many historic microkernels (see e.g. Mach 3) did not do it. It hardly changes the architecture of the system at all.

It is moving the higher-level things into userland that is the harder problem, and the one that has been challenging for microkernels to do well.

vacuity

People should stop bringing up Mach so much. It should never have been the poster child for microkernels. It's poisoned the discourse when there are plenty of alternative examples. Granted, Mach also did some good work in the space, but its shortcomings are emphasized as if they reflect the whole field.

More to the point, drivers in userspace is an important distinction between "pure" monolithic kernels and microkernels: the former is optimizing for performance and the latter is optimizing for robustness. It's not about ease of implementation for either. It's quite meaningful to shift on the axis nowadays: it represents a critical pragmatic decision (notice purity is irrelevant). You're right that "higher-level things" such as the networking stack or filesystem are also crucial to the discussion. I think here, too, ease of implementation is not relevant, though.

torginus

Isn't it common to run OSes on desktop/server environments inside hypervisors? That means the OS itself can be transparently virtualized or preempted, and access to physical hardware can be transparently passed to the virtualized OSes. This can be accomplished today with minimal impact on performance or user experience.

The fact that this can be done with OS code not explicitly designed for this signals to me that there are no roadblocks to having a high-performing general purpose microkernel running our computers.

vacuity

This leads to big units of code, namely multiple OSes, whereas the ideal is being able to use as finely granular units as developers are able to stomach. For example, Xen can work around the issue of device drivers by hosting a minimal OS instance that has drivers, but it is better to be able to run drivers as individual processes. This reduces code duplication and performance overhead.

ryao

seL4 having a proof of correctness does not mean all microkernels do. In fact, seL4 is the only microkernel that has a proof of correctness. If you build on top of it in the microkernel way, you quickly find that it is not performant. That is why NT and XNU both abandoned their microkernel origins in favor of becoming monolithic kernels.

mastax

I’ve seen this argument play out many times. I believe the next line is: “QNX proved that micro kernels can be fast given clever message passing syscall design.”

“I remember running the QNX demo disc: an entire graphical operating system on a single 1.44MB floppy! Whatever happened to them?”

“They got bought by blackberry, which ended as you’d expect. QNX had a lot of success in automotive though.”

“Nowadays Linux and Android are dominant in new cars, though, proving once and for all that worse is better.”

Exeunt. End scene.

nine_k

Also an illustration of how an open-source solution, even if technically inferior, would displace a closed-source solution, even if technically superior, unless there is a huge moat. And huge moats usually exist only in relatively narrow niches.

ryao

Nice use of Latin. Mihi placet (it pleases me).

netbsdusers

NT and XNU never had microkernel origins - Cutler explicitly rejected this at a conference in the early 1990s (if anyone remembers which, kindly share), and NeXTSTEP forked Mach at version 2.5, which was not a microkernel (see https://cseweb.ucsd.edu/classes/wi11/cse221/papers/accetta86...). XNU retained this monolithic architecture (as did Tru64 and other systems that trace their heritage to Mach prior to its version 3).

danieldk

> NeXTSTEP forked Mach at version 2.5

Various sources state that they rebased to OSF Mach Kernel 7.3, which was based on Mach 3 and parts of Mach 4. The OSF MK ancestry of macOS XNU can still be seen in paths:

https://github.com/apple-oss-distributions/xnu/tree/main/osf...

rhet0rica

In more recent interviews Cutler has been firm that the NT kernel was designed pragmatically, a view that Tevanian evidently later adopted as well, given the great wheel of incarnation that Darwin went through on the road to XNU, although I'm not sure Tevanian ever stated his perspective on this matter publicly. Neither of these systems was ever a true monolithic kernel in the Linux or Unix sense - at all times they both had some measure of sandboxing between e.g. driver code and the scheduler - sitting somewhere in between.

True microkernels are, alas, more of an ideology than a practical reality, as the long-suffering GNU/HURD team discovered; Tanenbaum has been clear that the MINIX/NetBSD experiment was more about principles than performance. That said, certainly many hypervisors have attained success with configurations that coincidentally happen to be the same footprint as a microkernel.

ryao

  NT and XNU never had microkernel origins - Cutler explicitly refuted this at a conference in the early 1990s 
Interestingly, the Tanenbaum–Torvalds debate had Tanenbaum claim otherwise at the very start:

  The alternative is a microkernel-based system, in which most of the OS runs as separate processes, mostly outside the kernel. They communicate by message passing. The kernel's job is to handle the message passing, interrupt handling, low-level process management, and possibly the I/O. Examples of this design are the RC4000, Amoeba, Chorus, Mach, and the not-yet-released Windows/NT.
https://groups.google.com/g/comp.os.minix/c/wlhw16QWltI

That is why I thought Windows NT had originally been intended to be a microkernel.

pjmlp

NT in Windows 11 has very little to do with the monolithic kernel story that keeps being repeated.

Not only did the graphics stack move back into userspace again, there is now a complete userspace driver stack, and VBS (Virtualization-based Security) for several kernel components that run on their own mini Hyper-V island, talking via IPC with the rest of the kernel.

Likewise in XNU land, Apple started a crusade a few years ago to move all kexts into userspace: no more drivers in the kernel beyond the ones Apple considers critical.

In both cases, they never were pure monolithic kernels, due to the way walls were introduced with kernel-level IPC to talk across modules, instead of straight function calls.

el_pollo_diablo

> seL4 is the only microkernel that has a proof of correctness

ProvenCore (https://provenrun.com/provencore/) has a proof that covers correctness (memory safety, and more generally absence of UB, termination, etc.), functional properties (e.g. the fork() system call is a refinement of an abstract "clone" operation of abstract machines), and security properties (memory isolation).

sparkie

To the best of my knowledge, seL4 is not AVX-512 aware. The AVX-512 state is not saved or restored on a context switch, which is clearly going to impact efficiency.

At present there's 16x64 bits of register state saved (128B), but if we were to have full support for the vectors, you potentially need to add 32x512 bits to the state (plus another 16 GP registers when APX arrives). The total state that needs moving from registers to memory/cache would jump to 2304B - an 18-fold increase.

Given that memory is still the bottleneck, and cache is of limited size, a full context switch is going to take a big hit - and the main issue with microkernels is that a service you want to communicate with lives in another thread and address space. You have: app->kernel->service->kernel->app for a round trip. If both app and service use the AVX-512 registers then you're going to have to save/restore more than half a page of CPU state up to 4 times, compared with up to 2 on a monolithic kernel, which just does app->kernel->app.

The amount of CPU state only seems to be growing, and microkernels pay twice the cost.
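Spelling out that arithmetic (16 general-purpose registers at 8 B each today; 32 zmm registers at 64 B each, plus 16 more 8 B GPRs with APX):

  $$16 \times 8\,\mathrm{B} = 128\,\mathrm{B}, \qquad 128 + 32 \times 64 + 16 \times 8 = 2304\,\mathrm{B} = 18 \times 128\,\mathrm{B}$$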

Veserv

The cost of moving 4 KB is minuscule. Assume an anemic basic desktop with a 2 GHz clock and 2 instructions per clock. You would be able to issue 4 billion 8-byte stores per second, resulting in 32 GB/s, or 32 bytes per nanosecond. Memory bandwidth is going to be on the order of 40-60 GB/s for a basic desktop, so you will not run into memory bandwidth bottlenecks with the aforementioned instruction sequence. So, 4 KB of extra stores is a grand total of 128 additional nanoseconds.

In comparison, the average context switch time of Linux is already on the order of ~1,000 nanoseconds [1]. We can also see additional confirmation of the store overhead I computed, as that page measures ~3 us for a 64 KB memcpy (load + store), which would be ~187 nanoseconds for 4 KB (load + store), whereas the context saving operation is a 2 KB register -> memory store and a 2 KB memory -> register load.

So, your system call overhead only increases by about 10 percentage points. Assuming you have a reasonably designed system that does not kill itself with system call overhead, spending the majority of the time actually handling the service request, this constitutes a minuscule performance cost. For example, if you spend 90% of the time executing the service code, with only 10% in the actual overhead, then you only incur a 1% performance hit.
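The arithmetic, spelled out from the figures above:

  $$2\,\mathrm{GHz} \times 2\,\tfrac{\mathrm{stores}}{\mathrm{cycle}} \times 8\,\mathrm{B} = 32\,\mathrm{B/ns}, \qquad \frac{4096\,\mathrm{B}}{32\,\mathrm{B/ns}} = 128\,\mathrm{ns} \approx 13\%\ \mathrm{of\ a\ {\sim}1000\,ns\ context\ switch}$$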

[1] https://eli.thegreenplace.net/2018/measuring-context-switchi...

vacuity

I agree that a well-designed system for most use cases won't have performance issues, since we should not just be optimizing context switches but also things like kernel bypass mechanisms, mechanisms like io_uring, and various application-guided policies that will reduce context switches. Context switches are always a problem (the essential complexity of having granular protection), and moving an extra 4KB is not negligible depending on the workload, but we are not out of options. It will take more programmer effort, is all.

vacuity

I appreciate your criticism; it is reasonable and relevant in the present and the future. I wrote a reply nearby in the thread[0]. I think it's mostly solvable, but I agree it's not trivial.

[0] https://news.ycombinator.com/item?id=43456628

arakageeta

I’m not sure that seL4 even supports NUMA, so there are other tradeoffs to consider.

vacuity

I think NUMA management is high level enough that in a microkernel it would be comfortably managed in userspace, unlike things relevant to performance-critical context switches. And seL4 is currently intended only for individual cores anyways.

snvzz

>AVX-512 and other x86 warts

While seL4 runs on x86, it is not recommended.

Effort seems to be focused on ARM and RISC-V, particularly the latter.

An argument could be made that x86's baggage makes it a particularly bad fit for offering the sort of guarantees seL4 does.

nickpsecurity

High-assurance security requires that kernels clear or overwrite all shared state. Any such state could become a covert storage channel if one partition can write it and another can read it. If so, it must be overwritten.
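As a sketch of the rule in Rust (hypothetical state layout, not any particular kernel's): whatever the outgoing partition could have written, and a later partition could read, gets scrubbed at every partition switch.

  // Hypothetical sketch: state writable by one partition and readable by the
  // next is a storage channel unless it is scrubbed at every switch.
  struct SharedState { scratch_regs: [u64; 16], pending_flags: u64 }

  fn switch_partition(state: &mut SharedState) {
      // Overwrite everything the outgoing partition could have influenced.
      state.scratch_regs = [0; 16];
      state.pending_flags = 0;
      // Real systems must also flush or partition caches, TLBs and predictors.
  }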

pjmlp

I feel like containers and Kubernetes are the microkernels' revenge.

They are for all practical purposes fulfilling the same role.

bri3d

Not at all? If anything they’re filling the opposite role. Microkernels are about building interfaces which sandbox parts of the kernel. Namespaces are about giving sandboxed userlands full access to kernel interfaces.

pjmlp

Namespaces are one form of capabilities.

Additionally, a Linux kernel that exists for the sole purpose of keeping KVM running, while everything that powers a cloud workload runs in Kubernetes pods, is nothing more than a very fat microkernel in terms of usefulness.

cedws

I heard a joke somewhere that seL4 is even more successful than Linux because it is running below ring 0 in every Intel chip shipped in the past N decades, plus probably many others.

CalChris

Intel used a version of Minix rather than seL4 for its Intel Management Engine. [1] There was some controversy about this because they didn't give Andrew Tanenbaum proper credit. [2]

[1] https://www.zdnet.com/article/minix-intels-hidden-in-chip-op...

[2] https://www.cs.vu.nl/~ast/intel/

indolering

Such a dumb technical choice driven by stupid managerial considerations. They do this ring -1 shit because hardware companies view this as a cheap way to add value. But they don't open source it or contribute back because they view it as secret sauce. Minix as a result didn't get the investments that GPL software receives. Now the project is in hard legacy mode.

ianburrell

The Intel Management Engine is a separate microcontroller integrated into the chipsets. Recent ones use an Intel Quark x86 CPU and the Minix 3 OS.

mrkeen

That is Minix, not SeL4.

naasking

seL4 is used in a lot of cellular phone firmwares I believe.

phendrenad2

Yes and there's a very good reason: Linux is safe enough, more efficient, and more scalable.

vacuity

I suppose I should ask the other side too (though I am biased to favor microkernels, and am better read on them): how so?

"Safe enough" measured by the standards of this upside-down industry...I'll let everyone decide that for themselves.

"More efficient": while monolithic kernels have a higher ceiling, currently there's plenty of OS research papers and industry work demonstrating that more integration with userspace brings performance benefits, such as in the scheduler, storage, or memory management. Microkernels encourage/enforce userspace integration.

"More scalable": I think this has less to do with kernelspace scale and more with how nodes are connected. See Barrelfish[0] for eschewing shared memory in favor of message passing entirely, running separate kernels on each core. Meanwhile Linux has gradually discovered a "big lock" approach is not scalable and reduced coarse-grained locks, added RCU, etc.. So I think we will go moreso towards Barrelfish. But on a single core, for what goes into the kernel, that's covered by everything but scalability.

[0] https://people.inf.ethz.ch/troscoe/pubs/sosp09-barrelfish.pd...

naasking

"Safe enough" means "actually unsafe".
