How to handle people dismissing io_uring as insecure? (2024)
103 comments · July 21, 2025 · vjerancrnjak
weitendorf
I don't work at Google anymore and don't have any special insight into the internal adoption of io_uring, but I think it stands to reason that Google would benefit tremendously from rolling out a higher-performing way to do IO across their fleet. I mean, having myself done some lowish-level performance/optimization work and knowing that the impact of these kinds of changes is measurable and the scale is almost fleetwide, I wouldn't be surprised if the benefits - after major internal libraries/tools are also updated to use io_uring - are O(Really Big Money)
Having talked to members of their prodkernel team about other subjects, I also think they are competent enough to know the difference between "not ready" and "acceptably flawed". And believe me, the incentives are such that O(Really Big Money) optimization projects get staffed unless there is something making them infeasible.
Not everybody has the same threat model and security stance as Google and that's ok. But personally I would take their internal adoption of io_uring very seriously as a measure of whether it's safe for me to adopt it, especially if I'm running untrusted or third party software (including certain kinds of libraries).
ciconia
> the incentives are such that O(Really Big Money) optimization projects get staffed unless there is something making them infeasible.
Switching to io_uring is not just moving from one API to another. It necessitates a serious rethinking of your concurrency model. I guess for big, established codebases this is a very substantial undertaking, security considerations notwithstanding.
weitendorf
On the library/internal workload side the impact would certainly not be something that fully lands overnight, but Google has a very centralized tech stack and special tooling for fleetwide code migrations. I have no insight into the particulars but I would guess there is a Pareto-like distribution of easy upgrades+big wins and a long tail of marginal/thorny upgrades.
Google is big enough and invests enough in infrastructure projects that they staff projects like making their own internal concurrency primitives (side note, factors like this can improve/reduce or simplify/complexify migrations substantially): https://www.phoronix.com/news/Google-Fibers-Toward-Open
junon
Eh let's not be dramatic, if you're already using async runtimes of some sort it's not that much of an upset to switch.
delusional
Disabling it on Android and ChromeOS does not mean they don't use it internally. Android and ChromeOS are end-user devices; optimizing those platforms doesn't earn Google any money.
weitendorf
Can you find anywhere that states that they are using it internally? They have publicly stated at various points that they do not, such as at https://security.googleblog.com/2023/06/learnings-from-kctf-... and I have not seen anything yet stating that they are now using it. Also, you might want to reread my comment because I wasn't talking about Android/ChromeOS, it was exclusively about their "fleet" by which I meant "servers"
By the way, here is a good + recent example of the types of CVEs that io_uring runs into that Google finds and discloses/fixes: https://project-zero.issues.chromium.org/issues/417522668. Here's another: https://project-zero.issues.chromium.org/issues/388499293
Given that io_uring mostly seems to be the project of one guy at Meta, and has a regular stream of new and exciting use after free/out of bounds vulnerabilities, I think it makes sense for security-inclined users to disable it or at least only use it once soaked/stabilized
rahkiin
GP: > > as well as Google servers
dathinab
them disabling it is only about Android/Chrome
not about their servers
I wouldn't be surprised if they do have servers with it enabled when very useful.
and Android Linux kernels lag behind in their version
weitendorf
No, it was about servers, and I worked there on similar stuff/with the same people involved in the serverside ("fleetwide") rollout. Public post describing the decision to disable it internally: https://security.googleblog.com/2023/06/learnings-from-kctf-...
I'd love to see a post explaining a decision to consider it stable or that mentions that they've rolled it out on their fleet
flomo
Without going into the weeds, there has to be some vendor support, and that vendor is obviously not Google. How to convince people: Get it into RHEL.
stefanha
io_uring is available from RHEL 9.3 onward. The catch is that it's disabled by default and needs to be enabled at runtime via the "kernel.io_uring_disabled" sysctl.
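A minimal sketch of checking that sysctl from a script, assuming the value meanings given in the upstream kernel sysctl documentation (0, 1, 2) and the usual `/proc/sys` path. The helper names `describe` and `current_status` are made up for illustration.

```python
from pathlib import Path

# Meanings of kernel.io_uring_disabled, per the upstream kernel sysctl docs.
MEANINGS = {
    0: "enabled for all processes",
    1: "disabled for unprivileged processes outside kernel.io_uring_group",
    2: "disabled for all processes",
}

# Assumed location of the sysctl in procfs.
SYSCTL_PATH = Path("/proc/sys/kernel/io_uring_disabled")


def describe(raw: str) -> str:
    """Translate the raw sysctl text into a human-readable status."""
    value = int(raw.strip())
    return MEANINGS.get(value, f"unknown value {value}")


def current_status(path: Path = SYSCTL_PATH) -> str:
    """Report the io_uring status, or note that the sysctl is missing."""
    try:
        return describe(path.read_text())
    except FileNotFoundError:
        return "sysctl not present on this kernel"


if __name__ == "__main__":
    print(current_status())
```

Changing the value at runtime would be done with `sysctl -w kernel.io_uring_disabled=<value>` as root; the script above only reads it.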
rendaw
If that's the case, it's not indicated by the quote. The quote lays all the blame on io_uring. Is that incorrect?
dathinab
yes but what this isn't telling you is that Android has a long history of running hopelessly outdated kernels, and it is very common that Linux-kernel-related Android CVEs in newish features have already been fixed upstream by generic improvements to that feature's code
yjftsjthsd-h
I like how someone helpfully added
> Although initial async offload design in io_uring could be problematic, later kernels changed the thread model. After such improvements, there were no known inherent problems with it and its development is very careful with new features. Considering that a performant async framework with a user facing API is complex, it was to be expected that issues would be found initially. After initial issues have been addressed, it is not any less secure than anything else in the kernel and io_uring acceptance quickly grew in production. Some of its criticism are also based on wrong or outdated assumptions.[14]
...but the only citation is a link to this GH thread, which doesn't support the claims made.
znpy
Jens Axboe replies on the very first line of the thread:
> As I'm sure you know, this is all mostly centered around a) google using an old kernel on android
fulafel
But also
> My hope is that this reputation will go away eventually, as less issues are found in the code.
This has not yet happened, as this other comment shows: https://news.ycombinator.com/item?id=44632639
serial_dev
> How can I help people out when they tell me that io_uring is insecure?
Maybe those people are right, though? I think the discussion starts from a place that assumes other people are wrong. If you start there, you will fail to convince people of anything, because you automatically dismiss their claim, without thinking about what they might have seen and what they might think.
A better starting point would be wanting to get to the bottom of it, and assess the security of io_uring. If you start from that point and you give it an honest, thorough assessment, and it turns out it "looks secure", you'll have an easier time convincing people.
You might still be wrong (assessing io_uring's security is not trivial), but at least you tried to understand why people think that.
And reminder: it's ok to "agree to disagree".
Arch-TK
People are saying: "Oxygen is blue and that's why the sky is blue." Someone is replying: "The sky isn't blue because of the oxygen." You are then saying: "Well what if the people who are saying that the sky is blue because of oxygen are right."
Although it gets a bit more complicated, the statement `io_uring` is insecure might be true, that's not really in dispute here. The people who are saying it, aren't saying it because they know it to be true, they are saying it because they heard about security issues in the context of `io_uring` and assumed that using `io_uring` would make your code less secure.
This is incorrect: the security issues are in security features in Linux which have not been updated to handle `io_uring`. This means that your application won't be any less/more secure when using `io_uring`. But your system might be less secure if you have support for `io_uring` enabled and applications can make use of it.
Moreover, the "security issues" are only undoing security related hardening you would have put in place over the baseline, they're not putting you below baseline.
That's why a statement such as `io_uring` is insecure isn't very useful.
If these people make the argument that: "I don't want to use `io_uring` because that would mean that security conscious system administrators would not want to run my software as a precaution." then it would make sense and nobody would be disputing it.
VWWHFSfQ
> Maybe those people are right, though? I think the discussion starts from a place that assumes other people are wrong.
I think this is the right approach. We know that io_uring has a somewhat significant history of critical security issues. It's not enough just to point out that "these 3 critical CVEs were fixed in the last 12 months, it's secure now!"
Reputation and trust have to be built over a long period of time.
jay-barronville
> Maybe those people are right, though? I think the discussion starts from a place that assumes other people are wrong. If you start there, you will fail to convince people of anything, because you automatically dismiss their claim, without thinking about what they might have seen and what they might think.
Bingo. This is the correct approach. Very well said!
yc-kraln
I have a somewhat different problem with io_uring in practice: It's extremely hard to use /correctly/. The management of buffers which bounce across a kernel boundary and may-or-may-not end up in the same original thread lends itself to lots of subtle race conditions, resource exhaustions, and ABA issues. It's not that you can't make it work, and work well--it's that it's hard to do correctly, and very easy to make something which works 99.99% correctly, and then fails spectacularly under load or over time.
I can imagine the security implications are the same.
yjftsjthsd-h
> How to handle people dismissing io_uring as insecure?
It is, in the general case, hard to prove something secure (because it's hard to prove a negative). It might help to show CVEs per month/year/whatever related to it vs anything else, preferably with a clear downward trend. For example, you could look at https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=io_uring ... although I struggle to read that as supporting the case you want to make.
> I have had to deal with handful of these people from different sectors as well. Since I am actively working on project based on io_uring, I have had people saying all kinds of hmm... "crap", its so baseless! Can't even talk to them with actual facts.
So what are those facts? Because all this thread has is people handwaving that it used to have a worse design, and everything has bugs and this isn't different, and implying that it's better. If it's better, show that.
yorwba
> you could look at https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=io_uring
CVE count by year:
2019: 1
2020: 1
2021: 10
2022: 15
2023: 19
2024: 21
2025: 10
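A quick sketch of the arithmetic behind reading a trend from these numbers, using only the counts listed above. The 2025 figure presumably covers only part of the year, given the thread's date, so its drop should not be read as a trend on its own.

```python
# CVE counts per year, copied from the list above.
cve_counts = {2019: 1, 2020: 1, 2021: 10, 2022: 15, 2023: 19, 2024: 21, 2025: 10}


def yearly_deltas(counts: dict) -> dict:
    """Return the change in count for each year relative to the previous year."""
    years = sorted(counts)
    return {year: counts[year] - counts[prev] for prev, year in zip(years, years[1:])}


total = sum(cve_counts.values())
deltas = yearly_deltas(cve_counts)
peak_year = max(cve_counts, key=cve_counts.get)

if __name__ == "__main__":
    print(f"total: {total}, peak year: {peak_year}")
    for year, delta in deltas.items():
        print(f"{year}: {delta:+d}")
```

On these numbers the count still rose each full year through 2024, though by smaller increments (+9, +5, +4, +2).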
franciscop
I expect CVEs to be directly proportional to project usage and popularity, and inversely proportional to maturity, which makes things a lot more complicated.
sigmoid10
I'd also expect average CVSS severities to go down over time. While they definitely did get significantly lower in 2024, there's still some high severity stuff in 2025.
delusional
And also directly proportional to the publicity of the CVE system. If you're creative enough in your writing, any bug in any program can be filed as a CVE, and filing CVEs is much more interesting career-wise than filing bug reports.
Any decently sized project has probably seen an increase in reported CVEs over the past 5 years, simply because the number of CVEs total has grown.
jeroenhd
Looking through these CVEs, very few of the recent entries seem to be actual security bugs. Most are run-of-the-mill bugs as far as I can tell.
If a kernel panic is considered a security issue, anyone using Nvidia's drivers should fear for their lives.
dmvdoug
This has to do with their policy on assigning CVE numbers, which is that pretty much any bugfix might be security-related because it’s the kernel, so it doesn’t take much to get a number assigned. See https://docs.kernel.org/process/cve.html.
gkbrk
> If a kernel panic is considered a security issue
It's normal to consider a non-root userspace program causing a kernel panic a security issue
Retr0id
It seems a little dubious to brand something "insecure" based on the number of fixed bugs.
Is io_uring a complex and therefore bug-prone API surface? perhaps.
The `curl` project has a similar number of CVEs listed if you search for it, but we generally don't characterise curl as insecure.
If you're not using io_uring then it could make sense to disable it as a hardening measure, but I don't think the existence of now-fixed CVEs is a reason not to use it.
dmvdoug
CVE statistics are also pretty hard to interpret in light of the kernel team’s willingness to assign CVE numbers for most any bugfixes.
ibotty
One problem is that you can't filter its "syscalls" as you can regular syscalls. This removes a security boundary that e.g. container runtimes regularly use. So you cannot use it in your regular kubernetes cluster without weakening its security for these pods.
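A toy model of the point above, assuming a deny-list filter that blocks `openat` and `connect`. It is only an illustration: real seccomp filters are BPF programs run by the kernel, and every function name here is hypothetical.

```python
# Toy model only: real seccomp filters are BPF programs evaluated in the kernel.
DENIED_SYSCALLS = {"openat", "connect"}


def seccomp_allows(syscall_name: str) -> bool:
    """A syscall filter decides based on the syscall actually being made."""
    return syscall_name not in DENIED_SYSCALLS


def direct_call(syscall_name: str) -> bool:
    """A regular syscall reaches the filter under its own name."""
    return seccomp_allows(syscall_name)


def via_io_uring(operations: list[str]) -> bool:
    """Queued operations are submitted behind one io_uring_enter call.

    The filter is consulted about io_uring_enter only; the queued
    operations are deliberately never shown to it in this model.
    """
    return seccomp_allows("io_uring_enter")


if __name__ == "__main__":
    print("direct openat allowed:", direct_call("openat"))
    print("openat via io_uring allowed:", via_io_uring(["openat", "connect"]))
```

In this model the only way to stop the queued operations is to deny the io_uring syscalls themselves, which is the all-or-nothing trade-off being described.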
holowoodman
This just reinforces the (maybe unfounded) impression that security is a secondary consideration, and performance is primary.
I'd use io_uring in a heartbeat on a dedicated system where the job is only I/O and security isolation isn't a concern. But multiuser/multiapplication/networked? Not a chance.
weitendorf
I think there is a very large amount of overlap between the people who
1. know what io_uring is
2. are interested in performance enough to look at improvements based on new linux kernel system calls and talk about it in public
3. care about security in multitenant environments or the syscalls used by third party libraries
I think io_uring right now probably makes a lot of sense for HPC and highly technical, performance-sensitive financial stuff, but they can be kind of insular. I don't think most Linux hobbyists really need the performance benefits enough to care about it, and most businesses are using a major cloud vendor/don't have the scale or expertise to be thinking about this kind of stuff. Which leaves major cloud providers and really big businesses like Meta with their own internal clouds as the ones that stand to benefit enough to care about performance while really caring about security
Asmod4n
There should be no issue with disabling it altogether by banning its setup and usage syscalls.
accelbred
Yup, but that leads to io_uring devs complaining that people dislike software using io_uring because it doesn't run in containers etc. that block io_uring entirely
holowoodman
Which would be prone to misconfiguration, accidents and exploits. Better to not include it at all.
skissane
Isn't the issue here just that io_uring needs to be enhanced such that, when a seccomp-bpf filter is installed, the filter gets called to approve each SQE, before it gets executed?
Someone
That can be done, but reading https://lwn.net/Articles/902466/, writers of security tools are unhappy that:
- io_uring initially was conceived without considering security or auditing tools
- io_uring later was changed to allow ioctl calls, even though security people do not like ioctl because what its arguments mean depends on the device being called (possibly even on the version of the driver), not on the type of device, and often is poorly documented, making it hard for a security filter to decide what to do with a command.
That also made them fear that similar security-breaking changes might be made in the future.
tsimionescu
I don't think this is an appropriate use of "just". If io_uring doesn't work with seccomp-bpf filters today, there are many situations where you just can't use it, period.
That someone with kernel IO dev experience may be able to relatively easily add such a feature in the future (though I would doubt that, given that it hasn't yet been implemented apparently) doesn't make it a small problem.
coppsilgold
I believe you can deny io_uring altogether with the syscalls io_uring_enter, io_uring_register, io_uring_setup?
This would be useful if you want to boot with io_uring but deny it for some sensitive workloads.
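A minimal sketch of what such a deny rule could look like, written as a Docker-style seccomp profile generated from Python. The default-allow action is an assumption chosen to keep the example short; a real profile would normally start from an allowlist. `build_profile` is a made-up helper name.

```python
import json

# The three io_uring syscalls named above.
IO_URING_SYSCALLS = ["io_uring_setup", "io_uring_enter", "io_uring_register"]


def build_profile(denied=IO_URING_SYSCALLS) -> dict:
    """Build a seccomp profile dict that fails the given syscalls with an errno."""
    return {
        # Assumption for brevity: allow everything not explicitly denied.
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {"names": list(denied), "action": "SCMP_ACT_ERRNO"},
        ],
    }


if __name__ == "__main__":
    # Print the profile as JSON so it can be saved to a file.
    print(json.dumps(build_profile(), indent=2))
```

The JSON could then be saved to a file and applied to a sensitive container with `docker run --security-opt seccomp=<file>`, leaving io_uring available to everything else on the host.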
altairprime
Is that a true limitation that cannot be overcome? Are solutions possible and/or available, but require further work to be shipped?
spwa4
What regular filter for syscalls do you use?
holowoodman
seccomp BPF, eBPF, in a way SELinux/AppArmor/Tomoyo/..., maybe you can even call namespaces some kind of syscall filter. And then there is the auditing framework, where you can at least record which critical syscalls were performed.
Nowadays it's mostly a combination of eBPF, SELinux and auditd plus namespaces in case of containers. Usually in the combination that some distro ships, so nothing really fancy.
lima
seccomp-bpf, for instance.
fabian2k
I admit, I was confused a bit as well about the io_uring security reputation. Though I didn't really follow the topic, so the clarification that this was mostly about an older design on Android is quite helpful.
The potential performance benefits are quite compelling, e.g. in Postgres 18 you reportedly can get a 3x speedup over the old sync behaviour in simple read queries.
txdv
One of the most interesting aspects of software development is that it is still done by humans. Information moves slowly and is very generalised, with little attention to detail.
I have seen this multiple times when developers were still reciting old benchmarks, taken out of context. It often becomes very tribal and centred around technologies.
johnisgood
It is kind of like PHP. People's views of PHP are still stuck at PHP 5. We have PHP 8.
xlii
Maybe it is (mine is for sure) but that's a "bitten by the dog once" case.
You got bitten and everyone around you assures you that you won't get bitten again, but the pain was real and you still have a scar from the event. Why bother risking another bite when there are other places to be that have never bitten you?
Over my career I hated technology 3 times. First was PHP, second was Python during Python 2/3 fiasco and third was CoffeeScript.
Edit: to this day one of my favorite memes is titled "PHP: Training wheels without a bike"
yjftsjthsd-h
A few years ago, I used BTRFS on a laptop. Single disk, no RAID of any kind, OpenSUSE (which favored BTRFS, so I expected it to be as well supported as could be had), nothing fancy. After losing the root filesystem twice, I decided that maybe I shouldn't trust BTRFS. Since then, I've been told that it's totally better now, that all the problems are with bad RAID setups, and it's safe and won't lose my data. Anyways, as I type this from a laptop running on ZFS, I remain somewhat cautious.
jiggawatts
> Since then, I've been told that it's totally better now
I've read through some bug reports, and I assure you that BTRFS remains a horror show.
Saying that comments from the dev team "don't inspire confidence" is putting it mildly.
ZFS is the diametric opposite of this, where blog posts from the team working on it made me realise that they're moving the state of the art into new territory.
itslennysfault
I went through your exact same hate timeline. The CoffeeScript one was so bad that I was REALLY hesitant about TypeScript, but the whole "it's a superset of JS" thing won me over in the end.
I still hate PHP the most, and I very much mean PHP 5 when I say that, and have no idea what happened beyond that, and honestly the scars are so deep I don't care to find out.
em-bee
from my experience the primary problem with PHP was not the language itself (although the design was/is somewhat quirky/inconsistent) but with the large influx of inexperienced programmers using it, creating low quality code and thus affecting the reputation of the language. same with javascript today.
i haven't heard any issues about coffeescript.
with that in mind, i'd love to hear your stories. how did you get bitten?
xlii
It wasn't a horror story but a simple fact that it was a layer of complexity that wasn't really helpful at all.
When code volume was small, nobody noticed, and hey "it looks nice". Some time after though, when volume increased, it started to get really burdensome. I used this as part of a Rails pipeline, so it was like: write some CoffeeScript, compile, run - something failed - usual process.
However the code was already mangled, and often source code mapping didn't want to work. When the source code was found, it wasn't uncommon that the failure was caused by operator precedence or by code not being transpiled the way it was intended, which required debugging the transpilation process.
At some point I suggested migrating away from CoffeeScript toward (almost plain) JavaScript and most developers happily agreed to that. We were able to migrate a big chunk automatically, and the rest took only a few weeks to clean up. Velocity increased and people were happy they didn't have to deal with it anymore.
Ultimately this is true of most transpilers - sooner or later you get into idiosyncrasies that - if the technology is not popular enough - you're left alone to solve.
lexicality
As someone that used Coffeescript a bunch, the problem is it's designed to make it very easy to write code, but has very little thought given to being able to understand that code again in 6 months time.
This means it's very easy to knock out an entire project in record time, but subsequently very difficult to debug/maintain/update the same project when you come back to it. It's essentially a technical debt generator.
WJW
It's with so many things! Some evergreens just from last week here on HN:
- Rails is horribly slow!
- Python is still stuck migrating to Python 3!
- MySQL doesn't scale!
- Haskell tooling sucks!
- io_uring is insecure!
- ... and dozens more just in the programming world. These are just off the top of my head. Probably hundreds more in the wider engineering world.
dijit
MySQL would happily clobber your data silently for decades though.
I think the situation is better now, but that was worthy criticism for a long time; like anything though, those that bought into it wholeheartedly couldn’t take the criticisms.
I wonder if there’s a word for that, seems to be a common issue.
als0
> those that bought into it wholeheartedly couldn’t take the criticisms.
"invested"
hyghjiyhu
From my perspective the Python 3 migration is very much complete. Package managers are the evergreen mess now.
continuational
Programming languages are different. They usually can't be "repaired" once broken, because they need to maintain backwards compatibility.
concerned_user
This is only true if people stick to a version of a language and don't upgrade.
If you upgrade then, for example, you can't run all of your PHP 5 code in PHP 8, most of it you can but you will have to change the parts that are broken, which are the areas that are repaired in PHP 8.
Same goes for other languages like C# or Python
nromiun
They don't "need" to maintain backwards compatibility. Several major languages have broken it by now.
jeroenhd
Programming languages deprecate features in standard libraries all the time. As PHP did, causing many PHP 5.3 applications to fail catastrophically once the warnings added to PHP 5.4, 5.5, and 5.6 were turned into errors. Of course, maintained software rarely ever runs into this issue.
The standard libraries were the lacking part in PHP. The language itself was never a serious problem.
pjmlp
They are software products like anything else.
nicman23
the work that the people in PHP did for 7 is criminally underrated
dolmen
It is kind of like Perl. People's views of Perl are still stuck at the CGIs of the '90s.
I still think that Perl was and is still a better language than PHP and Python. And people never had a serious look at raku because of the Perl heritage.
But that ship has sailed.
em-bee
huh? that would suggest that perl has changed so much since the 90s that those views would no longer be valid.
CGIs in the 90s were written in perl 5, and perl hasn't evolved much since then. but perl was never a bad language. it had/has a quirky syntax. if you were coming from a lot of commandline work using sed, awk, tr, and many other tools to manipulate text in shell scripts, then perl made a lot of sense.
and while raku may have been a great improvement, perl's reputation was tied to its syntax. the readability and writeability of the code. and that's the one thing that raku did not change significantly.
so unlike other examples i don't see how people's views of perl could improve
johnisgood
I love Perl. Is Raku worth trying considering I love Perl for its performance, syntax, brevity, and so forth?
Raku feels like a different language to me, and it seems not-so-serious, maybe because of the name and their mascot (which is unfair to the language).
topspin
> and perl hasn't evolved much since then
Perl (5) has been evolving, quietly. The language has had many nice additions over the years. Some of it is backporting from 6.
Xenoamorphous
It’s true of most languages that have been around for a while. E.g. people who don’t write Javascript think of it as it was pre-ES6, not to mention that many people don’t write JS directly and use Typescript instead.
Same with Java, it’s somehow stuck in time in Java 6 in some people’s minds.
nurettin
Wow. My view is stuck at php 4. Didn't even know 5 existed. I only remember Facebook making a big deal of hhvm and php 7 and laravel being considered great but not as great as rails or django.
johnisgood
It is worth checking out PHP for when you are building backend. I did not use Laravel because I had some issues with it (lack of flexibility in some cases) so I ended up creating my stuff from scratch. I am glad I did. It is well documented, modular, maintainable, not bloated, etc. I can give you the resources I have heavily used.
nurettin
You're very kind. Thanks for the recommendation. If for some reason I have to branch out, I will check php again.
weinzierl
Convince AWS to allow it in managed container and serverless runtimes.
When people see what Amazon does it is far too easy to conclude: "Huh, if Amazon blocks it there must be something to it."
perks_12
Works on the most sophisticated io_uring project, has no idea how to defend it. Nice.
topspin
It's tragic, and it bothers me that Jens Axboe's work is suffering due to this. Obviously, with the clarity of hindsight, this might have been avoided. Now the cost of the damage is high.
My idea: consider formal verification. A rigorous formal proof of behavior is capable of solving actual flaws and probably capable of overcoming the bad PR.
That would require a great and ongoing effort. However, given the nature of io_uring, it's likely rather amenable to formal verification, and ultimately it is probably necessary. Perhaps our new "AI" tools could greatly reduce the effort.
Anyhow, that's my brilliant thought...
null
I think it's the Wikipedia article.
https://en.wikipedia.org/wiki/Io_uring
Very easy to just quote that without any io_uring experience.
> In June 2023, Google's security team reported that 60% of the exploits submitted to their bug bounty program in 2022 were exploits of the Linux kernel's io_uring vulnerabilities. As a result, io_uring was disabled for apps in Android, and disabled entirely in ChromeOS as well as Google servers. Docker also consequently disabled io_uring from their default seccomp profile.