
New speculative attacks on Apple CPUs

355 comments · January 28, 2025

omcnoe

Their SLAP demo provides a great example of how defence-in-depth can make/break the viability of an exploit. That terrifying Safari demo is possible because Safari fails to isolate new windows in individual processes when calling `window.open` in js.

All the other side channel magic presented here doesn't matter if the data you want to read is in a separate process with sufficient separation from the "hostile" process in the address space.

lxgr

That's not a failure of Safari; it's required by window.open API semantics, in particular by the default Cross-Origin-Opener-Policy of "unsafe-none" [1].

By setting a different policy, sites can protect themselves against this.

I guess technically browsers could open new windows in a new browsing context group regardless of this setting and relay the allowed types of messages via IPC (if any), but that would be a major performance hit, and I don't think any other browsers do it differently.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cr...

lxgr

Can't edit my original post anymore: Firefox and Chrome do seem to isolate even same-browsing-context-group sites and bridge the required APIs via IPC, so hopefully Safari will catch up at some point.

Basically, there are three scenarios:

- Completely unrelated tabs (e.g. those you open manually, those opened via command-click, tabs opened via '<a target="_blank" ...>' or 'rel="noopener"' links, etc.) – these are relatively easily isolated if the browser supports it at all. All major (desktop!) browsers now largely do this, including Safari.

- "Same browsing context group" (but different origin) sites. These can communicate via various APIs, and historically that was achieved by just letting them run in the same rendering process. But in the face of attacks such as this one, this can be insecure. Firefox and Chrome provide sandboxing via separate processes; Safari does not.

- Same origin sites (without any stricter policy). These can fully access each other's DOM (if they have an opener/opened relationship), so there's not really any point in having them live in different renderers except possibly for fault isolation (e.g. one of them crashing not taking the other down). As far as I know, all browsers render these in the same process.

Sites can opt out of the second and third category into the first via various HTTP headers and HTML link attributes. If we were to design the web from scratch, arguably the default for window.open should be the first behavior, with an opt in to the second, but that's backwards compatibility for you.

MikeHolman

I worked on a browser team when Spectre/Meltdown came out, and I can tell you that a big reason why Firefox and Chrome do such severe process isolation is exactly because these speculative attacks are almost impossible to entirely prevent. There were a number of other mitigations including hardening code emitted from C++ compilers and JS JITs, as well as attempts to limit high precision timers, but the browser vendors largely agreed that the only strong defense was complete process isolation.

I'm not surprised to see this come back to bite them when, after like 7 years, Apple still hasn't adopted the only strong defense.

omcnoe

Cross-Origin-Opener-Policy seems like a case of bad defaults where a less secure option has been selected so that we don't break some poorly maintained websites. Better to get the actual users of `window.open` to fix their code than to make every website insecure out of the box.

I can't imagine there are many sites passing significant amounts of data through this; the small number of sites where IPC poses too high a penalty can opt into "same process" behavior if really needed.

lxgr

Forcing every website to adapt to a browser update is completely infeasible.

> I can't imagine there are many sites passing significant amounts of data through this

This is actually a quite common mechanism for popup-based authentication (which is much more secure than iframe-based one, as users can verify where they're potentially entering their credentials).

eptcyka

Other browsers do site isolation, why can’t Safari? (:

lxgr

Safari definitely does use site isolation (if you check "Activity Monitor", you'll find Safari processes named after the sites they're displaying) in almost all cases.

window.open, in some configurations, is an exception, because the opening and opened sites have an explicit communication facility available to them, unless at least one of the two explicitly opts out of it. As far as I'm aware, Safari also correctly implements that opt-out.

The only thing that Chrome and Firefox seem to be doing on top of that, as far as I understand, is to actually enforce process-per-site even for sites from the same "browsing context group" (i.e. all that can hold programmatic references to each other, e.g. via window.opener), which requires using IPC.

zamalek

Right, in what sane world would the website determine operating system process semantics? What next, syscalls?

astrange

Have you noticed how often people complain Chrome uses too much memory?

dadrian

It is not required by window.open semantics; you can absolutely implement site isolation even in the presence of COOP unsafe-none.

lxgr

Sorry, I was imprecise in my original post: It's definitely possible to isolate even sites in the same browsing context group, but it requires more work that Safari apparently just hasn't got around to yet.

davrosthedalek

Would that performance hit be really that significant? I can't imagine there are more than a couple of calls total, and that's all dwarfed by any web access. Or do I misunderstand what's required?

wging

So should Protonmail (and any other site with similarly sensitive data) be setting that header, then? It’s probably hard to change the default. I bet some use cases (SSO popups?) depend on it.

connicpu

It's not unreasonable to set a different header value for the login page only, where it should be safe because no external user data is being rendered.

lxgr

Sites can also opt into the same behavior by setting the rel="noopener" or alternatively target="_blank" attributes on outgoing links (i.e. <a> tags).

And yes, something like a webmail site should definitely be setting the header, or at least these attributes for outbound content links.

fweimer

Would it help to use separate processes and share the data on demand only, via IPC with a robust speculation barrier?

lxgr

Yes, and that’s what Firefox and Chrome are already doing.

btown

This, 100%. From the SLAP paper linked in the OP https://predictors.fail/files/SLAP.pdf :

> Considerations for Safari. We emphasize the importance of site isolation [55], a mechanism preventing webpages of different domains from sharing rendering processes. Site isolation is already present in Chrome and Firefox [42, 55], preventing sensitive information from other webpages from being allocated in the attacker’s address space. While its implementation is an ongoing effort by Apple [3, 4], site isolation is not currently on production releases of Safari. On the contrary, we also reflect on [WebKit's memory allocator] libpas’s heap layout from Section 6.3, allowing sites to not only share processes, but also heaps. Partitioning JavaScript heaps by at least several memory pages per-webpage would prevent JavaScript strings from the target webpage from being allocated within the 255-byte reach of the LAP.

H8crilA

No; only some of the side channel magic doesn't matter if you live in a different virtual memory space. Other past attacks didn't use virtual memory pointers: they used physical memory pointers, or no pointers at all. One could read data from another process, the kernel, another VM, SGX enclaves, or even proprietary CPU-manufacturer code that runs on the CPU, like the CPU signing keys used for remote attestation.

kllrnohj

The writing was on the wall for in-process sandboxing with Spectre, but that seems to have faded a bit. This just reinforces it. Things like "safe in-process sandboxing with WASM" are just a fantasy; it can't be implemented.

lxgr

Safe in-process sandboxing is obviously possible and even trivial. It does get harder if you care about performance, though.

saagarjha

If the costs are high enough you’re basically reimplementing multi-process isolation from first principles.

kllrnohj

"Trivial" how do you figure? Remember, these exploits bypass your own code's conditionals over a shockingly long window. Unless you just mean for incredibly restrictive usages such as eBPF?

Possible absent any performance concerns at all? Yeah, sure.

willtemperley

What I know as a developer is that web security is really hard. Last week there was a DOM clobbering gadget deep in my TypeScript world, and I really didn't have the energy to understand who wants to clobber my DOM and why they need a gadget. I want to build stuff, and what worries me is that this stuff is simply not foreseeable.

r00fus

Do other browsers have process isolation for new tabs?

tptacek

Not necessarily for tabs on the same web site, but for different sites, yes. Hence "site isolation".

lxgr

To be fair, this is (relatively, compared to the age of the web) new behavior though.

Even Chrome, which pioneered the "tab per process" model, didn't isolate same-browsing-context-group sites for a long time, and still shares processes on Android as a resource optimization: https://chromium.googlesource.com/chromium/src/+/main/docs/p...

Firefox only released their "Project Fission" in 2021, i.e. three years after Spectre.

cluckindan

In Safari settings under Advanced, it's possible to enable "verify window.open user gesture". Does that help at all?

ddtaylor

AFAIK this only means the attacker has to dupe you into doing a UI event like scrolling, clicking, or touching something on the page. Very easy to do.

yard2010

Safari is the Internet Explorer of the '20s.

jabwd

This is a flawed comparison in many ways. As you might not understand, IE was problematic because of its massive install base and everyone writing their websites for Chrome, and only Chrome. Oh wait, typo'd there, meant IE.

tptacek

Cool detail, in the section where they reverse-engineer the presence of an LVP on the M3:

    Remarkably, while all other load widths activate the LVP
    on any constant value fitting that width, we observe that
    activation on 8-byte wide loads occurs only when the load
    value is zero. We conjecture that this may be a countermeasure
    for memory safety such that the LVP will not learn values of
    pointers. That is, with the M3 being a 64-bit CPU, pointers
    are 8 bytes wide. Furthermore, on 64-bit macOS executables,
    any virtual address below 0x100,000,000 is invalid.

saagarjha

Grouping a hex address by threes is crazy

hmry

3 hex digits = 12 bits = 4096 entries, the size of each address translation table on ARM. So it does make some (twisted) sort of sense, assuming you're using a 4K page size.

anyfoo

Apple devices use 16kB pages.

saagarjha

macOS/iOS don’t

ijustlovemath

Wait, so does this mean that if an exploit tries to use a 32-bit address, it's immediately shut down?

anyfoo

There are usually no valid 32-bit addresses, i.e. the first 4 GB are not mapped.

avianlyric

That might be their point. As the OP quoted:

> any virtual address below 0x100,000,000 is invalid.

That kinda suggests that all 32-bit addresses are inherently invalid on 64-bit macOS.

bawolff

Hmm, one part i found interesting

> In order to make cache hits distinguishable from misses in Safari, we reference the NOT gate-based cache amplification primitive from [29, Section 5], adjusting the speculation parameters for the M2 CPU. We run the amplifier 500 times when the target address is cached and 500 more times when it is evicted, in native and WebAssembly implementations. Table 3 summarizes the timing distributions, with units in ms. We observe that they are clearly separable even in a web environment, allowing us to distinguish cache hits from misses with WebKit’s 1 ms timer.

So I guess all the hubbub around disabling fine-resolution timers and SharedArrayBuffer was for naught.
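
For intuition, the general shape of such a distinguisher looks like the sketch below. This is only the repetition idea, not the paper's NOT gate-based primitive; a real amplifier also has to re-evict or re-speculate between iterations rather than just re-read a now-cached line:

    #include <stdint.h>
    #include <time.h>

    /* Sketch: repeat a cache-dependent access so the cumulative
     * hit/miss timing difference grows past a coarse (~1 ms) timer's
     * resolution. The caller is assumed to reset the cache state
     * (cached vs. evicted) around each call, as a real amplifier
     * would per iteration. */
    static double timed_ms(volatile const uint8_t *p, int reps) {
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (int i = 0; i < reps; i++)
            (void)*p;  /* the access whose cache state we classify */
        clock_gettime(CLOCK_MONOTONIC, &b);
        return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
    }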

kevingadd

It delayed viable attacks by a few years, maybe?

It doesn't hurt that setbacks for web app development coincidentally send developers into the open arms of Google and Apple's stores that collect a 30% cut of all revenue, so there was a good incentive to do it even if it didn't protect anyone.

bawolff

> It doesn't hurt that setbacks for web app development coincidentally send developers into the open arms of Google and Apple's stores that collect a 30% cut of all revenue, so there was a good incentive to do it even if it didn't protect anyone.

That seems like a bit of a reach. It's an obscure feature that is rarely useful, and when it is, all you have to do is send the right HTTP header (if using Chrome) and you get it back.

kevingadd

Multithreading may be an obscure feature to you, but runtime developers get requests for it all the time. SAB becoming widely available was definitely delayed.

saagarjha

> On the other hand, although Chrome is equipped with Site Isolation, we demonstrate that it is not a perfect mitigation. We show the real-world existence of corner cases, where two subdomains of the same site can be merged into one process, again leading to LAP- and LVP-based attacks.

Did anyone spot where this is mentioned?

Edit: it doesn’t seem like they have a general attack. Rather, it’s that some sites are not in the public suffix list.

Edit 2: It’s also interesting that they found that iPhone 13 and iPhone 13 mini (which have the same processor and came out at the same time) differ in LAP in that they observed only the latter as having it. Very curious…

hashstring

Right, and “where two subdomains of the same site can be merged into one process” is normal, right, given that Site Isolation ≠ Origin Isolation.

A PSL flaw is important, but also a low-cost fix.

Thanks for pointing this out.

yoshicoder

Funny that I'm seeing this now, because last fall I had Daniel Genkin as my Intro to Cyber Security professor (a co-author of this result). Interesting class, but I remember him mentioning that they were working on a speculative attack for Apple CPUs after seeing the results of Spectre and Meltdown on Intel CPUs. I remember how he seemed almost paranoid about security, and I suppose I see why now (security is almost never guaranteed).

Especially now that I have just bought an M4 Mac.

kd913

Am curious if the problem impacts the M4, given it came out after this was reported and disclosed.

That, and it moved to Arm's v9.2 instruction set.

jabwd

Keep in mind that it takes at least 3 months to produce an M4, and the design was finalized long before that. So most likely yes.

daneel_w

Apple released minor-version updates to both macOS and iOS the past few days, both containing several security fixes. Has anyone been able to confirm if they address these exploits?

layer8

They haven’t yet. From https://www.bleepingcomputer.com/news/security/new-apple-cpu...:

Apple acknowledged the shared proof-of-concept and stated it plans to address the issues. However, at the time of writing, the flaws remain unmitigated.

"We want to thank the researchers for their collaboration as this proof of concept advances our understanding of these types of threats," Apple told BleepingComputer.

"Based on our analysis, we do not believe this issue poses an immediate risk to our users."

trompetenaccoun

It's crazy that they were informed about this months ago and still haven't fixed it. They're going to have to now that it's public, but why should that pressure even be needed? I naively assumed that if Apple still gets one thing right, it's security updates. This is disappointing and concerning.

remram

Have you considered that it might be difficult?

tptacek

Aside: I feel like RUB has become kind of a global center for this kind of high-end offensive security work. Was I just not paying enough attention 10 years ago or is this a new-ish thing?

bobnamob

Idk if it's RUB or Yuval; he was credited on Spectre and Meltdown as well (if I recall correctly), but he was at Data61 or the University of Adelaide at the time.

pbsd

It goes way back; check the work of the likes of Thorsten Holz or Christof Paar. TU Graz is another one.

jf

What is RUB?

dooglius

Ruhr University Bochum, the third author's university.

pulvinar

Ruhr-University Bochum, in Germany

r9295

They've also consistently put out some of the best fuzzing research

tptacek

Their offensive crypto work is also on point.

BobbyTables2

There are two times in a man's life when he should not speculate: when he can't afford it, and when he can.

Mark Twain

gpderetta

CPUs on the other hand do nothing but speculate.

anotherhue

For any young'uns seeing this for the first time, the Spectre and Meltdown attacks (and accompanying papers) are worth reading.

https://spectreattack.com/

umvi

Is it bad that I disable Spectre mitigations on all my PCs to get a free double-digit-% performance boost?

thijsr

It depends on your threat model. If you don't run any untrusted code on your hardware (including Javascript), you can safely disable the mitigations. If you do run untrusted code, keep them enabled.

coldpie

In the context of a regular end-user desktop machine, this seems overly paranoid to me. The odds of encountering a real, JS-based spectre attack in the wild are basically zero (has anyone ever seen a browser-based Spectre attack outside of a research context? even once?), and the odds of it then being able to retrieve actual sensitive data are also basically zero. That's two astonishingly tiny numbers multiplied together. The threat just isn't there.

For regular end-user desktop machines, the mitigations only decrease performance for no real benefit. Spectre is a highly targeted attack, it's not something you can just point at any random machine to retrieve all their bank passwords or whatever.

positr0n

What is the threat model if I run lots of untrusted JavaScript, but I only have a small amount of memory in other processes worth reading and I would notice sustained high CPU usage?

Is there an example in the wild of a spectre exploit stealing my gmail cookie and doing something with it? (Would be difficult since it's tied to other fingerprints like my IP)

Or stealing credit card numbers when they're in memory after I place an online order?

daneel_w

You're not getting a boost, you're avoiding a penalty. In some (but not all) cases you can avoid the penalty and the exploits by disabling SMT. Remember, SMT isn't twice as many cores, just twice as many half-cores. You'll be fine.

thijsr

Disabling SMT alone isn’t enough to mitigate CPU vulnerabilities. For full protection against issues like L1TF or MDS, you must both enable the relevant mitigations and disable SMT. Mitigations defend against attacks where an attacker executes on the same core after the victim, while disabling SMT protects against scenarios where the attacker runs concurrently with the victim.

umanwizard

In my experience SMT is still faster for most workloads even with the mitigations.

viraptor

In practice probably not, as long as the general population keeps it enabled. I mean, looking at the effort required, it's not worth spending time exploiting Spectre these days, because virtually everyone is protected. If you're not likely to be directly targeted, "herd immunity" will work.

iforgot22

If just visiting a webpage with some JS will let them do ACE on even 0.1% of visitors, hackers are probably still motivated enough to try it. But I vaguely remember these kinds of vulns can be patched in-browser for a perf hit instead of taking the hit system-wide, which sounds like an ok compromise.

Edit: Arbitrary memory access, not ACE

jorvi

You shouldn't disable Spectre mitigations, but Retbleed and Downfall (Intel) are much more of a "lab" exploit, and the fall-out for Retbleed is much more severe on cloud boxes than your personal PC. Easy 20-40% performance uplift on AMD Zen1-Zen2 and Intel 6th-11th gen.

baq

Only if you don't care if baddies see you go fast

basementcat

If your machine is air gapped and/or not running random downloaded code, I think it is a possible reasonable option.

autoexec

It's a tragedy that so many websites insist on having the ability to run random downloaded code on our systems to do basic things like displaying simple text and images. Things browsers are capable of with nothing but HTML. Google refuses to even show search results, a bunch of literal hyperlinks, without javascript being enabled.

kllrnohj

What are you doing where you see anything remotely close to double-digit-% gains from disabling spectre mitigations?

AHTERIX5000

If mitigations include disabling SMT and the workload is compiling code, then the difference is easily in double digits.

umvi

my specific use case where I see significant performance improvement is image segmentation pipelines (which involve opencv-style image processing and AI inference). YMMV depending on your CPU I suppose.

iforgot22

Video editing maybe? Which is not going to involve running untrusted code.

LPisGood

I’d highly recommend reading Flush+Reload first, since the cache side channel is key to any of these microarchitectural attacks.

mettamage

As someone who took a course on all of this, this is indeed how we started out.

1. Read Flush + Reload

2. Then reproduce it in C

3. Then read Meltdown

4. Reproduce it in C

5. Read Spectre

6. Reproduce it in C

After that we had to implement a VUSEC paper. I chose GLitch [1].

[1] https://www.vusec.net/projects/glitch/
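
For anyone curious, the core of step 2 (the Flush+Reload probe itself) is only a few lines on x86. A minimal sketch; the hit/miss threshold is machine-specific, typically on the order of 100-200 cycles:

    #include <stdint.h>
    #include <x86intrin.h>

    /* Flush+Reload probe (x86): evict a line shared with the victim,
     * let the victim run, then time a single reload. A fast reload
     * means the victim touched that line in the meantime. */
    static void flush_line(void const *addr) {
        _mm_clflush(addr);  /* evict the line from the cache hierarchy */
        _mm_mfence();       /* ensure the flush completes first */
    }

    static uint64_t reload_cycles(volatile const uint8_t *addr) {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*addr;        /* the timed reload */
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;     /* below threshold => hit => victim accessed it */
    }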

sabas123

Keep in mind that getting Meltdown to work might be very difficult depending on your setup. I wouldn't have been able to, at least when starting out; my teacher didn't provide us with targetable hardware.

Spectre attacks (particularly RSB-based ones) are nice to start out with, IMO.

ajross

Hm... as I read it this is much worse. Spectre/Meltdown were data isolation vulnerabilities. You could exploit side channel (mostly timing) information to intuit state about memory across a protection boundary. Basically you can prime the CPU state to allow you to tell which of multiple code paths the kernel/hypervisor/whatever took, and then go from there to reading arbitrary data. Which is bad, obviously.

Here, they claim to have a remote exploit vulnerability. It's not that Apple is leaking data, it's that the CPUs have a bug where they appear to be actually executing code based on incorrectly-loaded ("predicted") data.

Though the details are, as is usually the case, thin. I await further analysis.

midtake

A browser-based attack, in theory, could have happened with Spectre/Meltdown as well. I seem to recall a PoC for Spectre in the browser, actually. I believe it's also a reason that microsecond timer precision in the browser was made coarser after that era.

mettamage

GLitch was a Rowhammer browser based attack [1]. It's not Spectre/Meltdown but still, for a while people thought it couldn't be done.

[1] https://www.vusec.net/projects/glitch/

saagarjha

They’re speculatively executing code. It’s not traditional code execution. (You can, of course, read the papers for full details.)

omcnoe

It's not remote code execution, it's the same flavor of "out of bounds read through speculation" as previous vulnerabilities. It's terrifying because they have a working proof of concept from untrusted JS in Safari, but there have been speculative execution in browser JS engines before now also.

ajross

The language seems to argue otherwise: SLAP "allows the adversary to jump the LAP to the target webpage's string and trick the CPU into operating on it" and FLOP "allows us to run a function with the wrong arguments". That's absolutely not mere data exfiltration.

Now, maybe this is because of a trampoline based on pre-existing Safari bugs and not the CPU misfeature itself. Again, the details are slim.

But "the same flavor of vulnerability" seems to be a mischaracterization.

twoodfin

This introduced me to the idea of load value predictors. Is Apple the only chip designer using these in a commercially released microarchitecture?

adrian_b

In many CPU ISAs, load value predictors are unlikely to be useful, because they cannot guess the value that will be loaded with an acceptable probability.

The ARM ISA and also other ISAs with fixed-length instruction encoding are an exception. Because they have a fixed instruction length, typically of 32 bits, most constants cannot be embedded in the instruction encoding.

As a workaround, when programming for such ISAs, the constants are stored in constant pools that are close to the code for the function that will use them, and the load instructions load the constants using program-counter-relative addressing.

Frequently such constants must be reloaded from the constant pool, which allows the load value predictor to predict the value based on previous loads from the same relative address.

In contrast with the Apple ARM CPUs, for x86-64 CPUs it is very unlikely that a load value predictor can be worthwhile, because the constants are immediate values that are loaded directly into registers or are directly used as operands. There is no need for constants stored outside the function code, which may be reloaded multiple times, enabling prediction.
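
To make the contrast concrete, a sketch (exact instruction sequences depend on the compiler and on the constant's bit pattern):

    #include <stdint.h>

    uint64_t get_key(void) {
        /* This constant cannot fit in one fixed-width 32-bit ARM
         * instruction. Typical codegen:
         *   x86-64:  movabs rax, 0x0123456789abcdef  ; immediate, no load
         *   AArch64: a movz/movk sequence building the value 16 bits at
         *     a time, or a PC-relative load from a nearby literal pool
         *     (ldr x0, .Lpool); that repeated, always-same-value load is
         *     what a load value predictor can learn. */
        return 0x0123456789abcdefULL;
    }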

All fast CPUs can forward the stored data from the store buffer to subsequent loads from the same address, instead of waiting for the store to be completed in the external memory. This is not load value prediction.

eigenform

> for x86-64 CPUs it is very unlikely that a load value predictor can be worthwhile

I think you're making a good point about immediate encodings probably making ARM code more amenable to LVP, but I'm not sure I totally buy this statement.

If you take some random x86 program, chances are there are still many loads that are very very predictable. There's a very recent ISCA'24 paper[^1] about this (which also happens to be half-attributed to authors from Intel PARL!):

> [...] we first study the static load instructions that repeatedly fetch the same value from the same load address across the entire workload trace. We call such a load global-stable.

> [..] We make two key observations. First, 34.2% of all dynamic loads are global-stable. Second, the fraction of global-stable loads are much higher in Client, Enterprise, and Server workloads as compared to SPEC CPU 2017 workloads.

[^1]: https://arxiv.org/pdf/2406.18786

adrian_b

Unfortunately what you say is true for many legacy programs, but it is a consequence of the programs not being well structured by the programmer, or not being well optimized by the compiler, or due to a defect of the ISA, other than the lack of big immediate constants.

Some of the global-stable values are reloaded because the ISA does not provide enough explicitly-addressable registers, despite the fact that a modern CPU core may have 10 times to 20 times more available registers, which could be used to store the global-stable values.

This is one of the reasons why Intel wants to double the number of general-purpose directly addressable registers from 16 to 32 in the future Diamond Rapids CPU (the APX ISA extension).

In other cases the code is not well structured and it tests repeatedly some configuration options, which could be avoided by a proper partitioning of the code paths, where slow tests would be avoided and the execution time would be reduced, even at the price of a slight code size expansion (similarly to the effect of function inlining or loop unrolling).

Sometimes the use of such global-stable values could have been avoided even by moving at compile time the evaluation of some expressions, possibly combined with dynamic loading of some executable objects that had been compiled for different configurations.

So I have seen many cases of such global-stable values being used, even for CPU ISAs that do not force their use, but almost none of them were justified. Improving such programs at programming time or at compile time would have resulted in greater performance improvements, which would have been obtained with less energy consumption, than implementing a load-value predictor in the CPU.

twoodfin

Thank you, fantastic answer.

I do wonder if there are other common code patterns that a practical LVP could exploit. One that comes to mind immediately is constants at one remove: think processing a large array of structs with long runs of identical values for some little-used parameter field, or large bitmasks that are nearly all 0xFF or 0x00.
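
A sketch of that first pattern (hypothetical struct, purely for illustration): a little-used field that holds the same value for long runs produces exactly the kind of load stream a value predictor can learn.

    #include <stddef.h>
    #include <stdint.h>

    struct item {
        uint64_t value;
        uint32_t flags;  /* little-used: almost always 0 in practice */
    };

    uint64_t sum_enabled(const struct item *a, size_t n) {
        uint64_t s = 0;
        for (size_t i = 0; i < n; i++)
            /* This load yields 0 nearly every time: a "global-stable"
             * load an LVP could predict, paying a misprediction only
             * on the rare nonzero element. */
            if (a[i].flags == 0)
                s += a[i].value;
        return s;
    }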

eigenform

Probably not, but I don't think anyone has talked about it explicitly.

Otherwise, there are known examples of related-but-less-aggressive optimizations for resolving loads early. I'm pretty sure both AMD[^1] and Intel[^2] have had predictive store-to-load forwarding.

edit: Just noticed the FLOP paper also has a nice footnote about distinguishing LVP from forwarding during testing (i.e. you want to drain your store queue)!

[^1]: https://www.amd.com/content/dam/amd/en/documents/processor-t...

[^2]: https://www.intel.com/content/www/us/en/developer/articles/t...

bjackman

> I'm pretty sure both AMD[^1] and Intel[^2] have had predictive store-to-load forwarding.

IIRC this was how Spectre Variant 4 worked.

adgjlsfhk1

From doing some work on GC a couple of years ago: at that time, Apple was the only one with it. The performance is awesome; it makes graph traversal ~2x faster.

phendrenad2

Seems like speculative execution is just fundamentally insecure. With Spectre/Meltdown mitigations, doesn't CPU performance drop below that of the same CPU with no branch prediction at all? Should we move back to CISC? Or maybe VLIW?

andrewia

I don't think so; speculative execution is the cornerstone of modern CPU performance. Even 15-year-old 32-bit ARM CPUs do it. The only phone/PC-grade processors without it are the first generation of Intel Atom, and I recall that early Atom processors sacrificed a ton of performance to keep power consumption low. I doubt this will change since mitigations are "good enough" to patch over major issues.

dailykoder

Maybe the boomers were right and we made computers way too complex? This might be a bit of hyperbole, but it seems like there will always be a security hole (even if mostly hard to exploit). But I also guess we can't get much faster without it either. So maybe we should reduce complexity, at least for safety-critical systems.

MagicMoonlight

Now wait until the zoomers come along and take the lead on these products. They grew up with iPads and no file system. It’s going to be chaos!

happosai

There is the extremely popular Cortex-A53, which is an in-order core.

MindSpunk

Yes and it's very slow as a result. In-order cores without speculative execution can't be fast. Not unless you have no memory and only operate out of something equivalent to L1 cache.

Memory is slow. Insanely slow (compared to the CPU). You can process stupid fast if your entire working set can fit in a 2KB L1 cache, but the second you touch memory you're hosed. You can't hide memory latency without out-of-order execution and/or SMT. You fundamentally need to be parallel to hide latency. CPUs do it with out-of-order and speculative execution. GPUs do it by being stupidly parallel and running something like 32-64 way SMT (huge simplification). Many high-performance CPUs do all of these things.

Instruction level parallelism is simply not optional with the DRAM latency we have.
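
A classic illustration of that latency wall (a minimal sketch): a pointer chase serializes on memory, while independent accesses can be overlapped by an out-of-order core.

    #include <stddef.h>

    struct node { struct node *next; };

    /* Each load depends on the previous one, so the core eats a full
     * DRAM round-trip (hundreds of cycles) per node. Independent loads
     * (e.g. summing an array) can be issued in parallel, and their
     * latency hidden by out-of-order execution. */
    size_t chase(const struct node *n) {
        size_t steps = 0;
        while (n) {
            n = n->next;  /* serial dependency chain */
            steps++;
        }
        return steps;
    }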

umanwizard

> With Spectre/Meltdown mitigations, doesn't CPU performance drop below that of the same CPU with no branch prediction at all?

No. Processors with no branch prediction would be many times slower (up to dozens of times slower on workloads that don’t fit in cache)

rasz

>speculative execution is just fundamentally insecure

I don't think it's inevitable; it might be caused by greed. You could have small separate sections of cache (or an additional tag) dedicated to per-thread speculation, but no designer is willing to sacrifice real estate for something that will be thrown away instantly.

______

> This research was supported by the Air Force Office of Scientific Research (AFOSR)

I wonder if this is the kind of grant that is no longer being funded (or at least "paused")