Intel AVX10 Drops Optional 512-Bit: No AVX10 256-Bit Only E-Cores in the Future
22 comments · March 19, 2025 · pornel
adrian_b
The CUDA approach is just a software abstraction layer. The hardware of NVIDIA GPUs is no more similar to the CUDA model than AVX-512 is (NVIDIA GPUs use 1024-bit vectors instead of 512-bit vectors).
A compiler that implements the CUDA approach also exists for Intel AVX/AVX-512: the Intel Implicit SPMD Program Compiler (ISPC). Such compilers could be written to translate any programming language into AVX-512 while using the same concurrency model as CUDA.
Moreover, as a software model the "CUDA approach" is essentially the same as the OpenMP approach, except that the NVIDIA CUDA compilers know the structure of the NVIDIA GPUs, so they can automatically map the concurrent threads specified by the programmer onto GPU hardware cores, threads, and SIMD lanes.
janwas
In addition to ISPC, it is possible to do this kind of vector-length abstraction at the library level, e.g. in our Highway library.
We routinely write code that works on 128- to 512-bit vectors. Some use cases are harder than others, e.g. transposing.
Wumpnot
Smart decision by Intel for once; they aren't the market leader anymore and need to follow AMD on this. If they had gone down the 256-bit-max route, I suspect it would have gotten very little traction.
adrian_b
I assume that a decision as important as this must have been greenlighted by the new Intel CEO.
If that were true, it would be a good sign for his competence.
Remnant44
Well, that's a pleasant surprise: Intel making the right move here! There have been so many unforced errors around their AVX-512 strategy.
According to the Steam hardware survey for February 2025, AVX-512 support is already up to around 16%. That may not sound like a lot, but consider that it's basically all coming from Zen 4- and Zen 5-based CPUs.
Once Intel's next design also supports it, it's probably worth targeting for consumers in relevant markets.
karmakaze
That headline was hard to parse correctly.
> Intel has dropped the 256-bit-only approach and is going for 512-bit everywhere. Thus it would seem to indicate that Intel E-cores of the future will properly support AVX 512-bit operation!
> It looks like AMD's widespread support for AVX-512 since Zen 4 and the rather confusing AVX10 implementations previously pursued by Intel are now over. With updated GCC compiler patches posted today, that 256-bit mess proposed for future AVX10 versions is being removed.
colejohnson66
Instead of saying it drops the optional 512-bit portion, it should say that it's now mandatory. Remove the double negative. A better title would then be: "Intel AVX10.2 makes 512-bit mandatory: No 256-bit-only E-cores".
throw0101d
Will we see an x86-64-v5 at some point?
* https://en.wikipedia.org/wiki/X86-64#Microarchitecture_level...
fweimer
This change puts x86-64-v4 on the roadmap again. It's unclear what happens after that. And before distributions can adopt x86-64-v4, AVX-512 support must be well-established in client CPUs, too.
Ellipsis753
Is this saying that Intel will support _only_ 512-bit instructions (and not 256-bit)?
Or that it'll support _both_ 256-bit and 512-bit instructions going forward (and stop doing the nonsense where some cores support 512-bit and others don't)?
adrian_b
AVX10 will continue to support 512-bit instructions, 256-bit instructions, 128-bit instructions and scalar instructions (FP32 & FP64), exactly like the current AMD and Intel CPUs with AVX-512 support.
So none of the current instructions will be removed.
Intel's previous plan was that consumer CPUs would drop the 512-bit instructions, keeping only 256-bit instructions, 128-bit instructions and scalar instructions (FP32 & FP64).
Nevertheless, the earliest versions of AVX-512 had only 512-bit instructions and scalar instructions.
The 256-bit and 128-bit instructions were added in Skylake Server as a workaround for Intel's poor power management at the time, which forced large clock-frequency drops for long periods whenever wide instructions were used.
On modern CPUs there is no need to use 256-bit or 128-bit instructions. You gain nothing with them. AVX10 instructions have masks, so you can process any arbitrary length with a 512-bit instruction, in the case of loop prologues or epilogues.
The use of 512-bit instructions simplifies many optimized programs, because one instruction processes one cache line.
snvzz
Zen4 made it very difficult for Intel to make 512 a premium feature.
phkahler
Zen 5 beats Arrow Lake even without AVX-512. For workloads that use it, Zen 5 then gets another huge performance jump, while Arrow Lake doesn't even have the instructions.
fweimer
One clear reason to use 128-bit instructions: naturally aligned 128-bit loads and stores are only atomic if encoded as EVEX.128 (or VEX.128 etc.).
The default auto-vectorization tuning for current Intel server CPUs uses 256-bit registers, which is perhaps another counterexample.
janwas
The auto-vectorization default setting (auto-vectorization being something I would not rely on anyway) also sounds like a workaround for the SKX issue.
As for atomics, I'm curious how you make use of that?
chad1n
It will support both, but considering the previous experiences with AVX-512 on Intel, I wouldn't be that excited.
01HNNWZ0MV43FF
I wonder how E-cores work. How does the OS decide what to schedule onto these cores? I've seen my new work laptop has some, but I don't know how they work.
ksec
Will Intel now work with AMD on a coherent and simplified baseline x86 instruction set?
And then open it up?
snvzz
What a shitshow.
x86-64-v4 was defined around 2020 and requires AVX-512. Few Intel CPUs comply with it even today.
That's great. Having a usable baseline is important to ship it in more than a handful of handpicked functions.
But the whole approach with fixed-length instructions seems terrible to me. It takes Intel a decade to add another batch of instructions for another width, and the existing applications don't benefit from the new instructions, even if they already process wide batches of data.
The CUDA approach is so much more appealing: here's my data, and you can process it in as many small or large units as you want.