Evaluating the Infinity Cache in AMD Strix Halo
23 comments
October 22, 2025
joelthelion
I don't quite get it. What's so special about having 32MB of cache? Why is it called "infinity"?
noelwelsh
This article from the same site goes into the Infinity Cache design in a bit more detail: https://chipsandcheese.com/p/amds-cdna-3-compute-architectur...
The summary is that it's a cache attached to the memory controllers, rather than the CPUs, so it doesn't have to worry about cache coherency so much. This could be useful for shared memory parallelism.
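To make that distinction concrete, here is a toy Python model of a memory-side cache. It assumes a single shared LRU cache sitting in front of DRAM; the class and method names are hypothetical and this is not AMD's actual design. Because every agent, CPU or GPU, reaches a given address through the same cache, there is only ever one cached copy of each address, so there is nothing to keep coherent at this level. Real chips still have private per-core caches above it, and those do need coherence.

```python
from collections import OrderedDict


class MemorySideCache:
    """Toy LRU cache placed in front of DRAM and shared by every agent.

    Hypothetical model for illustration only: real memory-side caches
    operate on cache lines, are set-associative, and sit behind the
    agents' own private caches.
    """

    def __init__(self, capacity_lines, dram):
        self.capacity = capacity_lines
        self.dram = dram              # backing store: address -> value
        self.lines = OrderedDict()    # cached address -> value, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, agent, addr):
        # 'agent' is only a label: all agents share one copy per address.
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)
        else:
            self.misses += 1
            self._fill(addr, self.dram.get(addr, 0))
        return self.lines[addr]

    def write(self, agent, addr, value):
        # Write-back policy: update the single cached copy only.
        if addr in self.lines:
            self.lines[addr] = value
            self.lines.move_to_end(addr)
        else:
            self._fill(addr, value)

    def _fill(self, addr, value):
        self.lines[addr] = value
        if len(self.lines) > self.capacity:
            old_addr, old_value = self.lines.popitem(last=False)
            self.dram[old_addr] = old_value  # write back the evicted entry


dram = {}
cache = MemorySideCache(capacity_lines=2, dram=dram)
cache.write("cpu0", 0x100, 42)
print(cache.read("gpu", 0x100))  # the GPU sees the CPU's write: 42
```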
joelthelion
Thank you!
phire
AMD named their memory fabric "Infinity Fabric" for marketing reasons. So when they developed their memory-attached cache (which lives in the memory fabric, unlike a traditional cache), the obvious marketing name was "Infinity Cache".
The main advantage of a memory-attached cache is that it's cheaper than a regular cache and can even be put on a separate die, allowing you to have much more of it.
AMD's previous memory fabric from the early 2000s was called "HyperTransport", which confusingly overlaps with Intel's Hyper-Threading, but I think AMD actually beat Intel to the name by a few years.
typpilol
How's the latency compared to a traditional cache?
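One way to frame the latency question: a memory-side cache is only consulted after the core-side caches have already missed, so its effect on latency is a blend of a hit path and a miss path. A rough sketch, where $h$ is the Infinity Cache hit rate and the $t$ terms are symbolic latencies, not measured values:

$$
\bar{t} \approx h \, t_{\text{hit}} + (1 - h) \, t_{\text{miss}}, \qquad t_{\text{miss}} \approx t_{\text{lookup}} + t_{\text{DRAM}}
$$

Because a request has to cross the fabric before it reaches a memory-side cache, $t_{\text{hit}}$ generally sits between an on-die cache hit and a DRAM access; the bigger payoff of a high $h$ is fewer DRAM accesses and higher effective bandwidth.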
andrewstuart
Despite this APU being deeply interesting to people who want to do local AI, anecdotally I hear that it’s hard to get models to run on it.
Why would AMD not have focused everything it possibly has on demonstrating and documenting and fixing and showing and smoothing the path for AI on their systems?
Why does AMD come across as so generally clueless when it comes to giving developers what they want, compared to Nvidia?
AMD should do whatever it takes to avoid these sorts of situations:
lmm
Hardware companies are extremely bad at valuing software. The mystery isn't that AMD is bad at it; the mystery is that Nvidia is good at it. They also probably have a 30-40 year head start. AMD is trying as hard as it can, but changing culture takes time.
DeepYogurt
Intel and Arm are also pretty good at it. AMD feels like the outlier here.
pella
"The AMD Ryzen™ AI Max+ processor is the first (and only) Windows AI PC processor capable of running large language models up to 235 Billion parameters in size. This includes support for popular models such as: Open AI's GPT-OSS 120B and Z.ai Org's GLM 4.5 Air. The large unified memory pool also allows models (up to 128 Billion parameters) to run at their maximum context length (which is a memory intensive feature) - enabling and empowering use cases involving tool-calling, MCP and agentic workflows - all available today. "
GPT-OSS 120B MXFP4 : up to 44 tk/s
GPT-OSS 20B MXFP4 : up to 62 tk/s
Qwen3 235B A22B Thinking Q3 K L : up to 14 tk/s
Qwen3 Coder 30B A3B Q4 K M : up to 66 tk/s
GLM 4.5 Air Q4 K M : up to 16 tk/s
(performance tk/s): https://www.amd.com/en/blogs/2025/amd-ryzen-ai-max-personal-...
andrewstuart
I’m not sure why you are telling me this.
YuukiRey
It’s an example of AMD catering to the AI crowd to somewhat refute your claim that they are clueless.
Not exactly a gigantic mental leap.
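The tk/s figures quoted above can be sanity-checked with a back-of-envelope model: single-stream decoding is usually limited by memory bandwidth, because each generated token has to read the model's active weights from memory. The sketch below computes that ceiling, along with the KV-cache size that makes long contexts memory intensive. Every input is a hypothetical placeholder (bandwidth, active parameter count, bits per weight, layer shapes), not a figure from AMD or any model vendor; substitute real values for a specific model and memory configuration.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s, active_params_billion, bits_per_weight):
    """Upper bound on single-stream decode speed if weight reads dominate.

    Ignores KV-cache reads, compute time, and imperfect bandwidth
    utilisation, so real throughput lands below this number.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Size of the key/value cache for one sequence, in GiB.

    Factor of 2 = one key and one value vector per layer, per KV head,
    per token. bytes_per_elem=2 assumes an fp16/bf16 cache.
    """
    total = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total / 2**30


# Hypothetical inputs: 256 GB/s, 5B active params, 4-bit weights
print(round(decode_ceiling_tokens_per_s(256, 5, 4), 1))  # 102.4
# Hypothetical inputs: 48 layers, 8 KV heads, head_dim 128, 128k tokens
print(kv_cache_gib(48, 8, 128, 131072))  # 24.0
```

For mixture-of-experts models only the active parameters count toward the per-token read, which is why a model with a large total size can still decode at a usable speed.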
aaryamanv
You can run ROCm and PyTorch natively on Strix Halo on both Windows and Linux. See https://rocm.docs.amd.com/en/docs-7.9.0/index.html
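A quick way to confirm that a local PyTorch install actually sees the GPU. This is a sketch, assuming a ROCm build of PyTorch: those builds reuse the `torch.cuda` namespace for AMD GPUs and set `torch.version.hip`, which is `None` on CUDA builds. Using device index 0 assumes a single-GPU machine, and the function name is hypothetical.

```python
def describe_gpu_backend():
    """Report which GPU backend the local PyTorch build can use."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # Set to a version string on ROCm builds, None on CUDA builds.
    hip_version = getattr(torch.version, "hip", None)
    if not torch.cuda.is_available():
        return "no usable GPU found (CPU only)"
    name = torch.cuda.get_device_name(0)
    if hip_version:
        return f"ROCm/HIP {hip_version}: {name}"
    return f"CUDA {torch.version.cuda}: {name}"


print(describe_gpu_backend())
```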
sidkshatriya
> Why does AMD come across as so generally clueless when it comes to giving developers what they want, compared to Nvidia?
I have some theories. Firstly, Nvidia was smart enough to use a single unified compute architecture across all its GPUs, consumer and commercial. AMD has an awkward split between CDNA and RDNA, so while AMD scrambles to make CDNA competitive, RDNA doesn't get as much attention as it should. I'm pretty sure its ROCm stack has all kinds of hacks to get things working across consumer Radeon devices (which internally are probably not well suited or tuned for compute anyway). For now, AMD is hamstrung in the AI space by its consumer hardware.
Secondly, AMD is trying to be "compatible" with Nvidia (via HIP). Sadly, this is the same thing AMD did with Intel in the past. Being compatible is a bad idea when the market leader (Nvidia) is not interested in standardising and actively pursues its own optimisations and extensions. AMD will always be playing catch-up.
TL;DR: AMD made some bad bets on what future hardware would look like and never treated software as critical the way Nvidia did.
AMD now realizes that software is critical and knows what future hardware should look like. However, it is difficult to catch up with Nvidia, the most valuable company in the world, with almost limitless resources to invest in further improving its hardware and software. Even as AMD improves, it will continue to look bad next to Nvidia as the state of the art keeps getting pushed forward.
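The HIP route mentioned above works largely by mirroring the CUDA runtime API name for name, and AMD's hipify tools automate translating CUDA source. A toy Python version of that idea, assuming a tiny hand-picked mapping table (the real hipify-perl and hipify-clang tools cover far more of the API), shows both why porting is mostly mechanical and why it is a game of catch-up: anything missing from the table stays untranslated until someone adds it. The function name is hypothetical.

```python
import re

# Tiny, hand-picked subset of CUDA runtime -> HIP runtime names.
# The real hipify tools use much larger tables.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}


def toy_hipify(source):
    """Rename known CUDA identifiers; report ones missing from the table."""
    # Whole identifiers starting with "cuda", plus an optional ".h" suffix.
    pattern = re.compile(r"\bcuda\w*(?:\.h)?")
    unmapped = set()

    def replace(match):
        name = match.group(0)
        if name in CUDA_TO_HIP:
            return CUDA_TO_HIP[name]
        unmapped.add(name)
        return name  # left as-is: no entry in this toy table

    return pattern.sub(replace, source), sorted(unmapped)


src = """#include <cuda_runtime.h>
cudaMalloc(&d_buf, n);
cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);
cudaGraphLaunch(graph, stream);
"""
hip_src, missing = toy_hipify(src)
print(missing)  # ['cudaGraphLaunch'] (not in this toy table)
```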
positron26
While Nvidia's strategic foresight explains why Nvidia is ahead, it doesn't capture why this challenge isn't something AMD can or should tackle alone.
The 7,484+ companies that stand to benefit have no good way to split the bill and dogpile on a problem that is nearly impossible to make progress on without lots of partners contributing their perspective through a breadth of use cases. This is why I'm building https://prizeforge.com.
Nvidia didn't do it alone. Industry should not expect or wait for AMD to do it alone. Waiting just means lighting money on fire right now. In return for its support, industry can demand that more open technology be used across AMD's stack, which improves overall competition in exchange for making AMD competitive.
JonChesterfield
One issue is that you need ROCm 7, which only just came out.
Another is that people unsportingly write things in CUDA.
It'll be a "just works" thing eventually, even if you need software from outside AMD to get it running well.
dontlaugh
It’s a gaming chip.
typpilol
Any idea what makes models hard to run on it?
Just general compatibility between Nvidia and AMD for stuff that was built for Nvidia originally?
Or do you mean something else?
cakealert
It's not the models, it's the tooling. Models are just weights and an architecture spec. The tooling is how to load and execute the model on hardware.
Some UX-oriented tooling has sort of solved this problem and will run on AMD: LM Studio
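A toy illustration of that split between models and tooling, in plain Python with hypothetical names: the "model" is nothing but a weights dictionary plus a small architecture spec, and a separate runtime function decides how to execute it. Real tooling (LM Studio, PyTorch, and so on) plays the runtime role, and that layer is what has to support a given GPU backend. A runtime that lacks an op simply cannot run the model, even though the weights themselves are fine.

```python
# "Model" = architecture spec + weights. Both are plain data.
SPEC = [
    {"op": "linear", "weight": "w1", "bias": "b1"},
    {"op": "relu"},
    {"op": "linear", "weight": "w2", "bias": "b2"},
]
WEIGHTS = {
    "w1": [[1.0, -1.0], [0.5, 0.5]],  # 2 outputs x 2 inputs
    "b1": [0.0, 0.0],
    "w2": [[1.0, 1.0]],               # 1 output x 2 inputs
    "b2": [0.5],
}


def run(spec, weights, x):
    """Toy CPU 'runtime' that interprets the spec one op at a time.

    A GPU runtime would read the same spec and weights but dispatch
    each op to device kernels instead.
    """
    for layer in spec:
        if layer["op"] == "linear":
            w, b = weights[layer["weight"]], weights[layer["bias"]]
            x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        elif layer["op"] == "relu":
            x = [max(0.0, v) for v in x]
        else:
            # The model is valid; this runtime just doesn't support the op.
            raise ValueError(f"unsupported op: {layer['op']}")
    return x


print(run(SPEC, WEIGHTS, [2.0, 1.0]))  # [3.0]
```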
Great article on performance. This video from a few weeks ago goes into chiplet design a bit more too:
https://youtu.be/maH6KZ0YkXU