Evaluating the Infinity Cache in AMD Strix Halo

epistasis

Great article on performance. This video from a few weeks ago goes into chiplet design a bit more too:

joelthelion

I don't quite get it. What's so special about having 32MB of cache? Why is it called "infinity"?

noelwelsh

This article from the same site goes into the Infinity Cache design in a bit more detail: https://chipsandcheese.com/p/amds-cdna-3-compute-architectur...

The summary is that it's a cache attached to the memory controllers, rather than the CPUs, so it doesn't have to worry about cache coherency so much. This could be useful for shared memory parallelism.
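To make the coherency point concrete, here is a toy sketch (my own illustration, not AMD's design): the same simple write-through cache is placed either privately in front of each agent, with no coherency protocol, or once on the memory side where every agent goes through it. The class and function names are made up for the example, and real hardware tracks cache lines rather than Python dict entries.

```python
class Dram:
    """Backing memory: address -> value, defaulting to 0."""
    def __init__(self):
        self.cells = {}

    def read(self, addr):
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.cells[addr] = value


class WriteThroughCache:
    """Minimal cache with no eviction and no coherency protocol."""
    def __init__(self, backing):
        self.backing = backing
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:           # miss: fill from backing store
            self.lines[addr] = self.backing.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value             # update our own copy
        self.backing.write(addr, value)      # and write through


def stale_read_demo():
    addr = 0x40

    # Topology 1: one private cache per agent, nothing keeping them in sync.
    dram = Dram()
    cpu, gpu = WriteThroughCache(dram), WriteThroughCache(dram)
    gpu.read(addr)                 # GPU caches the old value (0)
    cpu.write(addr, 7)             # CPU updates its copy and DRAM
    private_result = gpu.read(addr)  # GPU still sees its stale copy

    # Topology 2: one memory-side cache shared by all agents.
    dram = Dram()
    memory_side = WriteThroughCache(dram)
    memory_side.read(addr)         # "GPU" read fills the single cache
    memory_side.write(addr, 7)     # "CPU" write updates that same copy
    shared_result = memory_side.read(addr)

    return private_result, shared_result
```

With private caches the second agent reads a stale value unless a coherency protocol invalidates it; with a single memory-side cache there is only one cached copy per address, so the problem does not arise at that level.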

joelthelion

Thank you!

phire

AMD named their memory fabric "Infinity Fabric" for marketing reasons. So when they developed their memory-attached cache solution (which lives in the memory fabric, unlike a traditional cache), the obvious marketing name was "Infinity Cache".

The main advantage of a memory-attached cache is that it's cheaper than a regular cache, and can even be put on a separate die, allowing you to have much more of it.

AMD's previous memory fabric from the early 2000s was called "HyperTransport", which has a confusing overlap with Intel's Hyper-Threading, but I think AMD actually beat Intel to the name by a few years.

typpilol

How's the latency vs. a traditional cache?

pixelpoet

What makes Intel's SMT implementation "hyper"? What makes Mario "Super"? It's just marketing.

arjvik

Hyperthreading is technically a level above superscalar?

themafia

> What makes Mario "Super"?

The Super Mushroom power-up.

andrewstuart

Despite this APU being deeply interesting to people who want to do local AI, anecdotally I hear that it’s hard to get models to run on it.

Why would AMD not have focused everything it possibly has on demonstrating and documenting and fixing and showing and smoothing the path for AI on their systems?

Why does AMD come across as so generally clueless when it comes to giving developers what they want, compared to Nvidia?

AMD should do whatever it takes to avoid these sort of situations:

https://youtu.be/cF4fx4T3Voc?si=wVmYmWVIya4DQ8Ut

lmm

Hardware companies are extremely bad at valuing software. The mystery isn't that AMD is bad at it, the mystery is that Nvidia is good at it. They also probably have a 30-40 year head start. AMD is trying as much as they can, but changing culture takes time.

DeepYogurt

Intel and Arm are also pretty good at it. AMD feels like the outlier here.

pella

"The AMD Ryzen™ AI Max+ processor is the first (and only) Windows AI PC processor capable of running large language models up to 235 Billion parameters in size. This includes support for popular models such as: Open AI's GPT-OSS 120B and Z.ai Org's GLM 4.5 Air. The large unified memory pool also allows models (up to 128 Billion parameters) to run at their maximum context length (which is a memory intensive feature) - enabling and empowering use cases involving tool-calling, MCP and agentic workflows - all available today. "

  GPT-OSS 120B MXFP4              : up to 44 tk/s
  GPT-OSS 20B MXFP4               : up to 62 tk/s
  Qwen3 235B A22B Thinking Q3_K_L : up to 14 tk/s
  Qwen3 Coder 30B A3B Q4_K_M      : up to 66 tk/s
  GLM 4.5 Air Q4_K_M              : up to 16 tk/s

(performance in tk/s): https://www.amd.com/en/blogs/2025/amd-ryzen-ai-max-personal-...
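As a rough sanity check on why the large unified memory pool matters for these models, here is a back-of-the-envelope estimate of weight storage. It is only a sketch under stated assumptions: the function name is made up, the parameter counts are the nominal ones from the model names, and 4.25 bits per weight assumes MXFP4 stores 4-bit elements plus one shared 8-bit scale per block of 32. Real model files mix precisions, and the KV cache and activations come on top.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes).

    Ignores KV cache, activations, and any tensors kept at higher precision.
    """
    return params_billion * bits_per_weight / 8


# Assumption: MXFP4 = 4-bit elements + one 8-bit scale per 32-element block.
MXFP4_BITS = 4 + 8 / 32  # 4.25 bits per weight

for name, params_billion in [("GPT-OSS 120B", 120), ("GPT-OSS 20B", 20)]:
    size = weight_footprint_gb(params_billion, MXFP4_BITS)
    print(f"{name}: about {size:.1f} GB of weights")
```

Under those assumptions the 120B model needs on the order of 64 GB for weights alone, which is more than a typical consumer discrete GPU offers in VRAM.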

andrewstuart

I’m not sure why you are telling me this.

YuukiRey

It’s an example of AMD catering to the AI crowd to somewhat refute your claim that they are clueless.

Not exactly a gigantic mental leap.

aaryamanv

You can run ROCm and PyTorch natively for Strix Halo on both Windows and Linux. See https://rocm.docs.amd.com/en/docs-7.9.0/index.html

sidkshatriya

> Why does AMD come across as so generally clueless when it comes to giving developers what they want, compared to Nvidia?

I have some theories. Firstly, Nvidia was smart enough to use a unified compute architecture across its whole GPU lineup, consumer and commercial alike. AMD has this awkward split between CDNA and RDNA. So while AMD is scrambling to get CDNA competitive, RDNA is not getting as much attention as it should. I'm pretty sure its ROCm stack has all kinds of hacks trying to get things working across consumer Radeon devices (which internally are probably not well suited or tuned for compute anyway). AMD is hamstrung by its consumer hardware for now in the AI space.

Secondly, AMD is trying to be "compatible" with Nvidia (via HIP). Sadly this is the same thing that AMD did with Intel in the past. Being compatible is really a bad idea when the market leader (Nvidia) is not interested in standardising and actively pursues optimisations and extensions. AMD will always play catch-up.

TL;DR AMD made some bad bets on what the hardware would look like in the future and never thought software was critical the way Nvidia did.

AMD now realizes that software is critical and what future hardware should look like. However, it is difficult to catch up with Nvidia, the most valuable company in the world, with almost limitless resources to invest in further improving its hardware and software. Even while AMD improves, it will continue to look bad in comparison to Nvidia as the state of the art keeps getting pushed forward.

positron26

While Nvidia's strategic foresight explains why Nvidia is ahead, it doesn't quite capture why the challenge is not something that only AMD can or should tackle alone.

The 7,484+ companies who stand to benefit do not have a good way to split the bill and dogpile a problem that is nearly impossible to progress on without lots of partners adding their perspective via a breadth of use cases. This is why I'm building https://prizeforge.com.

Nvidia didn't do it alone. Industry should not expect or wait on AMD to do it alone. Waiting just means lighting money on fire right now. In return for support, industry can demand that more open technology be used across AMD's stack, making overall competition better in exchange for making AMD competitive.

JonChesterfield

One issue is you need ROCm 7, which only just came out.

Another is that people unsportingly write things in CUDA.

It'll be a "just works" thing eventually, even if you need software from outside AMD to get it running well.

dontlaugh

It’s a gaming chip.

typpilol

Any idea what makes models hard to run on it?

Just general compatibility between Nvidia and AMD for stuff that was built for Nvidia originally?

Or do you mean something else?

cakealert

It's not the models, it's the tooling. Models are just weights and an architecture spec. The tooling is how to load and execute the model on hardware.

Some UX-oriented tooling has sort of solved this problem and will run on AMD: LM Studio
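To illustrate the "weights plus an architecture spec" split, here is a deliberately tiny sketch. The spec format, the weight names, and the `run` function are all hypothetical and do not correspond to any real model file layout or runtime. The model is pure data; the tooling is the interpreter that knows how to execute each op on the hardware at hand.

```python
# The "model": an architecture spec plus named weights. Just data.
SPEC = [("linear", "w1", "b1"), ("relu",), ("linear", "w2", "b2")]
WEIGHTS = {
    "w1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, -1.0],
    "w2": [[2.0, 1.0]],
    "b2": [0.5],
}


# The "tooling": kernels plus a loop that walks the spec.
def linear(x, w, b):
    """Matrix-vector product plus bias, one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def run(spec, weights, x):
    """Execute the spec layer by layer on input vector x."""
    for op, *args in spec:
        if op == "linear":
            x = linear(x, weights[args[0]], weights[args[1]])
        elif op == "relu":
            x = relu(x)
        else:
            # A runtime without a kernel for this op on this backend
            # cannot run the model, even though the model data is fine.
            raise ValueError(f"unsupported op: {op}")
    return x
```

The `ValueError` branch is where the pain usually lives: the same `SPEC` and `WEIGHTS` work anywhere, but only if the runtime ships a working implementation of every op for the target hardware.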
