RDNA 4's “Out-of-Order” Memory Accesses

jauntywundrkind

I've been super curious to see what was at stake here! This sounds better than I'd dared to hope for.

I kind of thought this was just gonna be some kind of deferred texture loading thing to help with streaming assets.

If it actually allows inter-warp sequencing, it sounds like it might solve the chief complaint supreme GUI master Raph Levien recently had in "I want a good parallel computer": even though we can dynamically add shaders & construct a dynamic work graph (largely thanks to VK_AMDX_shader_enqueue?), there isn't any sequencing/fencing/barrier-ing between the sections. https://raphlinus.github.io/gpu/2025/03/21/good-parallel-com... https://news.ycombinator.com/item?id=43440174

Not applicable to GPUs, but since I ran into it recently, it's interesting to see how io_uring handles sequenced submissions. Here's the Lord of the io_uring write-up: https://unixism.net/loti/tutorial/link_liburing.html#link-li...
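For anyone who hasn't seen linked submissions, here's a rough toy model of the ordering rule, not the real liburing API. As far as I understand it, requests flagged with IOSQE_IO_LINK run one after another, and once one fails the rest of the chain completes with -ECANCELED. The function name `run_linked_chain` and the "callable returns a result code" convention are made up for this sketch.

```python
import errno

def run_linked_chain(ops):
    """Toy model of a linked submission chain (hypothetical helper).

    Each op is a zero-argument callable returning a non-negative value on
    success or a negative errno on failure, loosely mirroring the `res`
    field of a completion entry.
    """
    results = []
    broken = False
    for op in ops:
        if broken:
            # Earlier link failed: this op never starts, it is cancelled.
            results.append(-errno.ECANCELED)
            continue
        res = op()  # Starts only after the previous op has completed.
        results.append(res)
        if res < 0:
            broken = True
    return results

def ok():
    return 0

def fail():
    return -errno.EIO

# The second op fails, so the third one is cancelled instead of running.
print(run_linked_chain([ok, fail, ok]))
```

The real thing is done by setting a flag on each submission queue entry; this only illustrates the "start after the previous one finishes, cancel on failure" behaviour.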

Edit: having read the article more fully, I'm not sure this is about waves depending on each other. Maybe more about them trying to access memory. Apologies. Hopefully someday!

Terr_

At first glance at the title, I thought it was going to be about some twist on DNA 3' and DNA 5' reading frames.

https://en.wikipedia.org/wiki/Reading_frame

pyinstallwoes

What’s interesting about that to you?

IshKebab

Presumably this didn't matter hugely because the memory access patterns for each wave are going to be extremely similar anyway?

Ah yeah he says that at the end. Doesn't really matter for rasterisation but might make more of a difference for ray tracing.

shmerl

Does AMD have its own flavor of GPU assembly, and what is it called?

dragontamer

Yes, and it's slightly different per architecture. Mostly new instructions (like the one discussed in this article).

Just search for "RDNA4 ISA" and you'll find it: https://www.amd.com/content/dam/amd/en/documents/radeon-tech...

TeraScale from 2008 was very different. Ignore it.

GCN is mostly the same as RDNA, and GCN is practically identical to CDNA. So you can go back to older guides as far back as GCN1 (early 2010s era). The only fundamental difference is that RDNA natively runs 32-wide wavefronts (wave32) while GCN/CDNA run 64-wide wavefronts (wave64).
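To make the width difference concrete, a quick back-of-the-envelope sketch: the same workgroup splits into twice as many wavefronts at width 32 as at width 64. The helper name `wavefronts_per_workgroup` is made up for illustration, and this ignores that RDNA can also run shaders in a 64-wide mode.

```python
def wavefronts_per_workgroup(workgroup_size, wave_size):
    """Number of wavefronts needed to cover a workgroup (rounded up)."""
    if workgroup_size <= 0 or wave_size <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partially filled wavefront still counts.
    return -(-workgroup_size // wave_size)

for size in (64, 256, 48):
    w32 = wavefronts_per_workgroup(size, 32)  # RDNA-style width
    w64 = wavefronts_per_workgroup(size, 64)  # GCN/CDNA-style width
    print(f"workgroup of {size}: {w32} wave32 vs {w64} wave64")
```

Note the 48-thread case: it needs two wave32s but still one (mostly empty) wave64, which is one reason narrower waves can waste fewer lanes on small or divergent work.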

--------

NVidia has an intermediate assembly language called PTX. NVidia's true assembly language is undocumented (not secret, just not intended for general-purpose coding). Search for NVidia's PTX manual and you'll see ...

GZGavinZhao

Slightly tangential, but AMD is also working on amdgcnspirv (i.e. AMD-flavored SPIR-V) that'll hopefully result in a user experience similar to PTX [1].

[1]: https://github.com/ROCm/ROCm/issues/3985#issuecomment-254616...

shmerl

Mesa uses NIR as the intermediate representation for its drivers. Is that comparable?

shmerl

Interesting, thanks!

Looking forward to the ACO compiler using new RDNA4 features to improve ray tracing performance with RADV.
