Boosting Computational Fluid Dynamics Performance with AMD MI300X
45 comments
· January 15, 2025
mpreda
It seems AMD has to choose between monetizing the GPUs right now by selling them at the maximum effective price, or forgoing that gain, selling the GPUs at cost, and growing GPU market share in the future. It's the classic choice between short-term and long-term gain.
mpreda
And the split into CDNA and RDNA comes from the same direction: market segmentation, allowing much higher prices for the CDNA data-center GPUs while keeping the gamer-focused RDNA GPUs affordable for mere mortals. Of course this backfires by making the powerful GPUs unavailable for almost anybody to experiment on.
For example this blog post, about how great MI300X is. Really, what do I care -- I'm not a billionaire.
dragontamer
> And the splitting into CDNA and RDNA comes from the same direction: market segmentation
Not really.
Wave64 on CDNA provably has more throughput. But with most video game code written for NVidia's Wave32, reworking RDNA to be more NVidia-like with Wave32 is how you reach better practical video game performance.
HPC will prefer the wider execute, 64-bit execution, and other benefits.
Video gamers will prefer the massive 32MB+ of "Infinity Cache", which in practice is used for all kinds of screen-space calculations. But this would NEVER be used for fluid dynamics.
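A toy model (mine, not dragontamer's, and not vendor data) makes the wave-width tradeoff concrete: when a wavefront hits a divergent branch it executes both sides with inactive lanes masked off, so wider waves waste more lanes whenever neighboring threads disagree. The 32-thread tile pattern below is an illustrative assumption about screen-space workloads, not a measurement:

```python
# Toy SIMD-divergence model: a wave that diverges at a branch executes
# both paths, with the lanes on the "wrong" path masked off.
def utilization(branch_taken, wave_size):
    """Fraction of lane-cycles doing useful work for a given wave width."""
    useful = total = 0
    for i in range(0, len(branch_taken), wave_size):
        wave = branch_taken[i:i + wave_size]
        paths = len(set(wave))       # 1 if the wave is uniform, 2 if divergent
        total += paths * len(wave)   # lane-cycles issued
        useful += len(wave)          # lane-cycles that do real work
    return useful / total

# "Game-like" workload: branch outcome correlated in 32-thread screen tiles.
tiled = [(i // 32) % 2 == 0 for i in range(4096)]
# "HPC-like" workload: every thread takes the same path.
uniform = [True] * 4096

print(utilization(tiled, 32), utilization(tiled, 64))      # 1.0 0.5
print(utilization(uniform, 32), utilization(uniform, 64))  # 1.0 1.0
```

Under this model the tiled workload keeps Wave32 fully busy but wastes half of a Wave64, while the uniform workload loses nothing at either width.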
tormeh
They’re unifying the architectures. AMD will move to UDNA for both gaming and data center. The next graphics cards after RDNA4 will be UDNA. Makes sense given how ML-heavy graphics has become.
42lux
You mean hard in regard to pricing? Because otherwise it's not really hard to buy enterprise GPUs at all.
gymbeaux
Can you provide a link to where I can purchase the MI300X? The only way I seem to be able to buy one is to contact AMD Sales who will surely laugh at my request for “less than 1000 units.”
latchkey
Depends on what country you are in. These are technically export controlled.
latchkey
Disclosure, I have a business in this.
This is what I saw as well. As a developer, I wanted access to enterprise HPC compute, but I'm also not going to do a PhD just to play around with these things. So, I got funding, started a business and bought 8 of them as a PoC test. We got customers, we got more funding, got a real datacenter, we bought 128 more. Crawl, walk, run.
You can now rent them by the minute from us for a few bucks an hour. Currently limited to Docker containers for individual GPUs, but you can get a full bare metal 8x box too (with BIOS too!). Support for VMs is coming. If you want multiple boxes, we have the full 8x400G NICs too. The boxes are fully loaded with tons of enterprise NVMe, RAM and top core/clock Intel CPUs (not AMD, because Dell didn't have that as a solution).
Our model is to follow AMD's roadmap and buy/release their products as they come. We're currently debating the 325x and looking forward to / planning for the 355x.
Despite your desire, it will be a long time before there is a consumer version of these things, especially as they move to more and more complex deployments. Look at the NVL72 and the requirements around that... we can all guess where AMD is going: DC power rails in the racks, DLC cooling, massive power requirements. It is only getting more and more capex/opex intensive.
Let's also not forget that AMD is really just a hardware manufacturer. When you bought an RX 480 (I previously had 130,000 of these), it was from an OEM, like Sapphire, that could handle all the end-user support.
This is why the whole NeoCloud industry has sprung up. Large clouds can only handle this pace by selling thousands at a time on multi-year contracts. We took the long tail and built a business around it. Short of doing everything we are doing yourself (which, trust me, is not easy), your best bet is to work with companies like mine to get access to this gear.
gymbeaux
I recognize that selling enterprise hardware one or two units at a time to people like me is not cost-effective, which is why AMD isn't doing it. But I don't think there's anything stopping them from relying on distribution partners like Sapphire, Gigabyte and XFX to handle everything but the GPU die. Demand would be low relative to consumer stuff, and after cutting these partners in they're probably selling at cost or on thin margins. But again, if they want to carve out market share, it's going to be VERY slow going if they continue with this "charge as much money as possible and only sell to datacenters" approach. Nvidia can do that because they've cornered the market.
Meanwhile I can’t even find an MI300X on eBay. I can at least poach enterprise Nvidia GPUs like the A100 on eBay. This tells me AMD’s shipping far fewer units and therefore enterprise GPUs aren’t doing much for their balance sheet (though I’d have to look at their quarterly and annual reports to know for certain). To me this strengthens the case for selling to individuals/startups, and at prices that offset the risk of picking AMD over Nvidia and potentially running into software shortcomings.
I’m set with two RTX 3090s at the moment, but it’s very neat that you’ve been able to bootstrap essentially a cloud service provider in the age of AWS, Azure and GCP (and DO and Vultr and Linode et al).
latchkey
> I don’t think there’s anything stopping
There absolutely is. The current form factor is not standard PCIe. It is an OAM/UBB board custom designed by AMD to support Infinity Link, and it only comes in an 8x configuration. You're asking for a totally different design, and that requires a huge investment that would take away from their existing focus on enterprise.
> Meanwhile I can’t even find an MI300X on eBay.
LtdJorge
You're doing good work on your subreddit!
latchkey
Thank you! https://www.reddit.com/r/AMD_MI300/
LtdJorge
I'm pretty sure they're selling at capacity
ieidkeheb
Nope... IMO AMD cannot compete on software with Nvidia. I bought an RX 580 to test ROCm/TensorFlow, only for it to be a buggy mess whose support was dropped in the next version.
AMD needs to invest a fsck-load of money in software. Until then they can have the greatest compute cards in the world, but it will mean nothing.
bavell
Isn't the RX 580 like 4 generations behind now? It was released in 2017. Probably not the right card for anything AI.
imtringued
Why would anyone buy a new AMD card if the expectation is that the new one won't work either after support runs out? I use AMD at home and Nvidia A100s at work. There is no need to upgrade an old GPU to a new GPU if all it does is act as a fancy iGPU.
ieidkeheb
Yes, this was years ago, pre-COVID. I thought the fact that I used TensorFlow as the target library would flag that for others.
thomasfedb
Our team has access to multiple systems that either have MI250Xs or H100s. Getting stuff to work with AMD/ROCm is substantially more effort than the NVIDIA/CUDA experience.
Some of this is lack of groundwork/engineering by packages or system administrators, but it seems a decent amount is the relative lack of effort by AMD to make things work well OOTB.
latchkey
Installing ROCm is easy and well documented [0].
Anush (AMD VP of AI software) has had a fire lit under his butt after the recent SemiAnalysis article [1] and is actively taking feedback on improving the experience. If you have specific things you'd like to see, I'm more than happy to forward them onto him (contact in my profile).
[0] https://rocm.docs.amd.com/en/latest/
[1] https://semianalysis.com/2024/12/22/mi300x-vs-h100-vs-h200-b...
BoingBoomTschak
The real question is: is this from lack of effort or simply from Nvidia's head start? Will it get better?
noch
> Will it get better?
It won't, not in any way that will make AMD approximately competitive with Nvidia.
AMD, unlike Nvidia, seems unable to prioritize developers. Here's a summary of last week's charlie foxtrot, when the TinyGrad team attempted to get two MI300s for on-premises testing and was rebuffed by an AMD representative. https://x.com/dehypokriet/status/1879974587082912235
musicale
Both, really.
It will get better, but Nvidia seems to be creating CUDA libraries for all kinds of applications, so the moat is constantly widening/deepening.
KeplerBoy
Those are some sketchy bar charts. If the bar doesn't start at y=0 and you don't even have axis ticks, it's meaningless and just a colorful blob on your slide.
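For what it's worth, the fix KeplerBoy is asking for is about one line in most plotting libraries. A minimal matplotlib sketch (the throughput numbers are made up purely for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

labels = ["H100", "MI300X"]
throughput = [100.0, 110.0]  # hypothetical numbers, illustration only

fig, ax = plt.subplots()
ax.bar(labels, throughput)
ax.set_ylim(bottom=0)                            # anchor the bars at y=0
ax.set_ylabel("Throughput (arbitrary units)")
ax.yaxis.set_major_locator(plt.MaxNLocator(6))   # keep readable axis ticks
fig.savefig("benchmark.png")
```

With the baseline at zero and visible ticks, a 10% difference looks like a 10% difference instead of a towering bar.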
semi-extrinsic
This looks like a really stupid benchmark, no? 4 million cells in a standard aerodynamic simulation isn't enough work to even saturate the host CPUs on these systems, never mind the GPUs which have far "worse" roofline models from a CFD algorithm point of view.
I would not be surprised at all if these benchmarks ran faster if you removed the GPUs completely.
shihab
That's the smallest of 4 experiments. It goes up to 140 million cells, where the MI300X retains a similar performance advantage of around 10% over Nvidia's H100.
semi-extrinsic
Yeah, but the lack of comparison to a CPU-only system is suspicious. (As is the exaggerated plot without ticks and labels on the y-axis.)
I've never seen a non-reactive incompressible flow simulation get substantial speedup on GPUs. There are well understood fundamental reasons why this is the case.
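The "well understood fundamental reasons" can be sketched with a roofline model: a low-order stencil update does very few flops per byte of memory traffic, so both GPUs run at a tiny fraction of peak FP64. The flop/byte counts and the peak/bandwidth figures below are rough back-of-envelope assumptions, used only for illustration:

```python
def attainable_gflops(intensity, peak_gflops, bw_gbs):
    """Roofline model: performance is capped by compute or by bandwidth."""
    return min(peak_gflops, bw_gbs * intensity)

# 7-point stencil update: ~8 flops per cell, ~8 doubles of traffic
# (7 reads + 1 write, assuming no cache reuse) -> 0.125 flop/byte.
intensity = 8 / (8 * 8)

# Rough FP64 vector peak (GFLOP/s) and HBM bandwidth (GB/s), assumed figures:
for name, peak, bw in [("MI300X", 81_700, 5_300), ("H100 SXM", 34_000, 3_350)]:
    g = attainable_gflops(intensity, peak, bw)
    print(f"{name}: {g:.0f} GFLOP/s attainable ({100 * g / peak:.1f}% of peak)")
```

Under these assumptions both chips sit around 1% of peak FP64 on such a kernel: it's a bandwidth contest, not a flops contest, which is why a fat-socket CPU system can stay surprisingly competitive.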
shihab
I helped develop a hydro solver coupled with radiation at Los Alamos this summer. We observed from 7x up to 15x performance improvement on a single GPU compared to a 64-core Intel CPU. [1]
Granted, the flops-to-byte ratio for this program might be better than for an average fluid simulator. Also, our performance tanked when we moved to a multi-node system, but I am aware of the underlying reasons behind the scalability issues and they don't feel like problems that can't be overcome.
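The usual multi-node suspect can also be sketched in a few lines: with domain decomposition, per-rank compute shrinks linearly with rank count while each rank's halo surface (and per-exchange latency) shrinks much more slowly, so communication eventually dominates. All constants below are made-up stand-ins, not measurements from the solver shihab describes:

```python
def step_time(n_cells, n_ranks, t_cell=1e-9, t_byte=2e-11, latency=1e-5):
    """Toy cost model: cubic subdomains, compute + a 6-face halo exchange."""
    local = n_cells / n_ranks            # cells per rank
    side = local ** (1 / 3)              # subdomain edge length, in cells
    compute = local * t_cell
    comm = 6 * (latency + side ** 2 * 8 * t_byte)  # 8 bytes per face cell
    return compute + comm

n = 4_000_000  # cell count from the benchmark's smallest case
t1 = step_time(n, 1)
for ranks in (8, 64, 512):
    eff = t1 / (ranks * step_time(n, ranks))
    print(f"{ranks:4d} ranks: parallel efficiency {eff:.2f}")
```

With these toy constants, efficiency slides from ~0.9 at 8 ranks to ~0.1 at 512, purely from the surface-to-volume ratio and message latency, without any bug in the solver.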
JorgeGT
It's tangential, but this is the first time I've seen Fluent installed by simply decompressing a tar, instead of executing their big installer.
lukasb
warning - no pretty videos in this post
gymbeaux
Now if only I could purchase one*
*I realize enterprise “GPUs” are difficult to purchase as an individual whether they’re AMD or Nvidia, but AMD might be well-served to mimic their RX 480 strategy (“make a competitive mid-range GPU, distribute it through several board partners, and sell it at rock-bottom prices to get it to as many customers as possible”). If there’s a compelling reason to use AMD hardware over Nvidia, the software support will come. As an extreme example, if one could purchase an MI300X for $50 a pop, surely FAANG and others would invest time and effort into the software side to leverage the cost savings vs Nvidia, who is unquestionably price-gouging due to their monopolistic hold on the enterprise GPU market.