The Framework Desktop is a beast
311 comments
· August 8, 2025 · lucb1e
scosman
He's comparing to a Studio when he should compare to the Mini for this performance. They are almost the same price at a 64GB RAM + 500GB storage config (CAD).
- Framework, Max+ 64GB: $2,861.16
- Apple Mini M4 Pro, 64GB: $2,899.00
Apple does charge way too much for integrated storage, but it's only a 25% premium at 2TB, not double (if you compare to the Mini instead of the Studio). Plug in an NVMe SSD if you're building an Apple desktop.
dijit
Not to defend Apple here, but it's also a bit apples to oranges (heh) because the power consumption is not easily comparable.
I would hazard a guess and say: at that spec, if you're looking at 1Y TCO, the Apple could easily be cost-competitive on performance per dollar.
Since they're in spitting distance of each other, just get the one you're most comfortable with. Personally I prefer Linux and I'm really happy that non-Apple machines are starting to get closer to the same efficiency and performance!
ramesh31
You have to account for resale when it comes to TCO. Resale value for a non-Apple PC is essentially zero - i.e. no one will buy it. Whereas there's a strong market for used Apple hardware.
masterj
Apple charges an increased premium as you get further away from the base models. It's really hard to find a better deal than the M4 base models.
kingstnap
Apple base models tend to be fairly competitive but they have some of the most extreme margins on RAM and SSDs in the industry.
They charge $600 CAD to go from 16GB -> 32 GB.
They charge $900 CAD for 512 GB -> 2 TB SSD.
bestham
The point is that before the AMD Ryzen AI Max+ 395 chip, only Apple offered something comparable for the desktop/laptop that could handle these AI-related tasks. Where else could you find a GPU with 64-128 GB of memory?
wmf
That's purely due to Apple's ridiculous SSD pricing. You can save a lot of money by using an external SSD.
alt227
Except they keep making it harder and harder to install your own drives into apple machines.
stackskipton
You don't install it; you run it over Thunderbolt, which is plenty fast.
cmrdporcupine
Don't forget how Apple rips you off on RAM. Always.
reactordev
>The AMD 395+ uses unified memory, like Apple, so nearly all of it is addressable to be used by the GPU.
This is why they went with the “laptop” CPU. While unified memory is slightly slower than dedicated VRAM, it allows you to run the big models at decent token speeds.
nottorp
unified and soldered :(
I understand it's faster but still...
Did they at least do an internal PSU if they went the Apple way or does it come with a power brick twice the size of the case?
aomix
I’ve been agonizing over getting the Framework Desktop for weeks as a dev machine/local LLM box/home server. It checks a lot of boxes but the only reason to look at the Framework Desktop over something like a Minisforum MS-A2 is for the LLM and that seems super janky right now. So I guess I’ll wait a beat and see where we are later in the year.
danieldk
My main worry about all the Minisforum, Beelink, etc. PCs is: potential lack of UEFI firmware updates (does anyone have experience with how good they are with updates?) and potential backdoors in the UEFI firmware (either intentional or unintentional). A China-aligned/sponsored group has made a UEFI rootkit targeting ASUS/Gigabyte mainboards: https://www.spiceworks.com/it-security/vulnerability-managem... Why not require/compel certain companies to implement them directly?
starkparker
As a Framework 13 owner, their firmware update history isn't that great either.
rgrieselhuber
Any more details you can share?
laweijfmvo
probably doesn’t make sense as a home server unless you need the massive compute. i have a couple lenovo mini pcs (m75q, various generations, AMD) that I paid a total of $500 for on ebay. they’re so easy to find and handle most tasks swimmingly.
oblio
How quiet is the Minisforum?
Kirth
I was baffled by the comparison to the M4 Max. Does this mean that recent AMD chips will be performing at the same level, and what does that mean for on-device LLMs? .. or am I misunderstanding this whole ordeal?
izacus
Yes, the Strix series from AMD uses a similar architecture to the M series, with massive memory bandwidth and big caches.
That results in significantly better performance.
sidewndr46
Isn't this the desktop architecture that Torvalds suggested years ago?
schmorptron
Will we be able to get similar bandwidth with socketed ram with CAMM / LPCAMM modules in the near future?
topspin
Maybe, but due to the physics of signal integrity, socketed RAM will always be slower than RAM integrated onto the same PCB as whatever processing element is using it, so by the time CAMM / LPCAMM catches up, some newer integrated RAM solution will be faster yet.
This is a matter of physics. It can't be "fixed." Signal integrity is why classic GPU cards have GiBs of integrated RAM chips: GPUs with non-upgradeable RAM that people have been happily buying for years now.
Today, the RAM requirements of GPUs and their applications have become so large that the extra, low-cost, slow, socketed RAM is now a false economy. Naturally, therefore, it's being eliminated as PCs evolve into big GPUs, with one flavor or other of traditional ISA processing elements attached.
cdavid
I was surprised at the previous comparison on the omarchy website, because Apple M* chips work really well for data science work that doesn't require a GPU.
It may be explained by integer vs float performance, though I am too lazy to investigate. A weak data point, multiplying an N=6000 matrix by itself in numpy:
- SER 8 8745, linux: 280 ms -> 1.53 Tflops (single prec)
- my M2 MacBook Air: ~180 ms -> ~2.4 Tflops (single prec)
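For reference, the whole check is only a few lines of numpy; a sketch, assuming float32 inputs and counting 2*N^3 flops for the product:

    import time
    import numpy as np

    N = 6000
    a = np.random.rand(N, N).astype(np.float32)  # single precision, as above
    a @ a  # warm-up, lets the BLAS thread pool spin up

    t0 = time.perf_counter()
    a @ a
    dt = time.perf_counter() - t0

    # a dense NxN matrix product costs ~2*N^3 floating point operations
    print(f"{dt * 1000:.0f} ms -> {2 * N**3 / dt / 1e12:.2f} Tflops")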
This is 2 mins of benchmarking on the computers I have. It is a somewhat apples-to-oranges comparison (e.g. I use the default numpy BLAS on each platform), but not completely irrelevant to what people will do w/o much effort. And floating point is what matters for LLMs, not integer computation (which is what the ruby test suite is most likely bottlenecked by)
Tuna-Fish
It's all about the memory bandwidth.
Apple M chips are slower on computation than AMD chips, but they have fast soldered on-package RAM with a wide memory interface, which is very useful on workloads that handle lots of data.
Strix halo has a 256-bit LPDDR5X interface, twice as wide as the typical desktop chip, roughly equal to the M4 Pro and half of that of the M4 Max.
jychang
You're most likely bottlenecked by memory bandwidth for a LLM.
The AMD AI MAX 395+ gives you 256GB/sec. The M4 gives you 120GB/s, and the M4 Pro gives you 273GB/s. The M4 Max: 410GB/s (14‑core CPU/32‑core GPU) or 546GB/s (16‑core CPU/40‑core GPU).
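Back of the envelope: generating a token means streaming every active weight byte through memory once, so bandwidth divided by model size gives a rough ceiling on tokens/sec. A sketch, with an assumed 40GB of weights (e.g. a ~70B model at ~4.5 bits/weight):

    # rough decode ceiling: tok/s <= memory bandwidth / bytes of active weights
    model_gb = 40  # assumption: ~70B params at ~4.5 bits/weight

    for name, gbps in [("AI Max+ 395", 256), ("M4", 120),
                       ("M4 Pro", 273), ("M4 Max 40-core", 546)]:
        print(f"{name}: ~{gbps / model_gb:.1f} tok/s upper bound")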
zargon
It’s both. If you’re using any real amount of context, you need compute too.
cdavid
Yeah, memory bandwidth is often the limitation for floating point operations.
biehl
I think DHH compares them because they are both the latest top-line chips. I think DHH's benchmarks show that they have different performance characteristics. But DHH's favorite benchmark favors whatever runs native Linux and Docker.
For local LLMs, the higher memory bandwidth of the M4 Max makes it much more performant.
Ars Technica has more benchmarks for non-LLM things: https://arstechnica.com/gadgets/2025/08/review-framework-des...
discordance
Not in perf/watt but perf, yes.
jchw
Depends on the benchmark I think. In this case it's probably close. Apple is cagey when it comes to power draw or clock metrics, but I believe the M4 Max has been seen drawing around 50W in loaded scenarios. Meanwhile, Phoronix clocked the 395+ as drawing an average of 91 watts during their benchmarks. If the performance is ~twice as fast, that should be a similar performance per watt. Needless to say, it's at least not a dramatic difference the way it was when the M1 came out.
edit: Though the M4 Max may be more power hungry than I'm giving it credit for; it's hard to say because I can't figure out if some of these power draw metrics from random Internet posts actually isolate the M4 itself. It looks like when the GPU is loaded it goes much, much higher.
https://old.reddit.com/r/macbookpro/comments/1hkhtpp/m4_max_...
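The arithmetic, with those admittedly uncertain numbers plugged in:

    # rough perf/W comparison using the figures above (both are estimates)
    amd_watts, m4_watts = 91, 50  # Phoronix average vs. reported M4 Max load
    perf_ratio = 2.0              # assumed: 395+ is ~twice as fast

    print(f"395+ perf/W vs M4 Max: {perf_ratio / (amd_watts / m4_watts):.2f}x")
    # ~1.1x -> roughly comparable efficiency, as concluded above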
pengaru
It's not baffling once you realize TSMC is the main defining factor for all these chips, Apple Silicon is simply not that special in the grand scheme of things.
Why do you think TSMC's production being in Taiwan is basically a national security issue for the U.S. at this point?
toasterlovin
> Apple Silicon is simply not that special in the grand scheme of things
Apple Silicon might not be that special from an architecture perspective (although treating integrated GPUs as appropriate for workloads other than low end laptops was a break with industry trends), but it’s very special from an economic perspective. The Apple Silicon unit volumes from iPhones have financed TSMC’s rise to semiconductor process dominance and, it would appear, permanently dethroned Intel.
MegaDeKay
Apple was just the highest bidder for getting the latest TSMC process. They wouldn't have had a problem getting other customers to buy up that capacity. And Intel's missteps counted for a substantial part of the process dominance you refer to. So I'd argue that Apple isn't that special here either.
ozgrakkurt
I don't think there is a laptop that comes close to the battery life, or the performance on battery, of the M1 MacBook Pro.
I hate apple but there is obviously something special about it
ilaksh
How does that Framework Desktop compare with the "GMKtec AI Mini Ryzen AI Max+ 395 128GB" mini PC?
I suspect this one is very similar hardware and a slightly better deal if you give up the cool factor of Framework. Although I don't really know.
Anyone compared them head-to-head?
angst
same CPU, likely a different TDP setting.
This comparison uses the HP G1a, but I imagine it wouldn't be too far off from GMKtec: https://www.phoronix.com/review/framework-desktop-linux/9
My understanding is that the Framework can be fed more power to support better sustained performance.
jychang
They're the same price. Both 395 machines (with 128GB of RAM) are $1999.
Fnoord
Hehe, that is just a Chinese company, isn't it? Say bye-bye to warranty, support, and repairability. I am not saying you shouldn't consider it, but describing Framework as mere cool factor is the other extreme.
qingcharles
I've been turned off GMKtec since I was helping a buddy fresh-install Windows and found their drivers are in a Google Drive folder they don't pay for, which hits its download quota regularly, so you have to go back day after day, play roulette, and hope you get in at the right time. And the drivers are literally nowhere else that I could find, even with driver search tools.
makeitdouble
For nuance:
- Framework only sells to specific countries. Warranty won't even be an issue if you can't buy one in the first place.
- Chinese manufacturers offer support and warranty. In particular GMKTek does[0].
- Repairability will be at best on par with Framework, but better than a random Western brand. Think HP and their repairability track record.
"just a Chinese company" feels pretty weird as a qualifier in this day and age, when Chinese companies are on average ahead of the curve.
lucianbr
Feels like in terms of warranty, support, repairability, it's not so much that Chinese brands have advanced, but that the west has seriously regressed. Every company now is looking to lock me out of my own hardware, extract as much information about me, extract as much value possible by degrading support and compatibility and whatever else they can...
Maybe when we run out of reasons to buy american or european or japanese they will wake up, but I don't see it.
Fnoord
My post is the nuance, and as a buyer/user of GPD, Minisforum, and Xiaomi products I respectfully disagree.
Chinese companies are not on par with Western ones. The QA, safety measures, hazard compliance, warranty, or even proper English (they use an online translator service) isn't there. Cha bu duo ("close enough") is an accurate description of Chinese products.
From the link you sent, they offer a 7-day return policy. In the EU you get 2 weeks, legally enforced. Companies like Amazon offer even a month. Then they have a restocking fee of 15%. This is AFAIK allowed (if proportional to the damage to the product) but it does not seem proportional. Companies like Amazon don't do this. And Amazon isn't great; they have a lot of cheap Chinese dropshipping brands. Then they often lie in China as well. They claim leather; when you buy it, it is fake leather.
Cha bu duo can be good enough if you are on a tight budget, or if the product isn't available otherwise (how I ended up with a GPD Pocket 2 back around 2018). But I have personally witnessed how Xiaomi smartphones fucked up a mid-sized non-profit that dealt with very sensitive personal details. They went for budget, ended up with a support nightmare, and with something that shouldn't count as GDPR compliant. Cause yeah, spyware and bloatware are another issue.
Furthermore, Framework sells to Western countries.
sfjailbird
Oh wow, it does indeed have "AI" in its name twice. Can't wait for this shit to blow over.
jamesgill
I like Framework and own one of their laptops. But the desktop seems more like a triumph of gimmicky marketing than a desktop that's meaningfully different. And it seems significantly overpriced.
rs186
If you can't find a sufficiently similar alternative at a much better price, it is not overpriced.
baby_souffle
I am very on board with the framework mission. I can afford the premium just to keep their lights on and doors open. The other Chinese OEMs almost certainly won’t offer quite the support for that ~10% discount…
mrbluecoat
I guess the original Raspberry Pi team missed the memo on that.
esafak
For the purposes of running LLM models, a Mac Mini. The PC is cheaper, but it doesn't have MacOS, Apple's service or resale value.
jchw
Actually the pricing is pretty similar.
Framework Desktop price with default selections, 32GB of RAM, 500 GB storage: $1,242.00 USD
Mac Mini with 32GB of RAM, 512 GB storage: $1,199.00
Post changed a bit since I started replying, so:
> For the purposes of running LLM models, a Mac Mini
The M4 Max is the one that actually gives you a shit load of memory bandwidth. If you just get a normal M4 it's not going to be especially good at that.
> it doesn't have MacOS
The Mac can't run Windows, which is used by ~75% of all desktop computer users and the main operating system that video games target. I'd say that would be the bigger problem for many.
> Apple's service
What advantage does that get you over Framework's service?
> resale value
Framework resale value has proven to be excellent by the way. Go to eBay, search "Framework Laptop", go to "Sold Items". Many SKUs seem to be retaining most of their value.
(Nevermind the ease of repair for the Framework, or the superior expandability. If you want to expand the disk space on an M4 you need to get sketchy parts, possibly solder things, and rescue your Mac with another Mac. For Framework devices you plug in another M.2 card.)
pimeys
I count not needing to use macOS as a big plus. Full Linux support out of the box.
croes
> but it doesn't have MacOS, Apple's service or resale value.
If the purpose is running LLMs, none of that matters.
But Linux support is an advantage. Does the M4 have that?
dismalaf
The M4 has half the memory bandwidth of the 395+ and the specs on those models are absolute trash. To get an M4 Pro APU and decent specs you're spending at least as much as the Framework, at least here in Canada.
zozbot234
It's taking a newly released mobile- and mini-PC-focused platform that's usually paired with proprietary technology, and building something that's as close as possible to a standard desktop with it. Seems very much in the Framework spirit once you account for that side of it.
mschild
Right, but why go with mobile at all? I get the laptops.
For desktop you already have thousands of choices, and repairability, assuming it's not some proprietary Dell/HP desktop, is already as good as it gets without breaking out your soldering iron.
That said, they'll know more about the market demand than I do and another option won't hurt :)
MindSpunk
The specific chip powering the Framework Desktop is something very unique in the PC landscape in general, even in desktop. The Strix Halo chip pairs a 16 core CPU with a huge iGPU that performs like a desktop discrete GPU, and 128GB of RAM (accessible on the GPU).
Strix Halo is almost like having a PS5 or Xbox chip but available for the PC ecosystem. It's a super interesting and unique part for GPU compute, AI, or small form factor gaming.
signal11
Quiet desktop PCs with good thermals have been getting increased interest — not everyone needs a tower, for some a Mac Mini-like device would work great, but not everyone wants to get into the Apple ecosystem for various reasons.
Of course this PC is interesting in that it’s more “workstation class” and I’m not sure how much thermals matter there, but maybe this is an iteration towards a Mac Studio like device.
zozbot234
> Right, but why go with mobile at all? I get the laptops.
Pair a power-efficient mobile chip with a mini-desktop form factor and a good (i.e. probably overengineered, to some extent) cooling solution, and it will give you a kind of sustained performance and reliability over time that you just aren't going to get from the average consumer/enthusiast desktop chip. Great for workstation-like use cases that still don't quite need the raw performance and official support you'd get from a real, honest-to-goodness HEDT.
timc3
Because it's using a Strix Halo APU, which to some is kinda interesting, and to others is all they need for some time.
wiseowise
Supporting OSS and repairable hardware?
komali2
I'm realizing that I may have misunderstood Framework's market. I thought it was tinkerers and environmentally conscious FOSS nerds like me, but I think there may be a huge enterprise segment whose employees in charge of purchasing are like me but answer to much more strict business needs than "Isn't it cool that it comes with a screwdriver in the box?" So for example the underpowered CPU in the FW12 makes no sense to me until I found out that it's also designed for mass purchases by schools and designed to be flung around by angsty teens. The desktop seems as much meant to be strapped to the underside of 40 identical cubicles in an office as it's meant to be hauled around by people who want to have CSGO LAN parties.
zozbot234
> So for example the underpowered cpu in the fw12 makes no sense to me until I found out that it's also designed for mass purchases by schools and designed to be flung around by angsty teens.
I think that might be overstating it a bit. Real "rugged" laptops do exist, and would be quite at home in that kind of use (well, usually you'd worry a lot more about how kids in primary school will treat your hardware than teenagers) but the Framework 12 is not one.
FLHerne
Real "rugged" laptops are far too expensive for schools to buy by the dozen. Also, while robust against the environment they're not so much against deliberate vandalism or theft. The target market for those seems to be construction/industrial and similar, and of course the military.
All school laptop fleets I've seen are simply the cheapest thing they can buy in bulk, when it breaks provision a new one.
Archit3ch
RDNA 3.5, which means you don't get Matrix Cores. Those are reserved for RDNA 4, which comes to laptop chips later this year. Desktop RDNA 4 only shipped in 2025.
For comparison, Nvidia brought Tensor Cores to consumer cards back in 2018 with the RTX 20 series, and Apple has had simdgroup_matrix since 2020!
We are moving towards a world where this hardware is ubiquitous. It's uncertain what that means for non-ML workloads.
zozbot234
What do you need Matrix Cores for when you already have an NPU which can access the same memory, and even seems to include more flexible FPGA fabric? It's six of one, half a dozen of the other.
SomeHacker44
I have the HP ZBook G1a running the same CPU and RAM under HP Ubuntu. I have not seen any OOTB way to use the NPU. I can get ROCm software to run but it does not use it. No system tools show its activity that I can see. It seems to be a marketing gimmick. Shame.
transpute
https://news.ycombinator.com/item?id=43671940#43674311
> The PFB is found in many different application domains such as radio astronomy, wireless communication, radar, ultrasound imaging and quantum computing.. the authors worked on the evaluation of a PFB on the AIE.. [developing] a performant dataflow implementation.. which made us curious about the AMD Ryzen NPU.
> The [NPU] PFB figure shows.. speedup of circa 9.5x compared to the Ryzen CPU.. TINA allows running a non-NN algorithm on the NPU with just two extra operations or approximately 20 lines of added code.. on [Nvidia] GPUs CUDA memory is a limiting factor.. This limitation is alleviated on the AMD Ryzen NPU since it shares the same memory with the CPU providing up to 64GB of memory.
bigyabai
The NPU is generally pretty weak and not pipelined into the GPU's logic (which is already quite large on-die). It feels like the past 10 years have taught us that if you're going to create tensor-specific hardware then it makes the most sense to put it in your GPU and not a dark-silicon coprocessor.
Archit3ch
Can you do GPU -> NPU -> GPU for streaming workloads? The GPU can be more flexible than Tensor HW for preprocessing, light branching, etc.
Also, Strix Halo NPU is 50 TOPS. The desktop RDNA 4 chips are into the 100s.
As for consumer uses, I mentioned it's an open question. Blender? FFmpeg? Database queries? Audio?
fmajid
I cancelled my Framework Desktop order and ordered an HP Z2 Mini G1a instead, the goal being to replace my Mac Studio, as I've had it with Apple's arrogance and lousy software quality. The HP is much smaller, has ECC RAM and 10G Ethernet. Significantly more expensive, however.
DrBenCarson
Apple have certainly taken a couple steps back re: overall reliability, but if you think that the grass is greener on the other side…pray tell how that goes
Plus, you can now deploy [MLX projects on CUDA](https://github.com/ml-explore/mlx/pull/1983)
paxys
Even if the grass is the same on the other side a 50% discount for the same performance doesn’t seem too bad.
worthless-trash
Do you run Linux on this? If so, does everything work?
fmajid
Haven't received it yet, but of course I will install Linux on it.
mixmastamyk
Was scoffing at HP, but then you got my attention with ECC RAM. Looks nice as well.
sliken
Keep scoffing, it's not real end-to-end ECC, just "link" ECC, which covers only part of the chip -> CPU pipeline.
So it's not full ECC like servers have, with DIMMs carrying a multiple of 9 chips and ECC protecting everything from the DIMMs to the CPU.
Keep in mind the RAM is inside the Strix Halo package, not something HP has control over.
wtallis
> Keep in mind the ram is inside the strix halo package, not something HP has control over.
It's not in the package, it's on the motherboard spread around the SoC package: https://www.hp.com/content/dam/sites/worldwide/personal-comp...
The 8 DRAM packages pretty clearly indicate you're not getting the extra capacity for end-to-end ECC as you would on a typical workstation or server memory module.
Marsymars
Wait is there any actual difference in the RAM between the HP and the Framework Desktop?
ashleyn
How is AMD GPU compatibility with leading generative AI workflows? I'm under the impression everything is CUDA.
ftvkyo
There is a project called SCALE that allows building CUDA code natively for AMD GPUs. It is designed as a drop-in replacement for Nvidia CUDA, and it is free for personal and educational use.
You can find out more here: https://docs.scale-lang.com/stable/
There are still many things that need implementing, most important ones being cuDNN and CUDA Graph API, but in my opinion, the list of things that are supported now is already quite impressive (and keeps improving): https://github.com/spectral-compute/scale-validation/tree/ma...
Disclaimer: I am one of the developers of SCALE.
Aeolun
All of Ollama and Stable Diffusion based stuff now works on my AMD cards. Maybe it’s different if you want to actually train things, but I have no issues running anything that fits in memory any more.
nh43215rgb
In practical generative AI workflows (LLMs), I think AMD Max+ 395 chips with unified memory are as good as Mac Studio or MacBook Pro configurations at handling big models locally, and support fast inference speeds (however, top-end Apple silicon (M4 Max, Studio Ultra) can reach 546GB/s memory bandwidth, while the AMD unified memory system is around 256GB/s). I think for inference either will work fine. For everything else I think the CUDA ecosystem is a better bet (correct me if I'm wrong).
pja
llama.cpp combined with Mesa's Vulkan support for AMD GPUs has worked pretty well with everything I've thrown at it.
throwdbaaway
https://llm-tracker.info/_TOORG/Strix-Halo has very comprehensive test results for running llama.cpp with Strix Halo. This one is particularly interesting:
> But when we switch to longer context, we see something interesting happen. WMMA + FA basically loses no performance at this longer context length!
> Vulkan + FA still has better pp but tg is significantly lower. More data points would be better, but seems like Vulkan performance may continue to decrease as context extends while the HIP+rocWMMA backend should perform better.
lhl has also been sharing these test results in https://forum.level1techs.com/t/strix-halo-ryzen-ai-max-395-..., and his latest comment provides a great summary of the current state:
> (What is bad is that basically every single model has a different optimal backend, and most of them have different optimal backends for pp (handling context) vs tg (new text)).
Anyway, for me, the greatest thing about the Strix Halo + llama.cpp combo is that we can throw one or more eGPUs into the mix, as echoed by the Level1Techs video (https://youtu.be/ziZDzrDI7AM?t=485), which should help a lot with pp performance.
sbinnee
My impression is the same. To train anything you just need to have CUDA gpus. For inference I think AMD and Apple M chips are getting better and better.
jychang
For inference, Nvidia/AMD/Intel/Apple are all generally on the same tier now.
There's a post on GitHub of a madman who got llama.cpp generating tokens for an AI model that's running on an Intel Arc, an Nvidia 3090, and an AMD GPU at the same time. https://github.com/ggml-org/llama.cpp/pull/5321
DiabloD3
CUDA isn't really used for new code. Its used for legacy codebases.
In the LLM world, you really only see CUDA being used with Triton and/or PyTorch consumers that haven't moved onto better pastures (mainly because they only know Python and aren't actually programmers).
That said, AMD can run most CUDA code through ROCm, and AMD officially supports Triton and PyTorch, so even the academics have a way out of Nvidia hell.
sexeriy237
If you're not doing machine code by hand, you're not a programmer
phanimahesh
If you are not winding copper around magnets by hand, you are not a real programmer
dgan
sooo what's the successor of cuda?
DiabloD3
CUDA largely was Nvidia's attempt at swaying Khronos and Microsoft's DirectX team. In the end, Khronos went with something based on a blend of AMD's and Nvidia's ideas, and that became Vulkan, and Microsoft just duplicated the effort in a Direct3D-flavored way.
So, just use Vulkan and stop fucking around with the Nvidia moat.
komali2
What are non legacy codebases using, then?
DiabloD3
Largely Vulkan. Microsoft internally is a huge consumer of DirectML for specifically the LLM team doing Phi and the Copilot deployment that lives at Azure.
TiredOfLife
ROCm doesn't work on this device
geerlingguy
You mean the AI Max chips? ROCm works fine there, as long as you're running 6.4.1 or later, no hacks required. I tested on Fedora Rawhide and it was just dnf install rocm.
DiabloD3
Yes it does. ROCm support for new chips, due to being available for paid support contracts, comes like 1-2 months after the chip comes out (ie, when they're 100% sure it works with the current, also new, driver).
I'd rather it works and ships late than doesn't work and ships early and then get gaslit about the bugs (lol Nvidia, why are you like this?)
dismalaf
> I'm under the impression everything is CUDA
A very quick Google search would show that pretty much everything also runs on ROCm.
Torch runs on CUDA and ROCm. Llama.cpp runs on CUDA, ROCm, SYCL, Vulkan and others...
trenchpilgrim
Certain chips can work with useful local models, but compatibility is far behind CUDA.
kristianp
> There's at least a little flexibility with the graphics card if you move the board into a different case—there's a single PCIe x4 slot on the board that you could put an external GPU into, though many PCIe x16 graphics cards will be bandwidth starved.
https://arstechnica.com/gadgets/2025/08/review-framework-des...
monster_truck
There are no situations where this matters yet. You have to drop down to an 8x slot on PCIe 3.0 to even begin to see any meaningful impact on benchmarks (synthetic or otherwise)
wolfgangK
For LLM inference, I don't think the PCIe bandwidth matters much, and a GPU could greatly improve the prompt processing speed.
zozbot234
The Strix Halo iGPU is quite special, like the Apple iGPU it has such good memory bandwidth to system RAM that it manages to improve both prompt processing and token generation compared to pure CPU inference. You really can't say that about the average iGPU or low-end dGPU: usually their memory bandwidth is way too anemic, hence the CPU wins when it comes to emitting tokens.
ElectricalUnion
Only if your entire model fits the GPU VRAM.
To me this reads like "if you can afford those 256GB VRAM GPUs, you don't need PCIe bandwidth!"
jychang
No, that's not true. Prompt processing just needs the attention tensors in VRAM; the MLP weights aren't needed for the heavy calculations that a GPU speeds up. (After attention, you only need to pass the activations from GPU to system RAM, which is about ~40KB, so you're not very limited here.)
That's pretty small.
Even Deepseek R1 0528 685b only has like ~16GB of attention weights. Kimi K2 with 1T parameters has 6168951472 attention params, which means ~12GB.
It's pretty easy to do prompt processing for massive models like Deepseek R1, Kimi K2, or Qwen 3 235b with only a single Nvidia 3090 gpu. Just do --n-cpu-moe 99 in llama.cpp or something similar.
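The arithmetic behind those sizes, as a quick sanity check (assuming 16-bit weights):

    # attention weights only, at 2 bytes/param (fp16/bf16)
    attn_params = 6_168_951_472  # Kimi K2, per the figure above
    print(f"~{attn_params * 2 / 1e9:.0f} GB")  # ~12 GB -> fits in a 24 GB 3090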
tgma
If you can't, your performance will likely be abysmal though, so there's almost no middle ground for the LLM workload.
jgalt212
Yeah, I think so. Once the whole model is on the GPU (potentially slower start-up), there really isn't much traffic between the GPU and the motherboard. That's how I think about it. But mostly saying this as I'm interested in being corrected if I'm wrong.
conradev
You can also use an adapter to repurpose an M.2 slot as PCIe x16, but the bandwidth is the same x4
tgma
That's just called a PCIe x4 [1]. Each PCIe lane is an independent channel. The wider slot will simply have disconnected pins. You can actually do this with regular motherboard PCIe x4 slots by cutting the plastic at the end of the slot so you can insert a wider card and most cards work just fine.
[1]: It sounds like a nitpick but a PCIe x16 with x4 effective bandwidth can exist and is a different thing: if the actual PCIe interface is x16, but there is an upstream bottleneck (e.g. aggregate bandwidth from chipset to CPU is not enough to handle all peripherals at once at full rate.)
zargon
So DHH fell for Sam’s scam. He tried OSS 20b and wasn’t impressed, and apparently dismisses all local models based on that experience with a known awful model.
cuu508
What are more budget friendly options for similar workloads (running web app test suite in parallel)?
My test suite currently runs in ~6 seconds on 9700K. Would be nice to speed it up, but maybe not for $2000 :-) Last I checked 13700K or 13900K looked like the price/performance sweet spot, but perhaps there are better options?
SomeoneOnTheWeb
Minisforum 790S7/795S7, mini-ITX desktop.
16 cores, 32 threads, only a bit less powerful than a desktop Ryzen 7950X or a 14900K, but with comparatively low power usage.
About 500€ barebones, then you add your own SSD and SO-DIMM RAM.
nextos
How is the cooling system on that Minisforum?
Is it noisy? Does it keep up with the 7950X?
yencabulator
I bought https://store.minisforum.com/products/minisforum-um890pro to compile Rust faster than my laptop, with 96 GB RAM and 2x4 TB NVMe as a ZFS mirror. Back before Framework Desktop existed.
It has the 8945HS CPU, the article benchmarks against 8745H which is a little bit slower. It's a very worthy price point to consider, tiny and very quiet.
qwen3:30b-a3b runs locally at 23.25 tokens/sec. I know the 395+ chip would approximately double that, but I'm not quite willing to put $2000 into that upgrade.
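That figure passes a rough bandwidth sanity check, if you assume ~3B active params per token for the a3b model, a ~4.5-bit quant, and dual-channel DDR5-5600:

    active_params = 3e9     # qwen3:30b-a3b activates ~3B params per token
    bytes_per_param = 0.57  # assumed ~4.5 bits/weight quantization
    bandwidth = 89.6e9      # dual-channel DDR5-5600 theoretical peak, bytes/s

    print(f"ceiling: ~{bandwidth / (active_params * bytes_per_param):.0f} tok/s")
    # ~52 tok/s theoretical ceiling; 23 tok/s measured is plausible in practice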
ohdeargodno
>My test suite currently runs in ~6 seconds on 9700K
Absolutely nothing. 6 seconds is about the time it will take you to tab to your terminal, press up arrow, find your test task and run it. There's no amount of money that makes it go from 6 to 3, and no world in which there's any value to it.
In addition, upgrading to a 13900K means you're playing the Intel Dance: sockets have (again) changed, in an (again) incompatible manner. So you're looking at, at the very least, a new CPU, a new motherboard, potentially a new cooler, and, if you go far enough forward with CPUs, new RAM, since Intel's Z890 is not DDR4 compatible (and the Z390 was not DDR5 compatible). Or buying an entire new PC.
Since you're behind a socket wall, the reasonable option for an upgrade would rather be a sizeable one, and most likely abandoning Intel to its stupid decisions for a while and instead going for Zen 5 CPUs, which are going to be socket compatible for a good 7 years at least.
cuu508
6 seconds is the time it takes for the tests to run, after I've switched to the terminal and run the command. If I switch from 8 cores to, say, 16 faster cores, IMHO it is not unthinkable the tests could speed up to 3 seconds. How much money to invest in this speedup is a subjective question.
I'm thinking about a new system, not upgrading the existing one.
christophilus
It’s really nice to save and have your tests automatically run and go green (or red) nearly instantly. There is value to that. Performance matters.
ohdeargodno
That's called not rerunning all the tests in your project and having test harnesses that know of module boundaries.
In addition, considering "saving" is something that happens on pretty much any non-code interaction, it means your tests are broken half the time when you're working. That's useless noise.
> The Framework Desktop with 64GB RAM + 2TB NVMe is $1,876. To get a Mac Studio with similar specs [...] you'll literally spend nearly twice as much [...] The Framework Desktop is simply a great deal.
Wow, someone managed to beat Apple on price??
I don't know that it logically follows that anything is a great deal when it undercuts Apple. Half sounds about right -- I thought Apple was a bit more competitive these days than ×2 actually, but apparently not, also considering that Framework comes with a (totally fair) niche-vendor premium