Nvidia DGX Spark: great hardware, early days for the ecosystem
31 comments · October 15, 2025
simonw
It's notable how much easier it is to get things working now that the embargo has lifted and other projects have shared their integrations.
I'm running vLLM on it now and it was as simple as:
docker run --gpus all -it --rm \
--ipc=host --ulimit memlock=-1 \
--ulimit stack=67108864 \
nvcr.io/nvidia/vllm:25.09-py3
(That recipe is from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?v... )
And then in the Docker container:
vllm serve &
vllm chat
The default model it loads is Qwen/Qwen3-0.6B, which is tiny and fast to load.
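If you'd rather hit the server directly than use vllm chat, vllm serve exposes an OpenAI-compatible API on port 8000 by default; a minimal curl sketch, assuming the default model is still loaded and using a throwaway prompt:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Say hello in five words."}]}'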
behnamoh
I'm curious, does its architecture support all CUDA features out of the box, or is it limited compared to the 5090/6000 Blackwell?
jhcuii
Despite the large memory capacity, its memory bandwidth is quite low, so I'd guess decode speed will be slow. That said, this design is very well suited to the inference needs of MoE models.
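As a rough back-of-the-envelope bound (the figures below are assumptions, not measurements): decode speed is roughly capped at memory bandwidth divided by the bytes of weights read per token, which is why MoE models, which only touch the active experts per token, fare comparatively well:
# Assumed figures: ~273 GB/s memory bandwidth, ~35 GB of weights read per token
# (e.g. a 70B dense model at roughly 4-bit quantization).
awk 'BEGIN { bw=273; weights=35; printf "~%.1f tokens/sec upper bound\n", bw/weights }'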
reenorap
Is 128 GB of unified memory enough? I've found that the smaller models are great as toys but useless for anything realistic. Will 128 GB hold any model that you can do actual work with, or query for answers that return useful information?
simonw
There are several 70B+ models that are genuinely useful these days.
I'm looking forward to GLM 4.6 Air - I expect that one should be pretty excellent, based on experiments with a quantized version of its predecessor on my Mac. https://simonwillison.net/2025/Jul/29/space-invaders/
behnamoh
The question is: how does prompt processing time on this compare to the M3 Ultra? That one sucks at RAG even though it can technically handle huge models and long contexts...
two_handfuls
I wonder how this compares financially with renting something on the cloud.
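One rough way to frame it (every number below is an assumption, not a quoted price): divide the hardware cost by an hourly cloud rate to get a break-even point in GPU-hours:
# Assumed figures: ~$4,000 for the box, ~$1.50/hr for a rented cloud GPU
# with comparable memory. Adjust both to taste.
awk 'BEGIN { price=4000; rate=1.50; printf "break-even after ~%.0f GPU-hours (~%.0f days of 24/7 use)\n", price/rate, price/rate/24 }'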
fnordpiglet
This seems to be missing the obligatory pelican on a bicycle.
simonw
Here's one I made with it - I didn't include it in the blog post because I had so many experiments running that I lost track of which model I'd used to create it! https://tools.simonwillison.net/svg-render#%3Csvg%20width%3D...
fnordpiglet
That seat post looks fairly unpleasant.
monster_truck
Whole thing feels like a paper launch held up by people chasing blog traffic who are missing the point.
I'd be pissed if I paid this much for hardware and the performance was this lacklustre while also being kneecapped for training.
rubatuga
When the networking is 25GB/s and the memory bandwidth is 210GB/s you know something is seriously wrong.
rgovostes
I'm hopeful this makes Nvidia take aarch64 seriously for Jetson development. For the past several years Mac-based developers have had to run the flashing tools in unsupported ways, in virtual machines with strange QEMU options.
matt3210
> even in a Docker container
I should be allowed to do stupid things when I want. Give me an override!
simonw
A couple of people have since tipped me off that this works around that:
IS_SANDBOX=0 claude --dangerously-skip-permissions
You can run that as root and Claude won't complain.
fisian
The reported 119GB vs. 128GB according to spec is because 128GB (1 GB = 10^9 bytes) works out to 119GiB (1 GiB = 2^30 bytes).
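A quick check of that arithmetic (awk used just for the floating-point division):
awk 'BEGIN { printf "%.1f GiB\n", 128e9 / 2^30 }'
# -> 119.2 GiB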
wmf
That can't be right because RAM has always been reported in binary units. Only storage and networking use lame decimal units.
simonw
Looks like Claude reported it based on this:
● Bash(free -h)
⎿ total used free shared buff/cache available
Mem: 119Gi 7.5Gi 100Gi 17Mi 12Gi 112Gi
Swap: 0B 0B 0B
That 119Gi is indeed gibibytes, and 119GiB in GB is 128GB.
simonw
Ugh, that one gets me every time!
ur-whale
As is usual for NVidia: great hardware, an effing nightmare figuring out how to set up the pile of crap they call software.
kanwisher
If you think their software is bad, try using any other vendor; it makes Nvidia look amazing. Apple is the only one that comes close.
enoch2090
Although a bit off the GPU topic, I think Apple's Rosetta is the smoothest binary transition I've ever used.
pjmlp
Try to use Intel or AMD stuff instead.
p_l
And yet CUDA has looked way better than ATi/AMD offerings in the same area, despite ATi/AMD technically being first to deliver GPGPU (the major difference being that CUDA arrived a year later but supported everything from the G80 up and evolved nicely, while AMD managed multiple platforms with patchy support and total rewrites in between).
cylemons
What was the AMD GPGPU called?
jasonjmcghee
Except the performance people are seeing is way below expectations. It seems to be slower than an M4, which kind of defeats the purpose. It was advertised as 1 petaflop on your desk.
But maybe this will change? Software issues, somehow?
It also runs CUDA, which is useful.
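A quick way to sanity-check that CUDA is actually visible from a container (the image tag here is illustrative, not necessarily the right one for this machine):
docker run --gpus all --rm nvcr.io/nvidia/pytorch:25.09-py3 \
  python -c "import torch; print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no CUDA device visible')"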
airstrike
It fits bigger models and you can stack them.
Plus, apparently some of the early benchmarks were made with Ollama and should be disregarded.
ChrisArchitect
More discussion: https://news.ycombinator.com/item?id=45575127
About what I expected. The Jetson series had the same issues, mostly, at a smaller scale: Deviate from the anointed versions of YOLO, and nothing runs without a lot of hacking. Being beholden to CUDA is both a blessing and a curse, but what I really fear is how long it will take for this to become an unsupported golden brick.
Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory. Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if ROCm is… well… not all there yet).
Curious to compare this with cloud-based GPU costs, or (if you really want on-prem and fully private) the returns from a more conventional rig.