
Nvidia DGX Spark: great hardware, early days for the ecosystem

rcarmo

About what I expected. The Jetson series had the same issues, mostly, at a smaller scale: Deviate from the anointed versions of YOLO, and nothing runs without a lot of hacking. Being beholden to CUDA is both a blessing and a curse, but what I really fear is how long it will take for this to become an unsupported golden brick.

Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory. Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if ROCm is… well… not all there yet).

Curious to compare this with cloud-based GPU costs, or (if you really want on-prem and fully private) the returns from a more conventional rig.

simonw

It's notable how much easier it is to get things working now that the embargo has lifted and other projects have shared their integrations.

I'm running VLLM on it now and it was as simple as:

  docker run --gpus all -it --rm \
    --ipc=host --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    nvcr.io/nvidia/vllm:25.09-py3
(That recipe from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?v... )

And then in the Docker container:

  vllm serve &
  vllm chat
The default model it loads is Qwen/Qwen3-0.6B, which is tiny and fast to load.
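If you want to hit that server from another process rather than the bundled vllm chat REPL, vllm serve exposes an OpenAI-compatible API on port 8000. A minimal sketch, assuming the openai Python package is installed and the container publishes port 8000 (the docker command above doesn't, so add -p 8000:8000):

  # Query the vLLM OpenAI-compatible endpoint.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:8000/v1",  # vLLM's default serve address
      api_key="not-needed",                 # vLLM accepts any key unless one is configured
  )

  response = client.chat.completions.create(
      model="Qwen/Qwen3-0.6B",  # the default model vllm serve loads
      messages=[{"role": "user", "content": "Say hello from the DGX Spark"}],
  )
  print(response.choices[0].message.content)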

behnamoh

I'm curious, does its architecture support all CUDA features out of the box or is it limited compared to 5090/6000 Blackwell?
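One way to probe that on the device itself is to compare the reported compute capability and device properties against a 5090 / RTX 6000. A minimal sketch, assuming a CUDA-enabled PyTorch build is installed:

  # Print what the GPU reports; compare against a desktop Blackwell card.
  import torch

  print(torch.cuda.get_device_name(0))        # device name
  print(torch.cuda.get_device_capability(0))  # (major, minor) compute capability
  print(torch.cuda.get_device_properties(0))  # total memory, SM count, etc.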

jhcuii

Despite the large memory capacity, its memory bandwidth is quite low, so I'd guess the model's decode speed will be slow. That said, the design is well suited to the inference needs of MoE models.
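As a rough sanity check on that intuition (a sketch with assumed numbers, not DGX Spark measurements): a memory-bound decode step has to stream every active weight once per token, so tokens/second is roughly bandwidth divided by the bytes of active weights, which is why MoE models with only a fraction of their parameters active fare much better:

  # Back-of-envelope decode speed; all figures are assumptions for illustration.
  bandwidth_gb_s = 210  # memory bandwidth figure quoted above

  def tokens_per_second(active_params_b, bytes_per_param):
      # Each decoded token streams every active weight from memory once.
      gb_per_token = active_params_b * bytes_per_param
      return bandwidth_gb_s / gb_per_token

  print(tokens_per_second(70, 0.5))  # 70B dense model at 4-bit      -> ~6 tok/s
  print(tokens_per_second(12, 0.5))  # MoE with ~12B active at 4-bit -> ~35 tok/s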

reenorap

Is 128 GB of unified memory enough? I've found that the smaller models are great as toys but useless for anything realistic. Will 128 GB hold any model you can do actual work with, or query for answers that return useful information?

simonw

There are several 70B+ models that are genuinely useful these days.

I'm looking forward to GLM 4.6 Air - I expect that one should be pretty excellent, based on experiments with a quantized version of its predecessor on my Mac. https://simonwillison.net/2025/Jul/29/space-invaders/
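As a rough rule of thumb (weights only, ignoring KV cache and runtime overhead), a model's footprint is parameter count times bytes per parameter, so 128GB leaves headroom for 70B-class dense models and 100B-class MoEs at common quantizations:

  # Weights-only memory estimate; ignores KV cache and runtime overhead.
  def weights_gb(params_b, bits_per_param):
      return params_b * bits_per_param / 8

  print(weights_gb(70, 8))   # 70B at 8-bit  -> 70 GB
  print(weights_gb(70, 4))   # 70B at 4-bit  -> 35 GB
  print(weights_gb(106, 4))  # ~106B MoE (GLM-4.5-Air-class) at 4-bit -> ~53 GB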

behnamoh

The question is: how does prompt processing time on this compare to the M3 Ultra? That one sucks at RAG even though it can technically handle huge models and long contexts...

two_handfuls

I wonder how this compares financially with renting something on the cloud.
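A crude break-even sketch (prices here are illustrative assumptions, not quotes): divide the hardware cost by the hourly rate you'd pay to rent something comparable.

  # Break-even vs. renting; both figures are assumptions for illustration.
  hardware_price_usd = 4000.0
  cloud_rate_usd_per_hour = 1.50  # assumed rate for a comparable rented GPU

  breakeven_hours = hardware_price_usd / cloud_rate_usd_per_hour
  print(breakeven_hours)       # ~2,667 hours
  print(breakeven_hours / 24)  # ~111 days of continuous use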

fnordpiglet

This seems to be missing the obligatory pelican on a bicycle.

simonw

Here's one I made with it - I didn't include it in the blog post because I had so many experiments running that I lost track of which model I'd used to create it! https://tools.simonwillison.net/svg-render#%3Csvg%20width%3D...

fnordpiglet

That seat post looks fairly unpleasant.

monster_truck

The whole thing feels like a paper launch being propped up by people chasing blog traffic who are missing the point.

I'd be pissed if I paid this much for hardware and the performance was this lacklustre while also being kneecapped for training

rubatuga

When the networking is 25GB/s and the memory bandwidth is 210GB/s, you know something is seriously wrong.

rgovostes

I'm hopeful this makes Nvidia take aarch64 seriously for Jetson development. For the past several years Mac-based developers have had to run the flashing tools in unsupported ways, in virtual machines with strange QEMU options.

matt3210

> even in a Docker container

I should be allowed to do stupid things when I want. Give me an override!

simonw

A couple of people have since tipped me off that this works around that:

  IS_SANDBOX=0 claude --dangerously-skip-permissions
You can run that as root and Claude won't complain.

fisian

The reported 119GB vs. the 128GB in the spec is because 128 GB (1 GB = 10^9 bytes) equals 119 GiB (1 GiB = 2^30 bytes).

wmf

That can't be right because RAM has always been reported in binary units. Only storage and networking use lame decimal units.

simonw

Looks like Claude reported it based on this:

  ● Bash(free -h)
    ⎿                 total        used        free      shared  buff/cache   available
       Mem:           119Gi       7.5Gi       100Gi        17Mi        12Gi       112Gi
       Swap:             0B          0B          0B
That 119Gi is indeed gibibytes, and 119 GiB is roughly 128 GB.
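The conversion is easy to check (1 GiB = 2^30 bytes, 1 GB = 10^9 bytes):

  # 128 GB expressed in GiB, and 119 GiB expressed back in GB.
  print(128e9 / 2**30)      # ~119.2 GiB
  print(119 * 2**30 / 1e9)  # ~127.8 GB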

simonw

Ugh, that one gets me every time!

ur-whale

As is usual for NVidia: great hardware, an effing nightmare figuring out how to set up the pile of crap they call software.

kanwisher

If you think their software is bad, try using any other vendor; it makes Nvidia look amazing. Apple is the only one that comes close.

enoch2090

Although a bit off the GPU topic, I think Apple's Rosetta is the smoothest binary transition I've ever used.

pjmlp

Try to use Intel or AMD stuff instead.

p_l

And yet CUDA has looked way better than ATi/AMD's offerings in the same area, despite ATi/AMD technically being first to deliver GPGPU. The major difference is that CUDA arrived a year later but supported everything from the G80 up and evolved nicely, while AMD went through multiple platforms with patchy support and total rewrites in between.

cylemons

What was the AMD GPGPU called?

jasonjmcghee

Except the performance people are seeing is way below expectations. It seems to be slower than an M4, which kind of defeats the purpose. It was advertised as 1 petaflop on your desk.

But maybe this will change? Software issues somehow?

It also runs CUDA, which is useful

airstrike

It fits bigger models and you can stack them.

Plus, apparently some of the early benchmarks were run with Ollama and should be disregarded.