Nvidia's Project Digits is a 'personal AI supercomputer'

509 comments

·January 7, 2025

Abishek_Muthian

I'm looking at my Jetson Nano in the corner which is fulfilling its post-retirement role as a paper weight because Nvidia abandoned it in 4 years.

Nvidia Jetson Nano, A SBC for "AI" debuted with already aging custom Ubuntu 18.04 and when 18.04 went EOL, Nvidia abandoned it completely without any further updates to its proprietary jet-pack or drivers and without them all of Machine Learning stack like CUDA, Pytorch etc. became useless.

I'll never buy a SBC from Nvidia unless all the SW support is up-streamed to Linux kernel.

lolinder

This is a very important point.

In general, Nvidia's relationship with Linux has been... complicated. On the one hand, at least they offer drivers for it. On the other, I have found few more reliable ways to irreparably break a Linux installation than trying to install or upgrade those drivers. They don't seem to prioritize it as a first class citizen, more just tolerate it the bare minimum required to claim it works.

dotancohen

  > Nvidia's relationship with Linux has been... complicated.

For those unfamiliar with Linus Torvalds' two-word opinion of Nvidia:

https://youtube.com/watch?v=OF_5EKNX0Eg

andirk

Wow. Torvalds' distaste for Nvidia from that, albeit 12 year old, clip leaves little to the imagination. Re: gaming GPUs, Windows is their main OS, but is that the main reason why Huang only mentioned Windows in his CES 2025 keynote? Their gaming chips are a small portion of the company now. But they want to focus dev on Windows??

stabbles

Now that the majority of their revenue is from data centers instead of Windows gaming PCs, you'd think their relationship with Linux should improve or already has.

sangnoir

Nvidia segments its big iron AI hardware from the consumer/prosumer segment. They do this by forbidding the use of GeForce drivers in datacenters[1]. All that to say, it is possible for the H100 to to have excellent Linux support, while support for the 4090 is awful.

1. https://www.datacenterdynamics.com/en/news/nvidia-updates-ge...

robhlt

They have been making real improvements the last few years. Most of their proprietary driver code is in firmware now, and the kernel driver is open-source[1] (the userland-side is still closed though).

They've also significantly improved support for wayland and stopped trying to force eglstreams on the community. Wayland+nvidia works quite well now, especially after they added explicit sync support.

1. https://github.com/NVIDIA/open-gpu-kernel-modules/

lolinder

It's possible. I haven't had a system completely destroyed by Nvidia in the last few years, but I've been assuming that's because I've gotten in the habit of just not touching it once I get it working...

FuriouslyAdrift

The Digits device runs the same nVidia DGX OS (nVidia custom Ubuntu distro) that they run on their cloud infra.

vladslav

I've had a similar experience, my Xavier NX stopped working after the last update and now it's just collecting dust. To be honest, I've found the Nvidia SBC to be more of a hassle than it's worth.

busterarm

Xavier AGX owner here to report the same.

justin66

My Jetson TX2 developer kit didn't stop working, but it's on a very out of date Linux distribution.

Maybe if Nvidia makes it to four trillion in market cap they'll have enough spare change to keep these older boards properly supported, or at least upstream all the needed support.

smallmancontrov

Wait, my AGX is still working, but I have kept it offline and away from updates. Do the updates kill it? Or is it a case of not supporting newer pytorch or something else you need?

moondev

Xavier AGX is awesome for running ESXi aarch64 edition, including aarch64 Windows vms

aseipp

The Orin series and later use UEFI and you can apparently run upstream, non-GPU enabled kernels on them. There's a user guide page documenting it. So I think it's gotten a lot better, but it's sort of moot because the non-GPU thing is because the JetPack Linux fork has a specific 'nvgpu' driver used for Tegra devices that hasn't been unforked from that tree. So, you can buy better alternatives unless you're explicitly doing the robotics+AI inference edge stuff.

But the impression I get from this device is that it's closer in spirit to the Grace Hopper/datacenter designs than it is the Tegra designs, due to both the naming, design (DGX style) and the software (DGX OS?) which goes on their workstation/server designs. They are also UEFI, and in those scenarios, you can (I believe?) use the upstream Linux kernel with the open source nvidia driver using whatever distro you like. In that case, this would be a much more "familiar" machine with a much more ordinary Linux experience. But who knows. Maybe GH200/GB200 need custom patches, too.

Time will tell, but if this is a good GPU paired with a good ARM Cortex design, and it works more like a traditional Linux box than the Jeton series, it may be a great local AI inference machine.

moondev

AGX also has UEFI firmware which allows you to install ESXi. Then you can install any generic EFI arm64 iso in a VM with no problems, including windows.

halJordan

It runs their dgx os and Jensen specifically said it would be a full part if their hw stack

startupsfail

If this is DGX OS, then yes, this is what you’ll find installed on their 4-cards workstations.

This is more like a micro-DGX then, for $3k.

yoyohello13

And unless there is some expanded maintenance going on, 22.04 is EOL in 2 years. In my experience, vendors are not as on top of security patches as upstream. We will see, but given NVIDIA's closed ecosystem, I don't have high hopes that this will be supported long term.

saidinesh5

Is there any recent, powerful SBC with fully upstream kernel support?

I can only think of raspberry pi...

sliken

rk3588 is pretty close, I believe it's usable today, just missing a few corner cases with HDMI or some such. I believe that last patches are either pending or already applied to an RC.

shadowpho

Radha but that’s n100 aka x64

msh

The odroid H series. But that packs a x86 cpu.

nickpsecurity

If its stack still works, you might be able to sell or donate it to a student experimenting. They can still learn quite a few things with it. Maybe even use it for something.

sangnoir

Using outdated tensorflow (v1 from 2018) or outdated PyTorch makes learning harder than it need to be, considering most resources online use much newer versions of the frameworks. If you're learning the fundamentals and working from first principle and creating the building blocks yourself, then it adds to the experience. However, most most people just want to build different types of nets, and it's hard to do when the code won't work for you.

tcdent

If you're expecting this device to stay relevant for 4 years you are not the target demographic.

Compute is evolving way too rapidly to be setting-and-forgetting anything at the moment.

tempoponet

Today I'm using 2x 3090's which are over 4 years old at this point and still very usable. To get 48gb vram I would need 3x 5070ti - still over $2k.

In 4 years, you'll be able to combine 2 of these to get 256gb unified memory. I expect that to have many uses and still be in a favorable form factor and price.

mrybczyn

Eh? By all indications compute is now evolving SLOWER than ever. Moore's Law is dead, Dennard scaling is over, the latest fab nodes are evolutionary rather than revolutionary.

This isn't the 80s when compute doubled every 9 months, mostly on clock scaling.

sliken

Indeed, generational improvements are at an all time low. Most of the "revolutionary" AI and/or GPU improvements are less precision (fp32 -> fp16 -> fp8 -> fp4) or adding ever more fake pixels, fake frames, and now in the most recent iteration multiple fake frames per computed frame.

I believe Nvidia has some published numbers for the 5000 series that showed DLSS off performance, which allowed a fair comparison to the previous generation, on the order of 25%, then removed it.

Thankfully the 3rd party benchmarks that use the same settings on old and new hardware should be out soon.

tcdent

Fab node size is not the only factor in performance. Physical limits were reached, and we're pulling back from the extremely small stuff for the time being. That is the evolutionary part.

Revolutionary developments are: multi-layer wafer bonding, chiplets (collections of interconnected wafers) and backside power delivery. We don't need the transistors to keep getting physically smaller, we need more of them, and at increased efficiency, and that's exactly what's happening.

Karupan

I feel this is bigger than the 5x series GPUs. Given the craze around AI/LLMs, this can also potentially eat into Apple’s slice of the enthusiast AI dev segment once the M4 Max/Ultra Mac minis are released. I sure wished I held some Nvidia stocks, they seem to be doing everything right in the last few years!

rbanffy

This is something every company should make sure they have: an onboarding path.

Xeon Phi failed for a number of reasons, but one where it didn't need to fail was availability of software optimised for it. Now we have Xeons and EPYCs, and MI300C's with lots of efficient cores, but we could have been writing software tailored for those for 10 years now. Extracting performance from them would be a solved problem at this point. The same applies for Itanium - the very first thing Intel should have made sure it had was good Linux support. They could have it before the first silicon was released. Itaium was well supported for a while, but it's long dead by now.

Similarly, Sun has failed with SPARC, which also didn't have an easy onboarding path after they gave up on workstations. They did some things right: OpenSolaris ensured the OS remained relevant (still is, even if a bit niche), and looking the other way for x86 Solaris helps people to learn and train on it. Oracle cloud could, at least, offer it on cloud instances. Would be nice.

Now we see IBM doing the same - there is no reasonable entry level POWER machine that can compete in performance with a workstation-class x86. There is a small half-rack machine that can be mounted on a deskside case, and that's it. I don't know of any company that's planning to deploy new systems on AIX (much less IBMi, which is also POWER), or even for Linux on POWER, because it's just too easy to build it on other, competing platforms. You can get AIX, IBMi and even IBMz cloud instances from IBM cloud, but it's not easy (and I never found a "from-zero-to-ssh-or-5250-or-3270" tutorial for them). I wonder if it's even possible. You can get Linux on Z instances, but there doesn't seem to be a way to get Linux on POWER. At least not from them (several HPC research labs still offer those).

nimish

1000% all these ai hardware companies will fail if they don't have this. You must have a cheap way to experiment and develop. Even if you want to only sell a $30000 datacenter card you still need a very low cost way to play.

Sad to see big companies like intel and amd don't understand this but they've never come to terms with the fact that software killed the hardware star

theptip

Isn’t the cloud GPU market covering this? I can run a model for $2/hr, or get a 8xH100 if I need to play with something bigger.

rbanffy

> Sad to see big companies like intel and amd don't understand this

And it's not like they were never bitten (Intel has) by this before.

AtlasBarfed

It really mystifies me that Intel AMD and other hardware companies obviously Nvidia in this case Don't either have a consortium or each have their own in-house Linux distribution with excellent support.

Windows has always been a barrier to hardware feature adoption to Intel. You had to wait 2 to 3 years, sometimes longer, for Windows to get around us providing hardware support.

Any OS optimizations in Windows you had to go through Microsoft. So say you added some instructions custom silicon or whatever to speed up Enterprise databases, provide high-speed networking that needed some special kernel features, etc, there was always Microsoft being in the way.

Not just in the drag the feet communication. Getting the tech people a line problem.

Microsoft will look at every single change. It did as to whether or not it would challenge their Monopoly whether or not it was in their business interest whether or not it kept you as the hardware and a subservient role.

p_ing

From the consumer perspective, it seems that MSFT has provided scheduler changes fairly rapidly for CPU changes, like X3D, P/e cores, etc. At least within a couple of months, if not at release.

Amd/Intel work directly with Microsoft for shipping new silicon that would otherwise require it.

p_ing

Raptor Computing provides POWER9 workstations. They're not cheap, still use last-gen hardware (DDR4/PCIe 4 ... and POWER9 itself) but they're out there.

https://www.raptorcs.com/content/base/products.html

rbanffy

It kind of defeats the purpose of an onboarding platform if it’s more expensive than the one you think of moving away from.

IBM should see some entry-level products as loss leaders.

throwaway48476

They're not offering POWER10 either because IBM closed the firmware again. Stupid move.

UncleOxidant

There were Phi cards, but they were pricey and power hungry (at the time, now current GPU cards probably meet or exceed the Phi card's power consumption) for plugging into your home PC. A few years back there was a big fire sale on Phi cards - you could pick one up for like $200. But by then nobody cared.

rbanffy

Imagine if they were sold at cost in the beginning. Also, think about having one as the only CPU rather than a card.

sheepscreek

The developers they are referring to aren’t just enthusiasts; they are also developers who were purchasing SuperMicro and Lambda PCs to develop models for their employers. Many enterprises will buy these for local development because it frees up the highly expensive enterprise-level chip for commercial use.

This is a genius move. I am more baffled by the insane form factor that can pack this much power inside a Mac Mini-esque body. For just $6000, two of these can run 400B+ models locally. That is absolutely bonkers. Imagine running ChatGPT on your desktop. You couldn’t dream about this stuff even 1 year ago. What a time to be alive!

HarHarVeryFunny

The 1 PetaFLOP spec and 200GB model capacity specs are for FP4 (4-bit floating point), which means inference not training/development. It's still be a decent personal development machine, but not for that size of model.

numba888

This looks like a bigger brother of Orin AGX, which has 64GB of RAM and runs smaller LLMs. The question will be power and performance vs 5090. We know price is 1.5x

stogot

How does it run 400B models across two? I didn’t see that in the article

tempay

> Nvidia says that two Project Digits machines can be linked together to run up to 405-billion-parameter models, if a job calls for it. Project Digits can deliver a standalone experience, as alluded to earlier, or connect to a primary Windows or Mac PC.

FuriouslyAdrift

Point to point ConnectX connection (RDMA with GPUDirect)

sliken

Not sure exactly, but they mentioned linking to together with ConnectX, which could be ethernet or IB. No idea on the speed though.

dagmx

I think the enthusiast side of things is a negligible part of the market.

That said, enthusiasts do help drive a lot of the improvements to the tech stack so if they start using this, it’ll entrench NVIDIA even more.

Karupan

I’m not so sure it’s negligible. My anecdotal experience is that since Apple Silicon chips were found to be “ok” enough to run inference with MLX, more non-technical people in my circle have asked me how they can run LLMs on their macs.

Surely a smaller market than gamers or datacenters for sure.

stuaxo

It's annoying I do LLMs for work and have a bit of an interest in them and doing stuff with GANS etc.

I have a bit of an interest in games too.

If I could get one platform for both, I could justify 2k maybe a bit more.

I can't justify that for just one half: running games on Mac, right now via Linux: no thanks.

And on the PC side, nvidia consumer cards only go to 24gb which is a bit limiting for LLMs, while being very expensive - I only play games every few months.

dagmx

I mean negligible to their bottom line. There may be tons of units bought or not, but the margin on a single datacenter system would buy tens of these.

It’s purely an ecosystem play imho. It benefits the kind of people who will go on to make potentially cool things and will stay loyal.

moralestapia

Yes, but people already had their Macs for others reasons.

No one goes to an Apple store thinking "I'll get a laptop to do AI inference".

qwertox

You could have said the same about gamers buying expensive hardware in the 00's. It's what made Nvidia big.

spaceman_2020

I keep thinking about stocks that have 100xd, and most seemed like obscure names to me as a layman. But man, Nvidia was a household name to anyone that ever played any game. And still so many of us never bothered buying the stock

Incredible fumble for me personally as an investor

Cumpiler69

There's a lot more gamers than people wanting to play with LLms at home.

gr3ml1n

AMD thought the enthusiast side of things was a negligible side of the market.

dagmx

That’s not what I’m saying. I’m saying that the people buying this aren’t going to shift their bottom line in any kind of noticeable way. They’re already sold out of their money makers. This is just an entrenchment opportunity.

epolanski

If this is gonna be widely used by ML engineers, in biopharma, etc and they land 1000$ margins at half a million sales that's half a billion in revenue, with potential to grow.

option

today’s enthusiast, grad student, hacker is tomorrow’s startup founder, CEO, CTO or 10x contributor in large tech company

Mistletoe

> tomorrow’s startup founder, CEO, CTO or 10x contributor in large tech company

Do we need more of those? We need plumbers and people that know how to build houses. We are completely full on founders and executives.

VikingCoder

If I were NVidia, I would be throwing everything I could at making entertainment experiences that need one of these to run...

I mean, this is awfully close to being "Her" in a box, right?

dagmx

I feel like a lot of people miss that Her was a dystopian future, not an ideal to hit.

Also, it’s $3000. For that you could buy subscriptions to OpenAI etc and have the dystopian partner everywhere you go.

int_19h

The real interesting stuff will happen when we get multimodal LMs that can do VR output.

computably

Yeah, it's more about preempting competitors from attracting any ecosystem development than the revenue itself.

bloomingkales

Jensen did say in recent interview, paraphrasing, “they are trying to kill my company”.

Those Macs with unified memory is a threat he is immediately addressing. Jensen is a wartime ceo from the looks of it, he’s not joking.

No wonder AMD is staying out of the high end space, since NVIDIA is going head on with Apple (and AMD is not in the business of competing with Apple).

T-A

From https://www.tomshardware.com/pc-components/cpus/amds-beastly...

The fire-breathing 120W Zen 5-powered flagship Ryzen AI Max+ 395 comes packing 16 CPU cores and 32 threads paired with 40 RDNA 3.5 (Radeon 8060S) integrated graphics cores (CUs), but perhaps more importantly, it supports up to 128GB of memory that is shared among the CPU, GPU, and XDNA 2 NPU AI engines. The memory can also be carved up to a distinct pool dedicated to the GPU only, thus delivering an astounding 256 GB/s of memory throughput that unlocks incredible performance in memory capacity-constrained AI workloads (details below). AMD says this delivers groundbreaking capabilities for thin-and-light laptops and mini workstations, particularly in AI workloads. The company also shared plenty of gaming and content creation benchmarks.

[...]

AMD also shared some rather impressive results showing a Llama 70B Nemotron LLM AI model running on both the Ryzen AI Max+ 395 with 128GB of total system RAM (32GB for the CPU, 96GB allocated to the GPU) and a desktop Nvidia GeForce RTX 4090 with 24GB of VRAM (details of the setups in the slide below). AMD says the AI Max+ 395 delivers up to 2.2X the tokens/second performance of the desktop RTX 4090 card, but the company didn’t share time-to-first-token benchmarks.

Perhaps more importantly, AMD claims to do this at an 87% lower TDP than the 450W RTX 4090, with the AI Max+ running at a mere 55W. That implies that systems built on this platform will have exceptional power efficiency metrics in AI workloads.

adrian_b

"Fire breathing" is completely inappropriate.

Strix Halo is a replacement for the high-power laptop CPUs from the HX series of Intel and AMD, together with a discrete GPU.

The thermal design power of a laptop CPU-dGPU combo is normally much higher than 120 W, which is the maximum TDP recommended for Strix Halo. The faster laptop dGPUs want more than 120 W only for themselves, not counting the CPU.

So any claims of being surprised that the TDP range for Strix Halo is 45 W to 120 W are weird, like the commenter has never seen a gaming laptop or a mobile workstation laptop.

nomel

> since NVIDIA is going head on with Apple

I think this is a race that Apple doesn't know it's part of. Apple has something that happens to work well for AI, as a side effect of having a nice GPU with lots of fast shared memory. It's not marketed for inference.

Tepix

Apple is both well aware and marketing it, as seen at https://www.apple.com/my/newsroom/2024/10/apples-new-macbook...

Quote:

"It also supports up to 128GB of unified memory, so developers can easily interact with LLMs that have nearly 200 billion parameters."

JoshTko

Which interview was this?

bloomingkales

https://fortune.com/2023/11/11/nvidia-ceo-jensen-huang-says-...

I can't find the exact Youtube video, but it's out there.

hkgjjgjfjfjfjf

You missed the Ryzen hx ai pro 395 product announcement

llm_trw

From the people I talk to the enthusiast market is nvidia 4090/3090 saturated because people want to do their fine tunes also porn on their off time. The Venn diagram of users who post about diffusion models and llms running at home is pretty much a circle.

dist-epoch

Not your weights, not your waifu

Tostino

Yeah, I really don't think the overlap is as much as you imagine. At least in /r/localllama and the discord servers I frequent, the vast majority of users are interested in one or the other primarily, and may just dabble with other things. Obviously this is just my observations...I could be totally misreading things.

numba888

> I sure wished I held some Nvidia stocks, they seem to be doing everything right in the last few years!

They propelled on unexpected LLM boom. But plan 'A' was robotics in which NVidia invested a lot for decades. I think their time is about to come, with Tesla's humanoids for 20-30k and Chinese already selling for $16k.

qwertox

This is somewhat similar to what GeForce was to gamers back in the days, but for AI enthusiasts. Sure, the price is much higher, but at least it's a completely integrated solution.

Karupan

Yep that's what I'm thinking as well. I was going to buy a 5090 mainly to play around with LLM code generation, but this is a worthy option for roughly the same price as building a new PC with a 5090.

qwertox

It has 128 GB of unified RAM. It will not be as fast as the 32 GB VRAM of the 5090, but what gamer cards have always lacked was memory.

Plus you have fast interconnects, if you want to stack them.

I was somewhat attracted by the Jetson AGX Orin with 64 GB RAM, but this one is a no-brainer for me, as long as idle power is reasonable.

trhway

>enthusiast AI dev segment

i think it isn't about enthusiast. To me it looks like Huang/NVDA is pushing further a small revolution using the opening provided by the AI wave - up until now the GPU was add-on to the general computing core onto which that computing core offloaded some computing. With AI that offloaded computing becomes de-facto the main computing and Huang/NVDA is turning tables by making the CPU is just a small add-on on the GPU, with some general computing offloaded to that CPU.

The CPU being located that "close" and with unified memory - that would stimulate development of parallelization for a lot of general computing so that it would be executed on GPU, very fast that way, instead of on the CPU. For example classic of enterprise computing - databases, the SQL ones - a lot, if not, with some work, everything, in these databases can be executed on GPU with a significant performance gain vs. CPU. Why it isn't happening today? Load/unload onto GPU eats into performance, complexity of having only some operations offloaded to GPU is very high in dev effort, etc. Streamlined development on a platform with unified memory will change it. That way Huang/NVDA may pull out rug from under the CPU-first platforms like AMD/INTC and would own both - new AI computing as well as significant share of the classic enterprise one.

tatersolid

> these databases can be executed on GPU with a significant performance gain vs. CPU

No, they can’t. GPU databases are niche products with severe limitations.

GPUs are fast at massively parallel math problems, they anren’t useful for all tasks.

trhway

>GPU databases are niche products with severe limitations.

today. For the reasons like i mentioned.

>GPUs are fast at massively parallel math problems, they anren’t useful for all tasks.

GPU are fast at massively parallel tasks. Their memory bandwidth is 10x of that of the CPU for example. So, typical database operations, massively parallel in nature like join or filter, would run about that faster.

Majority of computing can be parallelized and thus benefit from being executed on GPU (with unified memory of the practically usable for enterprise sizes like 128GB).

narrator

Nvidia releases a Linux desktop supercomputer that's better price/performance wise than anything Wintel is doing and their whole new software stack will only run on WSL2. They aren't porting to Win32. Wow, it may actually be the year of Linux on the Desktop.

sliken

Not sure how to judge better price/perf. I wouldn't expect 20 Neoverse N2 cores to do particularly well vs 16 zen5 cores. The GPU side looks promising, but they aren't mentioning memory bandwidth, configuration, spec, or performance.

Did see vague claims of "starting at $3k", max 4TB nvme, and max 128GB ram.

I'd expect AMD Strix Halo (AI Max plus 395) to be reasonably competitive.

skavi

It’s actually “10 Arm Cortex-X925 and 10 Cortex-A725” [0]. These are much newer cores and have a reasonable chance of being competitive.

[0]: https://newsroom.arm.com/blog/arm-nvidia-project-digits-high...

adrian_b

For programs dominated by iterations over arrays, these 10 Arm Cortex-X925 + 10 Cortex-A725, all 20 together, should have a throughput similar with only 10 of the 16 cores of Strix Halo (assuming that Strix Halo has full Zen 5 cores, which has not been confirmed yet).

For programs dominated by irregular integer and pointer operations, like software project compilation, 10 Arm Cortex-X925 + 10 Cortex-A725 should have a similar throughput with a 16-core Strix Halo, but which is faster would depend on cooling (i.e. a Strix Halo configured for a high power consumption will be faster).

There is not enough information to compare the performance of the GPUs from this NVIDIA Digits and from Strix Halo. However, it can be assumed that NVIDIA Digits will be better for ML/AI inference. Whether it can also be competitive for training or for graphics remains to be seen.

ksec

For context, the X925 is what used to call Cortex X5 and it is now shipping in MediaTek Dimensity 9400. It has roughly the same performance per clock as a Snapdragon 8 Elite Or roughly 5% lower performance per clock compared to Apple M3 on Geekbench 6.

Assuming they are not limited by power or heat dissipation I would say that is about as good as it gets.

The hardware is pretty damn good. I am only worried about the software.

sliken

Good catch, they called it "Grace Blackwell". Changing the CPU cores completely and calling it Grace seems weird. Maybe it was just a mistake during the keynote.

z4y5f3

NVIDIA is likely citing 1 PFlops at FP 4 sparse (they did this for GB200), so that is 128 TFlops BF16 dense, or 2/3 of what RTX 4090 is capable of. I would put the memory bandwidth at 546 GBps, using the same 512 bit LPDDR5X 8533 Mbps as Apple M4 max.

aparashk

I can't see how this will work in terms of a TDP. 2/3 of the 4090 power would be several times more power than can be effectively cooled in the physical form factor of an Apple Mini. Either there is severe downclocking happening under full throttle, or NVIDIA have come up with more low power design mojo than Apple has been able to muster for the M4 Max.

gardnr

Based on your evaluation, it sounds like it will run inference at speed similar to an M4 Max and also allow "startups" to experiment with fine tuning larger models or larger context windows.

It's the best "dev board" setup I've seen so far. It might be part of their larger commercial plan but it definitely hits the sweet spot for the home enthusiast who have been pleading for more VRAM.

bee_rider

Seems more like a workstation. So, that’s just a continuation of the last could Decades of Unix on the Workstation, right?

throw310822

They should write an AI-centered OS for it, allowing people to write easily AI heavy applications. And you'd have the Amiga of 2025.

pjmlp

Because NVidia naturally doesn't want to pay for Windows licenses.

NVidia works closely with Microsoft to develop their cards, all major features come first in DirectX, before landing on Vulkan and OpenGL as NVidia extensions, and eventually become standard after other vendors follow up with similar extensions.

CamperBob2

Where does it say they won't be supporting Win32?

narrator

Here he says that in order for the cloud and the PC to be compatible, he's going to only support WSL2, the Windows subsystem for Linux which is a Linux API on top of Windows.

Here's a link to the part of the keynote where he says this:

https://youtu.be/MC7L_EWylb0?t=7259

stonogo

"Linux API on top of Windows" is an interesting way to describe a virtual machine.

sliken

The keynote mentioned that it could be used as a Linux workstation.

diggan

> their whole new software stack will only run on WSL2. They aren't porting to Win32

Wait, what do you mean exactly? Isn't WSL2 just a VM essentially? Don't you mean it'll run on Linux (which you also can run on WSL2)?

Or will it really only work with WSL2? I was excited as I thought it was just a Linux Workstation, but if WSL2 gets involved/is required somehow, then I need to run the other direction.

awestroke

No, nobody will run windows on this. It's meant to run NVIDIAs own flavor of Ubuntu with a patched kernel

hx8

Yes, WSL2 is essentially a highly integrated VM. I think it's a bit of a joke to call Ubuntu WSL2, because it seems like most Ubuntu installs are either VMs for Windows PCs or on Azure Cloud.

rvz

> Wow, it may actually be the year of Linux on the Desktop.

Yeah starting at $3,000. Surely a cheap desktop computer to buy for someone who just wants to surf the web and send email /s.

There is a reason why it is for "enthusiasts" and not for the general wider consumer or typical PC buyer.

Topfi

I see the most direct competitor in the Mac Studio, though of course we will have to wait for reviews to gauge how fair that comparison is. The Studio does have a fairly large niche as a solid workstation, though, so I could see this being successful.

For general desktop use, as you described, nearly any piece of modern hardware, from a RasPI, to most modern smartphones with a dock, could realistically serve most people well.

The thing is, you need to serve both, low-end use cases like browsing, and high-end dev work via workstations, because even for the "average user", there is often one specific program on which they need to rely and which has limited support outside the OS they have grown up with. Course, there will be some programs like Desktop Microsoft Office which will never be ported, but still, Digitis could open the doors to some devs working natively on Linux.

A solid, compact, high-performance, yet low power workstation with a fully supported Linux desktop out of the box could bridge that gap, similar to how I have seen some developers adopt macOS over Linux and Windows since the release of the Studio and Max MacBooks.

Again, we have yet to see independent testing, but I would be surprised if anything of this size, simplicity, efficiency and performance was possible in any hardware configuration currently on the market.

sliken

I did want a M2 max studio, ended up with a 12 core Zen 4 + radeon 7800 XT for about half the money.

A Nvidia Project Digit/GB10 for $3k with 128GB ram does sound tempting. Especially since it's very likely to have standard NVMe storage that I can expand or replace as needed, unlike the Apple solution. Decent linux support is welcome as well.

Here's hoping, if not I can fall back to a 128GB ram AMD Strix Halo/395 AI Max plus. CPU perf should be in the same ballpark, but not likely to come anywhere close on GPU performance, but still likely to have decent tokens/sec for casual home tinkering.

yjftsjthsd-h

> Surely a cheap desktop computer to buy for someone who just wants to surf the web and send email /s.

That end of the market is occupied by Chromebooks... AKA a different GNU/Linux.

fooker

The typical PC buyer is an enthusiast now.

immibis

Never underestimate the open source world's power to create a crappy desktop experience.

tokai

You're like 15 years out of date.

immibis

No, it's still relevant in current year. Progress has gone backwards on some projects/distros.

derbaum

I'm a bit surprised by the amount of comments comparing the cost to (often cheap) cloud solutions. Nvidia's value proposition is completely different in my opinion. Say I have a startup in the EU that handles personal data or some company secrets and wants to use an LLM to analyse it (like using RAG). Having that data never leave your basement sure can be worth more than $3000 if performance is not a bottleneck.

lolinder

Heck, I'm willing to pay $3000 for one of these to get a good model that runs my requests locally. It's probably just my stupid ape brain trying to do finance, but I'm infinitely more likely to run dumb experiments with LLMs on hardware I own than I am while paying per token (to the point where I currently spend way more time with small local llamas than with Claude), and even though I don't do anything sensitive I'm still leery of shipping all my data to one of these companies.

This isn't competing with cloud, it's competing with Mac Minis and beefy GPUs. And $3000 is a very attractive price point in that market.

logankeenan

Have you been to the localLlama subreddit? It’s a great resource for running models locally. It’s what got me started.

https://www.reddit.com/r/LocalLLaMA/

lolinder

Yep! I don't spend much time there because I got pretty comfortable with llama before that subreddit really got started, but it's definitely turned up some helpful answers about parameter tuning from time to time!

ynniv

I'm pretty frugal, but my first thought is to get two to run 405B models. Building out 128GB of VRAM isn't easy, and will likely cost twice this.

rsanek

You can get a M4 Max MBP with 128GB for $1k less than two of these single-use devices.

sensesp

100% I see many SMEs not willing to send their data to some cloud black box.

jckahn

Exactly this. I would happily give $3k to NVIDIA to avoid giving 1 cent to OpenAI/Anthropic.

originalvichy

Even for established companies this is great. A tech company can have a few of these locally hosted and users can poll the company LLM with sensitive data.

diggan

The price seems relatively competitive even compared to other local alternatives like "build your own PC". I'd definitely buy one of this (or even two if it works really well) for developing/training/using models that currently run on cobbled together hardware I got left after upgrading my desktop.

627467

> Having that data never leave your basement sure can be worth more than $3000 if performance is not a bottleneck

I get what you're saying, but there are also regulations (and your own business interest) that expects data redundancy/protection which keeping everything on-site doesnt seem to cover

btbuildem

Yeah that's cheaper than many prosumer GPUs on the market right now

a_bonobo

There's a market not described here: bioinformatics.

The owner of the market, Illumina, already ships their own bespoke hardware chips in servers called DRAGEN for faster analysis of thousands of genomes. Their main market for this product is in personalised medicine, as genome sequencing in humans is becoming common.

Other companies like Oxford Nanopore use on-board GPUs to call bases (i.e., from raw electric signal coming off the sequencer to A, T, G, C) but it's not working as well as it could due to size and power constraints. I feel like this could be a huge game changer for someone like ONT, especially with cooler stuff like adaptive sequencing.

Other avenues of bioinformatics, such as most day-to-day analysis software, is still very CPU and RAM heavy.

evandijk70

This is, at least for now, a relatively small market. Illumina acquired the company manufacturing these chips for $100M. Analysis of a genome in the cloud generally costs below $10 on general purpose hardware.

It is of course possible that these chips enable analyses that are currently not possible/prohibited by cost, but at least for now, this will not be the limiting factor for genomics, but cost of sequencing (which is currently $400-500 per genome)

mocheeze

Doesn't seem like Illumina actually cares much about security: https://arstechnica.com/security/2025/01/widely-used-dna-seq...

mycall

The bigger picture is that OpenAI o3/o4.. plus specialized models will blow open the doors to genome tagging and discovery, but that is still 1 to 3 years away for ASI to kick in.

nzach

While I kinda agree with you, I don't think we will ever find a meaningful way to throw genome sequencing data at LLMs. It's simple too much data.

I've worked in a project some years ago where we were using data from genome sequencing of a bacteria. Every sequenced sample was around 3GB of data and sample size was pretty small with only about 100 samples to study.

I think the real revolution will happen because code generation through LLMs will allow biologists to write 'good enough' code to transform, process and analyze data. Today to do any meaningful work with genome data you need a pretty competent bioinformatician, and they are a rare breed. Removing this bottleneck is what will allow us to move faster in this field.

amelie-iska

http://www.chat-protein.com/

amelie-iska

Just use a DNA/genomic language model like gLM2 or Evo and cross-attention that with o3 and you’re golden imo.

newsclues

Is this for research labs, health clinics, or peoples homes?

a_bonobo

ONT sells its smallest MinION to regular people, too. But Illumina's and ONT's main market is universities, followed by large hospitals

mfld

Small nitpick: Illumia is the owner of the sequencing market. But not really of the bioinformatics market.

neom

In case you're curious, I googled. It runs this thing called "DGX OS":

"DGX OS 6 Features The following are the key features of DGX OS Release 6:

Based on Ubuntu 22.04 with the latest long-term Linux kernel version 5.15 for the recent hardware and security updates and updates to software packages, such as Python and GCC.

Includes the NVIDIA-optimized Linux kernel, which supports GPU Direct Storage (GDS) without additional patches.

Provides access to all NVIDIA GPU driver branches and CUDA toolkit versions.

Uses the Ubuntu OFED by default with the option to install NVIDIA OFED for additional features.

Supports Secure Boot (requires Ubuntu OFED).

Supports DGX H100/H200."

AtlasBarfed

Nvidia optimize meaning non-public patches, a non-upgradable operating system like what happens if you upgrade with a binary blob Nvidia driver?

wmf

You can upgrade to a newer release of DGX OS.

yoyohello13

I wonder what kind of spyware is loaded onto DGX OS. Oh, sorry I mean telemetry.

ZeroTalent

Cybersecurity analysts check and monitor these things daily, and they are pretty easy to catch. Likely nothing malicious, as history shows.

thunkshift1

Correct, highly concerning.. this is totally not the case with existing os’s and products

treprinum

Nvidia just did what Intel/AMD should have done to threaten CUDA ecosystem - release a "cheap" 128GB local inference appliance/GPU. Well done Nvidia, and it looks bleak for any AI Intel/AMD efforts in the future.

mft_

I think you nailed it. Any basic SWOT analysis of NVidia’s position would surely have to consider something like this from a competitor - either Apple, who is already nibbling around the edges of this space, or AMD/Intel who could/should? be.

It’s obviously not guaranteed to go this route, but an LLM (or similar) on every desk and in every home is a plausible vision of the future.

iszomer

Nvidia also brought Mediatek into the spotlight..

null

[deleted]

mrtksn

Okay, so this is not a peripheral that you connect to your computer to run specialized tasks, this is a full computer running Linux.

It's a garden hermit. Imagine a future where everyone has one of those(not exactly this version but some future version), it lives with you it learns with you and unlike the cloud based SaaS AI you can teach it things immediately and diverge from the average to your advantage.

Topfi

I'd love to own one, but doubt this will go beyond a very specific niche. Despite there being advantages, very few still operate their own Plex server over subscriptions to streaming services, and on the local front, I feel that the progress of hardware, alongside findings that smaller models can handle a variety of tasks quite well, will mean a high performance, local workstation of this type will have niche appeal at most.

mrtksn

I have this feeling that at some point it will be very advantageous to have personal AI because when you use something that everyone can use the output of this something becomes very low value.

Maybe it will still make sense to have your personal AI in some data center, but on the other hand, there is the trend of governments and mega corps regulating what you can do with your computer. Try going out of the basics, try to do something fun and edge case - it is very likely that your general availability AI will refuse to help you.

when it is your own property, you get the chance to overcome restrictions and develop the thing beyond the average.

As a result, having something that can do things that no other else can do and not having restrictions on what you can do with this thing can become the ultimate superpower.

noduerme

"garden hermit" is a very interesting and evocative phrase. Where is that from?

mrtksn

It's a real thing: https://en.wikipedia.org/wiki/Garden_hermit

In the past, in Europe, some wealthy people used to look after of a scholar living on their premises so they can ask them questions etc.

noduerme

aha, this is really something. I just got around to watching "Furiosa" last night. So something like having a personal "history man" (although, my take on the whole Mad Max series is that it's just bottled up fear-porn about white settlers going uncivilized and becoming "tribal" - a colonial horror tale, "The Heart of Darkness" with motorcycles - common anywhere a relatively small group spread themselves out on a lot of ill-gotten land, did some nasty deeds and lost touch with the mothership).

In the Furiosa context, it's a bit like a medicine man or shaman, then. A private, unreliable source of verbal hand me downs, whose main utility is to make elites feel like they have access to knowledge without needing to acquire it for themselves or question its veracity.

We really are entering a new dark age.

Mistletoe

This is so strange, my girlfriend was just telling me about those yesterday. The word “ornamental hermit” fills me with about as much disgust as I can experience.

> Later, suggestions of hermits were replaced with actual hermits – men hired for the sole purpose of inhabiting a small structure and functioning as any other garden ornament.

rsynnott

> The one at Painshill, hired by The Hon. Charles Hamilton for a seven-year term under strict conditions, lasted three weeks until he was sacked after being discovered in a local pub

I mean, fair. Very bad hermit-ing.

(Terry Pratchett has a fun parody of this in one of the Discworld books; the garden hermit gets two weeks' holidays a year, which he spends in a large city.)

gnatman

>> The IBM Roadrunner was the first supercomputer to reach one petaflop (1 quadrillion floating point operations per second, or FLOPS) on May 25, 2008.

$100M, 2.35MW, 6000 ft^2

>>Designed for AI researchers, data scientists, and students, Project Digits packs Nvidia’s new GB10 Grace Blackwell Superchip, which delivers up to a petaflop of computing performance for prototyping, fine-tuning, and running AI models.

$3000, 1kW, 0.5 ft^2

DannyBee

Digits is petaflops of FP4, roadrunner is petaflops of FP32. So at least a factor of 8 difference, but in practice much more. (IE I strongly doubt digits can do 1/8th petaflop of FP32)

Beyond that, the factors seem reasonable for 2 decades?

dotancohen

Why even use a floating point if you have only 4 bits? Models with INT8 features are not unheard of.

cjbgkagh

1 sign and 3 exponent bits. AFAIK at the small number of bits it’s basically a teeny tiny look up table so you can precompute the table to be whatever math you want. Having an exponent instead of mantissa just means that the values that can be expressed are not linearly separated.

stassats

> roadrunner is petaflops of FP32

Isn't it actually FP64?

DannyBee

So i can find sources that claim both ;) I wasn't sure what to believe, and didn't spend more than 5 minutes digging for the real results, so i went with the conservative one.

ryao

This looks like a successor to the Nvidia Jetson AGX Orin 64GB Developer Kit:

https://www.okdo.com/wp-content/uploads/2023/03/jetson-agx-o...

I wonder what the specifications are in terms of memory bandwidth and computational capability.

kcb

Hopefully, the OS support isn't as awful as the Jetson platforms usually are. Unless they change, you'll get 1 or 2 major kernel updates ever and have to do bizarre stuff like install a 6 year old Ubuntu on your x86 PC to run the utility to flash the OS.

ryao

The community likely will make instructions for installing mainstream Linux distributions on it.

kcb

Doesn't really help though if it requires an nvidia kernel.

zamadatix

The Jetson Orin Dev Kit is squarely aimed at being a dev kit for those using the Jetson module in production edge compute (robotic vision and the like). The only reason it's so well known in tech circles is "SBC syndrome" where people get excited about what they think they could do with it and then 95% end up in a drawer a year later because it what it's actually good at is unrelated to why they bought it.

This is more accurately a descendant of the HPC variants like the article talks about - intentionally meant to actually be a useful entry level for those wanting to do or run general AI work better than a random PC would have anyways.

moffkalast

The AGX Orin was only 64GB of LPDDR5 and priced at $5k so this does seem like a bargain in comparison with 128GB of presumably HBM. But Nvidia never lowers their prices, so there's a caveat somewhere.

fulafel

The memory is LPDDR accordning to the specs graphic on the NV product page: https://www.nvidia.com/en-us/project-digits/

Anyone willing to guess how wide?

moffkalast

I've seen some claims that it can do 512 GB/s on Reddit (not sure where they got that from), which would imply a ~300 bit bus with LPDDR5X depending on the frequency.

tim333

I've followed progress since Moravec's "When will computer hardware match the human brain?" since that came out in 1997. It starts:

>This paper describes how the performance of AI machines tends to improve at the same pace that AI researchers get access to faster hardware. The processing power and memory capacity necessary to match general intellectual performance of the human brain are estimated. Based on extrapolation of past trends and on examination of technologies under development, it is predicted that the required hardware will be available in cheap machines in the 2020s.

and this is about the first personal unit that seems well ahead of his proposed specs. (He estimated 0.1 petaflops. The nvidia thing is "1 petaflop of AI performance at FP4 precision").

(paper https://jetpress.org/volume1/moravec.pdf)

modeless

Finally a real ARM workstation from Nvidia! This will be much faster than Apple's offerings for AI work. And at $3000 it is much cheaper than any Mac with 128 GB RAM.

sliken

On the CPU size the Neoverse N2 doesn't compete particularly well with apple's M4, or the Zen5 for the matter.

Bit bit hard to tell what's on offer on the GPU side, I wouldn't be surprised if it was RTX 4070 to 5070 in that range.

If the price/perf is high enough $3k wouldn't be a bad deal, I suspect a Strix Halo (better CPU cores, 256GB/sec memory interface, likely slower GPU cores) will be better price/perf, same max ram for unified memory, and cheaper.

skavi

It’s actually “10 Arm Cortex-X925 and 10 Cortex-A725” [0]. These are much newer cores and have a reasonable chance of being competitive.

[0]: https://newsroom.arm.com/blog/arm-nvidia-project-digits-high...

modeless

AI work happens predominantly on the GPU, not the CPU. This GPU with CUDA will run rings around M4 with MLX. And with much more RAM than you can get in a Mac for $3k.

A lot of people have been justifying their Mac Studio or Mac Pro purchases by the potential for running large AI models locally. Project Digits will be much better at that for cheaper. Maybe it won't run compile Chromium as fast, but that's not what it's for.

sliken

The quotes I've seen mention the maximum config (128GB ram and 4TB of storage) and the minimum price. Nothing saying $3k for 128GB ram and 4TB of storage. I hope I'm wrong, but I'm betting the max price is at least twice the minimum price.

gardnr

Remember: inference is memory bound.

https://www.databricks.com/blog/llm-inference-performance-en...

HN

Nvidia's Project Digits is a 'personal AI supercomputer'

Nvidia's Project Digits is a 'personal AI supercomputer'