
1.5 TB of VRAM on Mac Studio – RDMA over Thunderbolt 5

chis

I wonder what motivates Apple to release features like RDMA, which are purely useful for server clusters, while ignoring basic QoL stuff like remote management or rack-mount hardware. It’s difficult to see it as a cohesive strategy.

Makes one wonder what Apple uses for their own servers. I guess they have some internal M-series server product they just haven’t bothered to release to the public, and features like this are downstream of that?

vsgherzi

Last I heard, for the Private Cloud Compute features they were racking and stacking M2 Mac Pros.

xienze

> rack mount hardware

I guess they prefer that third parties deal with that. There are rack-mount shelves for Mac Minis and Studios.

behnamoh

My expectations from M5 Max/Ultra devices:

- Something like a DGX-style QSFP link (200Gb/s, 400Gb/s) instead of TB5. Otherwise the economics of this RDMA setup, while impressive, don't make sense.

- Neural accelerators to get prompt prefill time down. I don't expect RTX 6000 Pro speeds, but something like 3090/4090 would be nice.

- 1TB of unified memory in the maxed out version of Mac Studio. I'd rather invest in more RAM than more devices (centralized will always be faster than distributed).

- +1TB/s bandwidth. For the past 3 generations, the speed has been 800GB/s...

- The ability to overclock the system? I know it probably will never happen, but my expectations for a Mac Studio are not the same as for a laptop, and I'm TOTALLY okay with it consuming +600W energy. Currently it's capped at ~250W.

Also, as the OP noted, this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac!! All the more reason for Apple to invest in something like QSFP.
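
For a sense of the wiring cost (just combinatorics, not tied to any particular machine's port count): a full mesh of n nodes needs n-1 ports per node and n*(n-1)/2 cables, so it stops scaling very quickly.

    # Full-mesh wiring cost: every node links directly to every other node,
    # so each node burns n-1 ports and the cluster needs n*(n-1)/2 cables.
    for n in (2, 3, 4, 8):
        print(f"{n} nodes: {n - 1} ports per node, {n * (n - 1) // 2} cables")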

Dylan16807

> +1TB/s bandwidth. For the past 3 generations, the speed has been 800GB/s...

M4 already hit the necessary speed per channel, and M5 is well above it. If they actually release an Ultra, that much bandwidth is guaranteed on the full version. Even the smaller version with 25% fewer memory channels will be pretty close.

We already know Max won't get anywhere near 1TB/s since Max is half of an Ultra.

zozbot234

> Neural accelerators to get prompt prefill time down.

Apple Neural Engine is a thing already, with support for multiply-accumulate on INT8 and FP16. AI inference frameworks need to add support for it.
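
One path that exists today is Core ML's compute-unit selection; a minimal Python sketch of what a framework could do (the model path and input are placeholders, not any specific project's code):

    import coremltools as ct

    # Ask Core ML to schedule an already-converted model on the CPU + Neural
    # Engine instead of the GPU. "model.mlpackage" is a hypothetical path.
    mlmodel = ct.models.MLModel("model.mlpackage",
                                compute_units=ct.ComputeUnit.CPU_AND_NE)
    # prediction = mlmodel.predict({"input": input_array})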

> this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac!!

Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA is run on top of?

fooblaster

Might be helpful if they actually provided a programming model for the ANE that isn't ONNX. The ANE not having a native development model just means software support will not be great.

angoragoats

> Also, as the OP noted, this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac!! All the more reason for Apple to invest in something like QSFP.

This isn’t any different with QSFP unless you’re suggesting that one adds a 200GbE switch to the mix, which:

* Adds thousands of dollars of cost,

* Adds 150W or more of power draw, plus the loud fan noise that comes with it,

* And perhaps most importantly adds measurable latency to a networking stack that is already higher latency than the RDMA approach used by the TB5 setup in the OP.

burnt-resistor

Apple has always sucked at embracing properly robust tech in high-end gear for markets beyond individual prosumers or creatives. When Xserves existed, they used commodity IDE drives without HA or replaceable PSUs and couldn't compete with contemporary enterprise servers (HP-Compaq/Dell/IBM/Fujitsu). Xserve RAID half-heartedly used Fibre Channel for its interconnect but couldn't touch a NetApp or EMC SAN/filer. I'm disappointed Apple has a persistent blind spot that keeps them from succeeding in the data-center-quality gear category, when they could've had virtualized servers, networking, and storage: the kind of things that eventually find their way into my home lab after 5-7 years.

tylerflick

> TOTALLY okay with it consuming +600W energy

The 2019 i9 MacBook Pro has entered the chat.

mmorse1217

Hey Jeff, wherever you are: this is awesome work! I’ve wanted to try something like this for a while and was very excited by the RDMA over Thunderbolt news.

But I mostly want to say thanks for everything you do. Your good vibes are deeply appreciated and you are an inspiration.

rahimnathwani

The largest nodes in his cluster each have 512GB of RAM. DeepSeek V3.1 is a 671B-parameter model whose weights take up ~700GB of RAM: https://huggingface.co/deepseek-ai/DeepSeek-V3.1

I would have expected that going from one node (which can't hold the weights in RAM) to two nodes would have increased inference speed by more than the measured 32% (21.1t/s -> 27.8t/s).

With no constraint on RAM (4 nodes) the inference speed is less than 50% faster than with only 512GB.

Am I missing something?
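
The back-of-the-envelope I'm working from:

    # Ratios from the measured numbers above.
    one_node_tps, two_node_tps = 21.1, 27.8
    print(f"2 nodes vs 1: {two_node_tps / one_node_tps - 1:.0%}")    # ~32%
    print(f"4-node ceiling implied by 'less than 50% faster': "
          f"{1.5 * one_node_tps:.1f} t/s")                           # 1.5 x 21.1 ~= 31.7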

zozbot234

Weights are read-only data, so they can just be memory mapped and reside on SSD (only a small fraction will be needed in VRAM at any given time); the real constraint is activations. The MoE architecture should help quite a bit here.
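
Roughly the idea, as a toy sketch (the file name, dtype, and shape are made up, not DeepSeek's actual layout):

    import numpy as np

    # The weight file stays on SSD; the OS pages in only what is read.
    weights = np.memmap("expert_weights.bin", dtype=np.float16,
                        mode="r", shape=(131_072, 4_096))
    active_rows = weights[:8]        # an MoE step touches a small slice...
    partial = active_rows.sum()      # ...and only those pages land in RAM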

elorant

You only get 80Gbps of network bandwidth. There's your bottleneck right there. InfiniBand, in comparison, can give you up to 10x that.
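
Back-of-the-envelope with the thread's own numbers:

    # Put the link speed in the same units as memory bandwidth.
    tb5_gbps = 80                  # per-link figure, in gigabits/s
    link_gbytes = tb5_gbps / 8     # = 10 GB/s on the wire
    local_gbytes = 800             # the ~800GB/s unified-memory figure above
    print(link_gbytes, local_gbytes / link_gbytes)   # 10.0 GB/s, ~80x gap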

zeusk

the TB5 link (RDMA) is much slower than direct access to system memory

andy99

Very cool. I’m probably overthinking this, but why are they seemingly hyping this now (I’ve seen a bunch of it recently) with no M5 Max/Ultra machines in sight? Is it because their release is imminent (I’ve heard Q1 2026), or is it to stretch out demand for the M4 Max / M3 Ultra? I plan to buy one (not four), but I’d feel like I’m buying something that’s going to be immediately out of date if I don’t wait for the M5.

GeekyBear

I imagine they want to give developers time to get their RDMA support stabilized, so third-party software will be ready to take advantage of it when the M5 Ultra lands.

I definitely would not be buying an M3 Ultra right now on my own dime.

scsilver

Very cool. The real breakthrough here is the sub-10µs latency of RDMA over Thunderbolt 5, which finally makes Tensor Parallelism viable on consumer hardware. Previous interconnects restricted us to slower pipeline parallelism. It’s impressive to see Apple bringing this capability—architecturally similar to Nvidia’s RoCEv2 and Google’s ICI—down to the desktop. This effectively creates a unified memory pool, which is critical for the massive KV caches required by the next generation of reasoning models.
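
A rough illustration of why the latency figure matters (the layer count and collectives-per-layer are assumptions for illustration, not measurements from the OP):

    # Tensor parallelism pays link latency on every layer of every token.
    layers, collectives_per_layer, link_latency_s = 60, 2, 10e-6
    per_token_s = layers * collectives_per_layer * link_latency_s
    print(f"{per_token_s * 1e3:.1f} ms of pure latency per decoded token")   # 1.2 ms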

lvl155

Seriously, Jeff has the best job. He and STH's Patrick.

delaminator

> Working with some of these huge models, I can see how AI has some use, especially if it's under my own local control. But it'll be a long time before I put much trust in what I get out of it—I treat it like I do Wikipedia. Maybe good for a jumping-off point, but don't ever let AI replace your ability to think critically!

It is a little sad that they gave someone an uber machine and this was the best he could come up with.

Question answering is interesting but not the most interesting thing one can do, especially with a home rig.

The realm of the possible:

- Video generation: CogVideoX at full resolution, longer clips; Mochi or Hunyuan Video with extended duration

- Image generation at scale: FLUX batch generation, 50 images simultaneously

- Fine-tuning: actually train something; show LoRA on a 400B model, or full fine-tuning on a 70B

but I suppose "You have it for the weekend" means chatbot go brrrrr and snark

theshrike79

Yea, I don't understand why people use LLMs for "facts". You can get them from Wikipedia or a book.

Use them for something creative, write a short story on spec, generate images.

Or the best option: give it tools and let it actually DO something, like "read my message history with my wife, find the top 5 gift ideas she might have hinted at, and search for options to purchase them". Perfect for a local model: there's no way in hell I'd feed my messages to a public LLM, but the one sitting next to me that I can turn off the second it twitches the wrong way? Sure.

newsclues

https://m.youtube.com/watch?v=4l4UWZGxvoc

Seems like the ecosystem is rapidly evolving