
Apple M3 Ultra

784 comments · March 5, 2025

cxie

512GB of unified memory is truly breaking new ground. I was wondering when Apple would overcome memory constraints, and now we're seeing a half-terabyte level of unified memory. This is incredibly practical for running large AI models locally ("600 billion parameters"), and Apple's approach of integrating this much efficient memory on a single chip is fascinating compared to NVIDIA's solutions. I'm curious about how this design of "fusing" two M3 Max chips performs in terms of heat dissipation and power consumption, though.

FloatArtifact

They didn't increase the memory bandwidth. You can get the same memory bandwidth that's available on the M2 Studio. Yes, yes, of course you can get 512 gigabytes of uRAM for 10 grand.

The question is whether an LLM will run with usable performance at that scale. The point is that there are diminishing returns: even with enough uRAM and the new chip's increased processing speed for AI, the memory bandwidth stays the same.

So there must be a min-max performance ratio between memory bandwidth and the size of the memory pool in relation to the processing power.

lhl

Since no one specifically answered your question yet: yes, you should be able to get usable performance. A Q4_K_M GGUF of DeepSeek-R1 is 404GB. This is a 671B MoE that "only" has 37B activations per pass. You'd probably expect something in the ballpark of 20-30 tok/s for text generation (depending on how much of the memory bandwidth can actually be utilized).

From my napkin math, the M3 Ultra's TFLOPs are still relatively low (around 43 FP16 TFLOPs?), but it should be more than enough to handle bs=1 token generation (it should be way <10 FLOPs/byte for inference). Now, as far as its prefill/prompt processing speed goes... well, that's another matter.
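
If anyone wants to redo that napkin math, here's a rough sketch of the bandwidth-vs-compute reasoning (the 819 GB/s, ~43 FP16 TFLOPs, and 37B-active figures come up in this thread; the bytes-per-weight value is an assumption for a Q4_K_M-style quant):

  # Sketch: decode (bs=1) is memory-bound, so bandwidth sets the ceiling.
  mem_bw_gbs = 819            # M3 Ultra peak memory bandwidth, GB/s
  fp16_tflops = 43            # ballpark FP16 compute from the comment above
  active_params = 37e9        # DeepSeek-R1 active parameters per token
  bytes_per_weight = 0.57     # ~4.5 bits/weight for a Q4_K_M-style quant (assumption)

  bytes_per_token = active_params * bytes_per_weight        # ~21 GB streamed per token
  bw_bound_tps = mem_bw_gbs * 1e9 / bytes_per_token          # ~39 tok/s upper bound
  compute_tps = fp16_tflops * 1e12 / (2 * active_params)     # ~580 tok/s upper bound

  print(f"bandwidth-bound ceiling: {bw_bound_tps:.0f} tok/s")
  print(f"compute-bound ceiling:   {compute_tps:.0f} tok/s")

Realized bandwidth lands well below peak, which is how you get back to the 20-30 tok/s ballpark; prefill, by contrast, is compute-bound, hence the caveat about prompt processing.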

lynguist

I actually think it’s not a coincidence and they specifically built this M3 Ultra for DeepSeek R1 4-bit. They also highlight in their press release that they tested it with 600B class LLMs (DeepSeek R1 without referring to it by name). And they specifically did not stop at 256 GB RAM to make this happen. Maybe I’m reading too much into it.

drited

I would be curious what context window size could be expected when generating a ballpark 20 to 30 tokens per second using DeepSeek-R1 Q4 on this hardware.

valine

Probably helps that models like DeepSeek are mixture-of-experts. Having all weights in VRAM means you don't have to unload/reload. Memory bandwidth usage should be limited to the 37B active parameters.

FloatArtifact

> Probably helps that models like DeepSeek are mixture-of-experts. Having all weights in VRAM means you don't have to unload/reload. Memory bandwidth usage should be limited to the 37B active parameters.

"Memory bandwidth usage should be limited to the 37B active parameters."

Can someone do a deep dive on the above quote? I understand that having the entire model loaded into RAM helps with response times. However, I don't quite understand the relationship between memory bandwidth and active parameters.

Context window?

How much of the model can actively be processed, despite being fully loaded into memory, given the memory bandwidth?
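
One rough way to unpack the quoted claim (a sketch, assuming ~0.5 bytes/weight at Q4): memory capacity is set by the total parameter count, but per-token memory traffic is set only by the experts that actually fire, so an MoE generates much faster than a dense model of the same size would.

  # Sketch: what must fit in RAM vs what must be streamed per generated token.
  total_params = 671e9     # all DeepSeek-R1 weights (capacity requirement)
  active_params = 37e9     # weights touched per token via MoE routing
  bytes_per_w = 0.5        # ~Q4 quantization (assumption)
  bandwidth = 819e9        # bytes/s

  print(f"resident weights: ~{total_params * bytes_per_w / 1e9:.0f} GB")   # ~336 GB in RAM
  print(f"read per token:   ~{active_params * bytes_per_w / 1e9:.1f} GB")  # ~18.5 GB streamed
  print(f"dense 671B ceiling:     ~{bandwidth / (total_params * bytes_per_w):.1f} tok/s")
  print(f"MoE 37B-active ceiling: ~{bandwidth / (active_params * bytes_per_w):.1f} tok/s")
  # KV-cache reads for long contexts add on top of this; they're ignored here.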

diggan

> The question is whether an LLM will run with usable performance at that scale.

This is the big question to have answered. Many people claim Apple can now reliably be used as an ML workstation, but from the numbers I've seen from benchmarks, the models may fit in memory, but the performance in tok/sec is so slow that it doesn't feel worth it compared to running it on NVIDIA hardware.

Although it'd be expensive as hell to get 512GB of VRAM with NVIDIA today, maybe moves like this from Apple could push down the prices at least a little bit.

johnmaguire

It is much slower than nVidia, but for a lot of personal-use LLM scenarios, it's very workable. And it doesn't need to be anywhere near as fast considering it's really the only viable (affordable) option for private, local inference, besides building a server like this, which is no faster: https://news.ycombinator.com/item?id=42897205

hangonhn

Do we know if it is slower because the hardware is not as well suited for the task, or is it mostly a software issue -- the code hasn't been optimized to run on Apple Silicon?

deepGem

Any idea what the sRAM to uRAM ratio is on these new GPUs? If they have meaningfully higher sRAM than the Hopper GPUs, it could lead to meaningful speedups in large model training.

If they didn't increase the memory bandwidth, then 512GB will enable longer context lengths and that's about it, right? No speedups.

For any speedups, you may need some new variant of FlashAttention-3, or something along similar lines, purpose-built for Apple GPUs.

astrange

I don't know what you mean by s and u, but there is only one kind of memory in the machine, that's what unified memory means.

cxie

Guess what? I'm on a mission to completely max out all 512GB of mem...maybe by running DeepSeek on it. Pure greed!

swivelmaster

You could always just open a few Chrome tabs…

bob1029

> The question is whether an LLM will run with usable performance at that scale.

For the self-attention mechanism, memory bandwidth requirements scale ~quadratically with the sequence length.
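
To make that concrete, here's a toy sketch of why attention traffic grows with context length (the layer/head/dim numbers are hypothetical, not DeepSeek's; R1's MLA compresses the KV cache considerably):

  # Toy KV-cache read cost for one decode step at various context lengths.
  layers, kv_heads, head_dim = 60, 8, 128   # hypothetical GQA-style config
  bytes_per_elem = 2                        # fp16 cache

  def kv_cache_bytes(context_len):
      return context_len * layers * kv_heads * head_dim * 2 * bytes_per_elem  # keys + values

  for ctx in (4_096, 32_768, 131_072):
      print(f"ctx={ctx:>7}: ~{kv_cache_bytes(ctx) / 1e9:.1f} GB read per generated token")

  # Each decode step reads a cache that grows linearly with position, so the total
  # bytes moved over a whole sequence grow roughly quadratically with its length.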

kridsdale1

Someone has got to be working on a better method than that. Hundreds of billions are at stake.

TheRealPomax

Yeah they did? The M4 has a max memory bandwidth of 546GBps, the M3 Ultra bumps that up to a max of 819GBps.

(and the 512GB version is $4,000 more rather than $10,000 - that's still worth mocking, but it's nowhere near as much)

okanesen

Not that dramatic of an increase actually - the M2 Max already had 400GB/s and M2 Ultra 800GB/s memory bandwidth, so the M3 Ultra's 819GB/s is just a modest bump. Though the M4's additional 146GB/s is indeed a more noticeable improvement.

sudoshred

Agree. Finally I can have several hundred browser tabs open simultaneously with no performance degradation.

protocolture

Well at least 20

resters

The same thing could be designed with greater memory bandwidth, and so it's just a matter of time (for NVIDIA) until Apple decides to compete.

PeterStuer

Is this on-chip memory? From the 800GB/s I would guess more likely a 512-bit bus (8-channel) to DDR5 modules. Doing it on a quad channel would just about be possible, but it would really be pushing the envelope. Still a nice thing.

As for practicality, which mainstream applications would benefit from this much memory paired with nice but relatively mid compute? At this price point ($14K for a fully specced system), would you prefer it over, e.g., a couple of NVIDIA Project DIGITS units (assuming that arrives on time and for around the announced $3K price point)?

zitterbewegung

NVIDIA project DIGITS has 128 GB LPDDR5x coherent unified system memory at a 273 Gb/s memory bus speed.

bangaladore

It would be 273 GB/s (gigabytes, not gigabits). But in reality we don't know the bandwidth. Some ex-employee said 500 GB/s.

Your source is a reddit post in which they try to match the size to existing chips, without realizing that it's very likely that NVIDIA is using custom memory here produced by Micron, like Apple uses custom memory chips.

samstave

"unified memory"

funny that people think this is so new, when CRAY had Global Heap eons ago...

webworker

The real hardware needed for artificial intelligence wasn't NVIDIA, it was a CRAY XMP from 1982 all along

samstave

When I was with Mirantis, I flew to Austin, TX to meet a client in a nondescript multi-tenant office building...

We walked in and, getting our bearings, came upon a CRAY office. WTF?!

I tried the doors - locked - and it was clearly empty... but damn did I want to steal their office door signage.

hot_gril

It's new for mainstream PCs to have it.

ddtaylor

Why did it take so long for us to get here?

RachelF

Some possible groups of reasons:

1. Until recently RAM amount was something the end user liked to configure, so little market demand.

2. Technically, building such a large system on a chip or collection of chiplets was not possible.

3. RAM speed wasn't a bottleneck for most tasks, it was IO or CPU. LLMs changed this.

wmf

Laptops have had unified memory for ten years or more. For desktops very few apps benefit from unified memory.

baby_souffle

Just a guess, but fabricating this can't be easy. Yield is probably higher if you have less memory per chip.

tempest_

Nvidia has had the Grace Hoppers for a while now. Is this not like that?

ykl

This is cheap compared to GB200, which has a street price of >$70k for just the chip alone if you can even get one. Also GB200 technically has only 192GB per GPU and access to more than that happens over NVLink/RDMA, whereas here it’s just one big flat pool of unified memory without any tiered access topology.

rbanffy

We finally encountered the situation where an Apple computer is cheaper than its competition ;-)

All joking aside, I don't think Apples are that expensive compared to similar high-end gear. I don't think there is any other compact desktop computer with half a terabyte of RAM accessible to the GPU.

TheRealPomax

I think the other big thing is that the base model finally starts at a normal amount of memory for a production machine. You can't get less than 96GB. Although an extra $4000 for the 512GB model seems Tim Apple levels of ridiculous. There is absolutely no way that the difference costs anywhere near that much at the fab.

And the storage solution still makes no sense of course, a machine like this should start at 4TB for $0 extra, 8TB for $500 more, and 16TB for $1000 more. Not start at a useless 1TB, with the 8TB version costing an extra $2400 and 16TB a truly idiotic $4600. If Sabrent can make and sell 8TB m.2 NVMe drives for $1000, SoC storage should set you back half that, not over double that.

jjtheblunt

> There is absolutely no way that the difference costs anywhere near that much at the fab.

price premium probably, but chip lithography errors (thus, yields) at the huge memory density might be partially driving up the cost for huge memory.

TheRealPomax

It's Apple, price premium is a given.

InTheArena

Whoa. M3 instead of M4. I wonder if this was basically binning, but I thought I had read somewhere that the interposer that enabled this for the M1 chips was not available.

That said, 512GB of unified RAM with access to the NPU is absolutely a game changer. My guess is that Apple developed this chip for their internal AI efforts, and they are now at the point where they are releasing it publicly for others to use. They really need a 2U rack form factor for this though.

This hardware is really being held back by the operating system at this point.

exabrial

If Apple supported Linux (headless) natively, and we could rack m4 pros, I absolutely would use them in our Colo.

The CPUs have zero competition in terms of speed, memory bandwidth. Still blown away no other company has been able to produce Arm server chips that can compete.

hedora

The last I checked, AMD was outperforming Apple perf/dollar on the high end, though they were close on perf/watt for the TDPs where their parts overlapped.

I’d be curious to know if this changes that. It’d take a lot more than doubling cores to take out the very high power AMD parts, but this might squeeze them a bit.

Interestingly, AMD has also been investing heavily in unified RAM. I wonder if they have / plan an SoC that competes 1:1 with this. (Most of the parts I’m referring to are set up for discrete graphics.)

aurareturn

The M4 Pro is 56% faster in ST performance against AMD’s new Strix Halo while being 3.6x more efficient.

Source: https://www.notebookcheck.net/AMD-Ryzen-AI-Max-395-Analysis-...

Cinebench 2024 results.

nick_

Same. I'm not sure what to make of the various claims. I personally defer to this table in general: https://www.cpubenchmark.net/power_performance.html.

I'm not sure how those benchmarks translate to common real world use cases.

PaulHoule

If I read this right, the r8g.48xlarge at AMZN [1] has 192 cores and 1536GB which exceeds the M3 Ultra in some metrics.

It reminds me of the 1990s when my old school was using Sun machines based on the 68k series and later SPARC, and we were blown away by the toaster-sized HP PA-RISC machine that was used for student work for all the CS classes.

Then Linux came out and it was clear the 386 trashed them all in terms of value and as we got the 486 and 586 and further generations, the Intel architecture trashed them in every respect.

The story then was that Intel was making more parts than anybody else so nobody else could afford to keep up the investment.

The same is happening with parts for phones and TSMC's manufacturing dominance -- and today with chiplets you can build up things like the M3 Ultra out of smaller parts.

[1] https://aws.amazon.com/ec2/instance-types/r8g/

hedora

In fairness, the sun and dec boxes I used back then (up to about 1999) could hold their own against intel machines.

Then, one day, we built a 5 machine amd athlon xp linux cluster for $2000 ($400/machine) that beat all the unix and windows server hardware by at least 10x on $/perf.

It’s nice that we have more than one viable cpu vendor these days, though it seems like there’s only one viable fab company.

nsteel

It seems Graviton 4 CPUs have 12-channels of DDR5-5600 i.e 540GB/s main memory bandwidth for the CPU to use. M3 Ultra has 64-channels of LPDDR5-6400 i.e. ~800GB/s of memory bandwidth for the CPU or the GPU to use. So the M3 Ultra has way fewer (CPU) cores, but way more memory bandwidth. Depends what you're doing.
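
For anyone checking the channel math, a quick sketch (assuming 64-bit DDR5 channels and 16-bit LPDDR5 channels, which is how these totals are usually quoted):

  # Peak bandwidth = channels x bytes per transfer x transfers per second.
  def peak_gbs(channels, bus_bits, mts):
      return channels * (bus_bits / 8) * mts / 1000

  print(peak_gbs(channels=12, bus_bits=64, mts=5600))   # Graviton 4: ~537.6 GB/s (DDR5-5600)
  print(peak_gbs(channels=64, bus_bits=16, mts=6400))   # M3 Ultra:   ~819.2 GB/s (LPDDR5-6400)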

astrange

The interesting difference between x86 and ARM is security, not performance, btw.

rbanffy

> The CPUs have zero competition in terms of speed, memory bandwidth.

Maybe not at the same power consumption, but I'm sure mid-range Xeons and EPYCs mop the floor with the M3 Ultra in CPU performance. What the M3 Ultra has that nobody else comes close to is a decent GPU next to a pool of half a terabyte of RAM.

hoppp

What about serviceability? Do these come with a soldered-in SSD? That would be an issue for server use; it's too expensive to throw it all away for a broken SSD.

galad87

No, the SSD isn't soldered, it has got one or two removable modules: https://everymac.com/systems/apple/mac-studio/mac-studio-faq...

gjsman-1000

Nah, in many businesses, everything is on a schedule. For desktop computers, a common cycle is 4 years. For servers, maybe a little longer, but not by much. After that date arrives, it’s liquidate everything and rebuild.

Having things consistently work is much cheaper than down days caused by your ancient equipment. Apple’s SSDs will make it to 5 years no problem - and more likely, 10-15 years.

icecube123

Yea, I've been thinking about this for a few years. The Mx series chips would sell into data centers like crazy if Apple went after that market, especially if they created a server-tuned chip. It could probably be their 2nd biggest product line behind the iPhone. The performance and efficiency is awesome. I guess it would be neat to see some web serving and database benchmarks to really know.

kridsdale1

TSMC couldn't make enough at the leading node in addition to all the iPhone chips Apple has to sell. There's a physical throughput limit. That's why this isn't M4.

notpushkin

Asahi is a thing. For headless usage it’s pretty much ready to go already.

EgoIncarnate

M3 support in Asahi is still heavily WIP. I don't think it even has display support, Ethernet, or WiFi yet; I think it's only serial over USB. Without any GPU or ANE support, it's not very useful for AI stuff. https://asahilinux.org/docs/M3-Series-Feature-Support/

criddell

The Asahi maintainer resigned recently. What that means for the future only time will tell. I probably wouldn't want to make a big investment in it right now.

lynndotpy

Not at all for M3 or M4. Support is for M2 and M1 currently.

WD-42

It’s only a thing for the M1. Asahi is a Sisyphean effort to keep up with new hardware and the outlook is pretty grim at the moment.

Apple’s whole m.o. is to take FOSS software, repackage it and sell it. They don’t want people using it directly.

Thaxll

Apple does not make server CPUs, they make consumer low W CPUs, it's very different.

FYI Apple runs Linux in their DC, so no Apple hardware in their own servers.

alwillis

> Apple does not make server CPUs, they make consumer low W CPUs, it's very different.

This is silly. Given the performance per watt, the M series would be great in a data center. As you all know, electricity for running the servers and cooling for the servers are the two biggest ongoing costs for a data center; the M series requires less power and runs more efficiently than the average Intel or AMD-based server.

> FYI Apple runs Linux in their DC, so no Apple hardware in their own servers.

That's certainly no longer the case. Apple announced their Private Cloud Compute [1] initiative—Apple-designed servers running Apple Silicon to support Apple Intelligence functions that can't run on-device.

BTW, Apple just announced a $500 billion investment [2] in US-based manufacturing, including a 250,000 square foot facility to make servers. Yes, these will obviously be for their Private Cloud Compute servers… but it doesn't have to be only for that purpose.

From the press release:

As part of its new U.S. investments, Apple will work with manufacturing partners to begin production of servers in Houston later this year. A 250,000-square-foot server manufacturing facility, slated to open in 2026, will create thousands of jobs.

Previously manufactured outside the U.S., the servers that will soon be assembled in Houston play a key role in powering Apple Intelligence, and are the foundation of Private Cloud Compute, which combines powerful AI processing with the most advanced security architecture ever deployed at scale for AI cloud computing. The servers bring together years of R&D by Apple engineers, and deliver the industry-leading security and performance of Apple silicon to the data center.

Teams at Apple designed the servers to be incredibly energy efficient, reducing the energy demands of Apple data centers — which already run on 100 percent renewable energy. As Apple brings Apple Intelligence to customers across the U.S., it also plans to continue expanding data center capacity in North Carolina, Iowa, Oregon, Arizona, and Nevada.

[1]: https://security.apple.com/blog/private-cloud-compute/

[2]: https://www.apple.com/newsroom/2025/02/apple-will-spend-more...

stego-tech

> This hardware is really being held back by the operating system at this point.

It really is. Even if they themselves won't bring back their old XServe OS variant, I'd really appreciate it if they at least partnered with a Linux or BSD (good callout, ryao) dev to bring a server OS to the hardware stack. The consumer OS, while still better (to my subjective tastes) than Windows, is increasingly hampered by bloat and cruft that make it untenable for production server workloads, at least to my subjective standards.

A server OS that just treats the underlying hardware like a hypervisor would, making the various components attachable or shareable to VMs and Containers on top, would make these things incredibly valuable in smaller datacenters or Edge use cases. Having an on-prem NPU with that much RAM would be a godsend for local AI acceleration among a shared userbase on the LAN.

ryao

Given shared heritage, I would expect to see Apple work with FreeBSD before I would expect Apple to work with Linux.

hedora

I heard OpenBSD has been working for a while.

I’m continually surprised Apple doesn’t just donate something like 0.1% of their software development budget to proton and the asahi projects. It’d give them a big chunk of the gaming and server markets pretty much overnight.

I guess they’re too busy adding dark patterns that re-enable siri and apple intelligence instead.

stego-tech

You are technically correct (the best kind of correct). I’m just a filthy heathen who lumps the BSDs and Linux distros under “Linux” as an incredibly incorrect catchall for casual discourse.

hinkley

I miss the XServe almost as much as I miss the Airport Extreme.

stego-tech

I feel like Apple and Ubiquiti have a missed collaboration opportunity on the latter point, especially with the latter's recent UniFi Express unit. It feels like pairing Ubiquiti's kit with Apple's Homekit could benefit both, by making it easier for Homekit users to create new VLANs specifically for Homekit devices, thereby improving security - with Apple dubbing the term, say, "Secure Device Network" or some marketingspeak to make it easier for average consumers to understand. An AppleTV unit could even act as a limited CloudKey for UniFi devices like Access Points, or UniFi Cameras to connect/integrate as Homekit Cameras.

Don't get me wrong, I wouldn't use that feature (I prefer self-hosting it all myself), but for folks like my family members, it'd be a killer addition to the lineup that makes my life supporting them much easier.

klausa

>I had read somewhere that the interposer that enabled this for the M1 chips was not available.

With all my love and respect for "Apple rumors" writers; this was always "I read five blogposts about CPU design and now I'm an expert!" territory.

The speculation was based on the M3 Max's die shots not having the interposer visible, which... implies basically nothing about whether that _could have_ been supported in an M3 Ultra configuration, as evidenced by the announcement today.

sroussey

I’m guessing it’s not really a M3.

No M3 has thunderbolt 5.

This is a new chip with M3 marketing. I’d expect this from Intel, not Apple.

klausa

Baseline M4 doesn't have Thunderbolt 5 either; only the Pro/Max variants do.

The press-release even calls TB5 out: >Each Thunderbolt 5 port is supported by its own custom-designed controller directly on the chip.

Given that they're doing the same on A-series chips (A18 Pro with 10Gbps USB-C; A18 with USB 2.0); I imagine it's just relatively simple to swap the I/O blocks around and they're doing this for cost and/or product segmentation reasons.

hinkley

TB 5 seems like the sort of thing you could 'slap on' to a beefy enough chip.

Or the sort of thing you put onto a successor when you had your fingers crossed that the spec and hardware would finalize in time for your product launch but the fucking committee went into paralysis again at the last moment and now your product has to ship 4 months before you can put TB 5 hardware on shelves. So you put your TB4 circuitry on a chip that has the bandwidth to handle TB5 and you wait for the sequel.

GeekyBear

I also wondered about binning, so I pulled together how heavily Apple's Max chips were binned in shipping configurations.

M1 Max - 24 to 32 GPU cores

M2 Max - 30 to 38 GPU cores

M3 Max - 30 to 40 GPU cores

M4 Max - 32 to 40 GPU cores

I also looked up the announcement dates for the Max and the Ultra variant in each generation.

M1 Max - October 18, 2021

M1 Ultra - March 8, 2022

M2 Max - January 17, 2023

M2 Ultra - June 5, 2023

M3 Max - October 30, 2023

M3 Ultra - March 12, 2025

M4 Max - October 30, 2024

> My guess is that Apple developed this chip for their internal AI efforts

As good a guess as any, given the additional delay between the M3 Max and Ultra being made available to the public.

jonplackett

I’m missing the point. What is it you’re concluding from these dates?

GeekyBear

I was referring to the additional year of delay between the M3 Max and M3 Ultra announcements when compared to the M1 and M2 generations.

The theory that the M3 Ultra was being produced, but diverted for internal use makes as much sense as any theory I've seen.

It makes at least as much sense as the "TSMC had difficulty producing enough defect free M3 Max chips" theory.

kokada

> This hardware is really being held back by the operating system at this point.

Apple could either create 2U rack hardware and support Linux (and I mean Apple supporting it, not hobbyists), or have a headless build of Darwin that could run on that hardware. But in the latter case, we probably wouldn't have much software available (though I am sure people would eventually start porting software to it; there are already MacPorts and Homebrew, and I am sure they could be adapted to eventually run on that platform).

But Apple is also not interested in that market, so this will probably never happen.

ewzimm

There has to be someone at Apple with a contact at IBM that could make Fedora Apple Remix happen. It may not be on-brand, but this is a prime opportunity to make the competition look worse. File it under Community projects at https://opensource.apple.com/projects

naikrovek

> But Apple is also not interested in that market, so this will probably never happen.

they're just a tiny company with shareholders who are really tired of never earning back their investments. give 'em a break. I mean they're still so small that they must protect themselves by requiring that macs be used for publishing iPhone and iPad applications.

hnaccount_rng

Not to get in the way of good snark or anything. But.. Apple isn't _requiring_ that everyone uses MacOS on their systems. But you have to bring your own engineering effort to actually make another OS run. And so far Asahi is the only effort that I'm aware of (there were alternatives in the very beginning, but they didn't even get to M2 right?)

AlchemistCamp

Keep in mind the minimum configuration that has 512GB of unified RAM is $9,499.

stego-tech

I cannot express how dirt cheap that price point is for what's on offer, especially when you're comparing it to rackmount servers. By the time you've shoehorned in an nVidia GPU and all that RAM, you're easily looking at 5x that MSRP; sure, you get proper redundancy and extendable storage for that added cost, but now you also need redundant UPSes and have local storage to manage instead of centralized SANs or NASes.

For SMBs or Edge deployments where redundancy isn't as critical or budgets aren't as large, this is an incredibly compelling offering...if Apple actually had a competent server OS to layer on top of that hardware, which it does not.

If they did, though...whew, I'd be quaking in my boots if I were the usual Enterprise hardware vendors. That's a damn frightening piece of competition.

kllrnohj

> By the time you've shoehorned in an nVidia GPU and all that RAM, you're easily looking at 5x that MSRP

That nvidia GPU setup will actually have the compute grunt to make use of the RAM, though, which this M3 Ultra probably realistically doesn't. After all, if the only thing that mattered was RAM then the 2TB you can shove into an Epyc or Xeon would already be dominating the AI industry. But they aren't, because it isn't. It certainly hits at a unique combination of things, but whether or not that's maximally useful for the money is a completely different story.

AlchemistCamp

It's not quite an apples to apples comparison, no pun intended. I guess we'll see how it sells.

cubefox

I assume there is a very good reason why AMD and Intel aren't releasing a similar product.

42lux

Still cheap if the only thing you look for is vram.

baq

This is a ‘shut up and take my money’ price, it’ll fly off the shelves.

nsteel

And how is it only £9,699.00!! Does that dollar price include sales tax or are Brits finally getting a bargain?

kgwgk

What's the bargain?

There is also "parity" in other products like a MacBook Pro from £1,599 / $1,599 or an iPhone 16 from £799 / $799. £9,699 / $9,499 is worse than that!

vr46

The US prices never include state sales tax IIRC. Maybe we're finally getting some parity.

mastax

Tariffs perhaps?

DrBenCarson

Cheap relative to the alternatives

jread

$8549 with 1TB storage

rbanffy

It can connect to external storage easily.

hedora

Other than the NPU, it’s not really a game changer; here’s a 512GB AMD deepseek build for $2000:

https://digitalspaceport.com/how-to-run-deepseek-r1-671b-ful...

aurareturn

> between 4.25 to 3.5 TPS (tokens per second) on the Q4 671b full model.

3.5 - 4.25 tokens/s. You're torturing yourself. Especially with a reasoning model.

This will run it at 40 tokens/s based on rough calculation. Q4 quant. 37b active parameters.

5x higher price for 10x higher performance.

hinkley

Also you don't have to deal with Windows. Which people who do not understand Apple are very skilled at not noticing.

If you've ever used git, svn, or an IDE side by side on corporate Windows versus Apple I don't know why you would ever go back.

flakiness

The low energy use can be a game changer if you live in a crappy apartment with limited power capacity. I gave up my big GPU box dream because of that.

hajile

One of the leakers who got this Mac Studio right claims Apple is reserving the M4 ultra for the Mac Pro to differentiate the products a bit more.

ksec

The previous M2 Ultra model had a max memory of 192GB, or 128GB for the Pro and some other M3 models, which I think is plenty for even 99.9% of professional tasks.

They now bump it to 512GB, along with an insane price tag of $9,499 for the 512GB Mac Studio. I am pretty sure this is some AI gold rush.

InTheArena

Every single AI shop on the planet is trying to figure out if there is enough compute or not to make this a reasonable AI path. If the answer is yes, that 10k is an absolute bargain.

ZeroTalent

No, because there is no CUDA. We have fast and cheap alternatives to NVIDIA, but they do not have CUDA. This is why NVIDIA has 90% margins on its hardware.

jauntywundrkind

CUDA is simply not important for modern vLLM and many many others. DeepSeek V3 works great on SGLang. https://www.amd.com/en/developer/resources/technical-article...

Can you do absolutely everything? No. But most models will run or retrain fine now without CUDA. This premise keeps getting recycled from the past, even as that past has grown ever more distant.

Spooky23

> that 10k is a absolute bargain

The higher end NVidia workstation boxes won’t run well on normal 20amp plugs. So you need to move them to a computer room (whoops, ripped those out already) or spend months getting dedicated circuits run to office spaces.

magnetometer

Didn't really think about this before, but that seems to be mainly an issue in Northern / Central America and Japan. In Germany, for example, typical household plugs are 16A at 230V.

someothherguyy

In the US, normal circuits aren't always 20A, especially in residential buildings, where they are more commonly 15A in bedrooms and offices.

https://en.wikipedia.org/wiki/NEMA_connector

827a

Is this actually true? Were people doing this with the 192gb of the M2 Ultra?

I'm curious to learn how AI shops are actually doing model development, if anyone has experience there. What I imagined was: it's all in the "cloud" (or their own infra), and the local machine doesn't matter. If it did matter, the NVIDIA software stack is too important, especially given that a 512GB M3 Ultra config costs $10,000+.

DrBenCarson

You’re largely correct for training models

Where this hardware shines is inference (aka developing products on top of the models themselves)

internetter

No AI shop is buying macs to use as a server. Apple should really release some server macOS distribution, maybe even rackable M-series chips. I believe they have one internally.

jerjerjer

Why would any business pay Apple Tax for a backend, server product?

NorwegianDude

Not much to figure out. It's 2x M4 Max, so you need 100 of these to match the TOPS of even a single consumer card like the RTX 5090.

jeffhuys

Sure, but if you have models like DeepSeek - 400GB - that won't fit on a consumer card.

wpm

It's 2x M3 Max

alberth

> It's 2x M4 Max

Not exactly though.

This can have 512GB unified memory, 2x M4 Max can only have 128GB total (64GB each).

DrBenCarson

Now do VRAM

HPsquared

LLMs easily use a lot of RAM, and these systems are MUCH, MUCH cheaper (though slower) than a GPU setup with the equivalent RAM.

A 4-bit quantization of Llama-3.1 405b, for example, should fit nicely.
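
Quick back-of-the-envelope on the fit (a sketch; the bytes-per-weight value assumes a 4-bit quant with a little overhead):

  # Does a 4-bit Llama-3.1 405B fit in 512 GB of unified memory?
  params = 405e9
  bytes_per_weight = 0.57                       # ~4.5 bits/weight incl. overhead (assumption)
  weights_gb = params * bytes_per_weight / 1e9  # ~231 GB
  print(f"weights ~{weights_gb:.0f} GB, ~{512 - weights_gb:.0f} GB left for KV cache and OS")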

segmondy

The question will be how it will perform. I suspect DeepSeek and Llama 405B demonstrated the need for larger memory. Right now folks could build an EPYC system with that much RAM or more to run DeepSeek at about 6 tokens/sec for a fraction of that cost. However, not everyone is a tinkerer, so there's a market for this for those that don't want to be bothered. You say "AI gold rush" like it's a bad thing; it's not.

bloppe

Big question is: does the $10k price already reflect Trump's tariffs on China? Or will the price rise further still?

dwighttk

Maybe .1% of tasks need this RAM, why are they charging so much?

cjbgkagh

I don't need 512GB of RAM but the moment I do I'm certain I'll have bigger things to worry about than a $10K price tag.

almostgotcaught

This is Pascal's wager written in terms of ... RAM. The original didn't make sense and neither does this iteration.

pier25

Because the minority that needs that much RAM can't work without it.

In the media composing world they use huge orchestral templates with hundreds and hundreds of tracks with millions of samples loaded into memory.

agloe_dreams

Because the .1% is who will buy it? I mean, yeah, supply and demand. High demand in a niche with no supply currently means large margins.

I don't think anyone commercially offers nearly this much unified memory or NPU/GPUs with anything near 512GB of memory.

madeofpalk

Maybe because .1% of tasks need this RAM, it attracts a .1% price tag

Sharlin

It enables the use of giant AI models on a personal computer. Might not run too fast though. But at least it's possible at all.

regularfry

The narrower the niche, the more you can charge.

Spooky23

With all things semiconductor, low volume = higher cost (and margin).

The people who need the crazy resource can tie it to some need that costs more. You’d spend like $10k running a machine with similar capabilities in AWS in a month.

A4ET8a8uTh0_v2

I think the answer is because they can (there is a market for it). The benefit to a crazy person like me is that, with this addition, I might be able to grab the 128GB version at a lower price.

lauritz

They update the Studio to M3 Ultra now, so M4 Ultra can presumably go directly into the Mac Pro at WWDC? Interesting timing. Maybe they'll change the form factor of the Mac Pro, too?

Additionally, I would assume this is a very low-volume product, so it being on N3B isn't a dealbreaker. At the same time, these chips must be very expensive to make, so tying them with luxury-priced RAM makes some kind of sense.

lauritz

Interestingly, Apple apparently confirmed to a French website that M4 lacks the interconnect required to make an "Ultra" [0][1], so contrary to what I originally thought, they maybe won't make this after all? I'll take this report with a grain of salt, but apparently it's coming directly from Apple.

Makes it even more puzzling what they are doing with the M2 Mac Pro.

[0] https://www.numerama.com/tech/1919213-m4-max-et-m3-ultra-let...

[1] More context on Macrumors: https://www.macrumors.com/2025/03/05/apple-confirms-m4-max-l...

agloe_dreams

My understanding was that Apple wanted to figure out how to build systems with multi-SOCs to replace the Ultra chips. The way it is currently done means that the Max chips need to be designed around the interconnect. Theoretically speaking, a multi-SOC setup could also scale beyond two chips and into a wider set of products.

rbanffy

Ultra is already two big M3 chips coupled through an interposer. Apple is curiously not going the way of chiplets like the big CPU crowd is.

aurareturn

I'm not sure if multi-SoC is possible because having 2 GPUs together such that the OS sees it as one big GPU is not very possible if the SoCs are separated.

layer8

Apple says that not every generation will get an “Ultra” variant: https://arstechnica.com/apple/2025/03/apple-announces-m3-ult...

raydev

Honestly I don't think we'll see the M4 Ultra at all this year. That they introduced the Studio with an M3 Ultra tells me M4 Ultras are too costly or they don't have capacity to build them.

And anyway, I think the M2 Mac Pro was Apple asking customers "hey, can you do anything interesting with these PCIe slots? because we can't think of anything outside of connectivity expansion really"

RIP Mac Pro unless they redesign Apple Silicon to allow for upgradeable GPUs.

jsheard

> Maybe they'll change the form factor of the Mac Pro, too?

Either that or kill the Mac Pro altogether, the current iteration is such a half-assed design and blatantly terrible value compared to the Studio that it feels like an end-of-the-road product just meant to tide PCIe users over until they can migrate everything to Thunderbolt.

They recycled a design meant to accommodate multiple beefy GPUs even though GPUs are no longer supported, so most of the cooling and power delivery is vestigial. Plus the PCIe expansion was quietly downgraded, Apple Silicon doesn't have a ton of PCIe lanes so the slots are heavily oversubscribed with PCIe switches.

lauritz

I agree. Nonetheless, I agree with Siracusa that the Mac Pro makes sense as a "halo car" in the Mac lineup.

I just find it interesting that you can currently buy a M2 Ultra Mac Pro that is weaker than the Mac Studio (for a comparable config) at a higher price. I guess it "remains a product in their lineup" and we'll hear more about it later.

Additionally: If they wanted to scrap it down the road, why would they do this now?

madeofpalk

The current Mac Pro is not a "halo car". It's a large USB-A dongle for a Mac Studio.

crowcroft

Agree with this, and it doesn't seem like it's a priority for Apple to bring the kind of expandability back any time soon.

Maybe they can bring back the trash can.

jsheard

Isn't the Mac Studio the new trash can? I can't think of how a non-expandable Mac Pro could be meaningfully different to the Studio unless they introduce an even bigger chip above the Ultra.

pier25

I've always maintained that the M2 Mac Pro was really a dev kit for manufacturers of PCI parts. It's such a meaningless product otherwise.

newsclues

The Mac Pro could exist as a PCIe expansion slot storage case that accepts a logic board from the frequently updated consumer models. Or multiple Mac Studio logic boards all in one case with your expansion cards all working together.

TheTxT

512GB unified memory is absolutely wild for AI stuff! Compared to how many NVIDIA GPUs you would need, the pricing looks almost reasonable.

InTheArena

A server with 512GB of high-bandwidth, GPU-addressable RAM is probably a six-figure expenditure. If memory is your constraint, this is absolutely the server for you.

(Sorry, I should have specified that the NPU and GPU cores need to access that RAM with reasonable performance. I specified it above, but people didn't read that :-)

Numerlor

A basic brand-new server can easily do 512GB. Not as fast as soldered memory, but it should be maybe mid to high 5 figures.

jeffbee

That doesn't sound right. The marginal cost of +768GB of DDR5 ECC memory in an EPYC system is < $5k.

InTheArena

GPU accessible RAM.

behnamoh

except that you cannot run multiple language models on Apple Silicon in parallel

kevin42

I'm curious why not. I am running a few different models on my mac studio. I'm using llama.cpp, and it performs amazingly fast for the $7k I spent.

jeroenhd

If you're going to overthrow your entire AI workflow to use a different API anyway, surely the AMD Instinct accelerator cards make more sense. They're expensive, but also a lot faster, and you don't need to deal with making your code work on macOS.

wmf

Doesn't AMD Instinct cost >$50K for 512GB?

codedokode

I don't think API has any value because writing software is free and hardware for ML is super expensive.

internetter

> writing software is free

says who? NVIDIA has essentially entrenched themselves thanks to CUDA

knowitnone

I'd like to hire you to write free software

chakintosh

14k for a maxed out Mac Studio

mrtksn

Let's say you want the absolute max memory (512GB) to run AI models, and let's say that you are O.K. with plugging in a drive to archive your model weights; then you can get this for a little bit shy of $10K. What a dream machine.

Compared to Nvidia's Project DIGITS, which is supposed to cost $3K and be available "soon", you can get a spec-matching 128GB & 4TB version of this Mac for about $4,700, and the difference is that you can actually get it in a week and it will run macOS (no idea how much performance difference to expect).

I can't wait to see someone testing the full DeepSeek model on this, maybe this would be the first little companion AI device that you can fully own and can do whatever you like with it, hassle-free.

bloomingkales

There's an argument that replaceable PC parts are what you want at that price point, but Apple usually provides multi-year durability on their PCs. An Apple AI brick should last a while.

NightlyDev

The full deepseek R1 model needs more memory than 512GB. The model is 720GB alone. You can run a quantized version on it, but not the full model.

summarity

You can chain multiple Mac Studios using exo for inference, you'd "only" need two of these. There's a bottleneck in the RMA speed over TB5, but this may not matter as much for a MoE model.

behnamoh

> I can't wait to see someone testing the full DeepSeek model on this

at 819 GB per second bandwidth, the experience would be terrible

coder543

DeepSeek-R1 only has 37B active parameters.

A back of the napkin calculation: 819GB/s / 37GB/tok = 22 tokens/sec.

Realistically, you’ll have to run quantized to fit inside of the 512GB limit, so it could be more like 22GB of data transfer per token, which would yield 37 tokens per second as the theoretical limit.

It is likely going to be very usable. As other people have pointed out, the Mac Studio is also not the only option at this price point… but it is neat that it is an option.

mrtksn

How many t/s would you expect? I think I feel perfectly fine when it's over 50.

Also, people figured out a way to run these things in parallel easily. The device is pretty small; I think for someone who wouldn't mind the price tag, stacking 2-3 of those wouldn't be that bad.

yk

I think I've seen 800 GB/s memory bandwidth, so a q4 quant of a 400 B model should be 4 t/s if memory bound.

behnamoh

I know you're referring to the exolabs app, but the t/s is really not that good. It uses Thunderbolt instead of NVLink.

bearjaws

Not sure why you are being downvoted; we already know the performance numbers due to memory bandwidth constraints on the M4 Max chips, and they would apply here as well.

525GB/s to 1000GB/s will double the TPS at best, which is still quite low for large LLMs.

lanceflt

Deepseek R1 (full, Q1) is 14t/s on an M2 Ultra, so this should be around 20t/s

teleforce

Thunderbolt 5 (TB 5) is pretty handy, you can have a very thin and lightweight laptop, then can get access to external GPU or eGPU via TB 5 if needed [1]. Now you can have your cake (lightweight laptop) and eat it too (potent GPU).

[1] Asus just announced the world’s first Thunderbolt 5 eGPU:

https://www.theverge.com/24336135/asus-thunderbolt-5-externa...

ben-schaaf

Except that you're stuck with macOS, so there aren't any drivers for NVIDIA, AMD or Intel GPUs.

iamtheworstdev

and that no one is developing games for MacOS.

wpm

Apple Silicon does not work with eGPU.

DidYaWipe

How do people feel about the value of the M3 Ultra vs. the M4 Max for general computing, assuming that you max out the RAM on the M4 version of the Studio?

wewewedxfgdf

Computers these days - the more appealing, exciting, cooler desirable, the higher the price, into the stratosphere.

$9499

What ever happening to competition in computing?

Computing hardware competition used to be cut throat, drop dead, knife fight, last man standing brutally competitive. Now it's just a massive gold rush cash grab.

hu3

It doesn't even run Linux properly.

Could cost half of that and it would still be uninteresting for my use cases.

For AI, on-demand cloud processing is magnitudes better in speed and software compatibility anyway.

WXLCKNO

You take the top price of the top of the line newest pro chip apple produces and then make this argument?

niek_pas

The Macintosh plus, released in 1986, cost $2600 at the time, or $7460 adjusted for inflation.

bigyabai

It even came with an official display! Nowadays that's a $1,600-$6,000 accessory, depending on whether you own a VESA mount.

raydev

I know it's basically nitpicking competing luxury sports cars at this point, but I am very bothered that existing benchmarks for the M3 show single core perf that is approximately 70% of M4 single core perf.

I feel like I should be able to spend all my money to both get the fastest single core performance AND all the cores and available memory, but Apple has decided that we need to downgrade to "go wide". Annoying.

xp84

> both get the fastest single core performance AND all the cores

I'm a major Apple skeptic myself, but hasn't there always been a tradeoff between "fastest single core" vs "lots of cores" (and thus best multicore)?

For instance, I remember when you could buy an iMac with an i9 or whatever, with a higher clock speed and faster single core, or you could buy an iMac Pro with a Xeon with more cores, but the iMac (non-Pro) would beat it in a single core benchmark. Note: Though I used Macs as the example due to the simple product lines, I thought this was pretty much universal among all modern computers.

raydev

> hasn't there always been a tradeoff between "fastest single core" vs "lots of cores" (and thus best multicore)?

Not in the Apple Silicon line. The M2 Ultra has the same single core performance as the M2 Max and Pro. No benchmarks for the M3 Ultra yet but I'm guessing the same vs M3 Max and Pro.

xp84

Okay, good to know. Interesting change then.

c0deR3D

When will Apple silicon get native support for OSes such as Linux? Apple is seemingly reluctant to release detailed technical reference manuals for the M-series SoCs, which makes running Linux natively on Apple silicon challenging.

bigyabai

Probably never. We don't have official Linux support for the iPhone or iPad; I wouldn't hold out hope for Apple to change their tune.

dylan604

That makes sense to me though. If you don’t run iOS, you don’t have App Store and that means a loss of revenue.

bigyabai

Right. Same goes for macOS and all of its convenient software services. Apple might stand to sell more units with a friendlier stance towards Linux, but unless it sells more Apple One subscriptions or increases hardware margins on the Mac, I doubt Cook would consider it.

If you sit around expecting selflessness from Apple you will waste an enormous amount of time, trust me.

AndroTux

If you don't run macOS, you don't have Apple iCloud Drive, Music, Fitness, Arcade, TV+ and News and that means a loss of revenue.

jobs_throwaway

You lose out on revenue from people who require OS freedom though

dylan604

That's what's weird to me too. It's not like they would lose sales of macOS, as it is given away with the hardware. So if someone wants to buy Apple hardware to run Linux, it does not have a negative effect on AAPL.

bigfishrunning

Except the linux users won't be buying Apple software, from the app store or elsewhere. They won't subscribe to iCloud.

dylan604

I have Mac hardware and have spent $0 through the Mac App Store. I do not use iCloud on it either (I do on iDevices though). I must be an edge case.

cosmic_cheese

Those buying the hardware to run Linux also aren’t writing software for macOS to help make the platform more attractive.

jeroenhd

While I don't think Apple wants to change course from its services-oriented profit model, surely someone within Apple has run the calculations for a server-oriented M3/M4 device. They're not far behind server CPUs in terms of performance while running a lot cooler AND having accelerated amd64 support, which Ampere lacks.

Whatever the profit margin on a Mac Studio is these days, surely improving non-consumer options becomes profitable at some point if you start selling them by the thousands to data centers.

amelius

But then they'd have to open up their internal documentation of their silicon, which could possibly be a legal disaster (patents).

re-thc

> So if someone wants to buy Apple hardware to run Linux, it does not have a negative affect to AAPL

It does. Support costs. How do you prove it's a hardware failure or software? What should they do? Say it "unofficially" supports Linux? People would still try to get support. Eventually they'd have to test it themselves etc.

dylan604

Apple has already been in this spot. With the trash can Mac Pro, there was an issue with DaVinci Resolve under OS X at the time where the GPU was causing render issues. If you then rebooted into Windows with Boot Camp using the exact same hardware and opened up the exact same Resolve project with the exact same footage, the render errors disappeared. Apple blamed Resolve. DaVinci blamed GPU drivers. The GPU vendor blamed Apple.

k8sToGo

We used to have bootcamp though.

WillAdams

Is it not an option to run Darwin? What would Linux offer that that would not?

internetter

Darwin is a terrible server operating system. Even getting a process to run at server boot reliably is a nightmare.

kbolino

I don't think Darwin has been directly distributed in bootable binary format for many years now. And, as far as I know, it has never been made available in that format for Apple silicon.

apatheticonion

God I wish Linux ran on Apple Silicon (with first class hardware support).