I regret building this $3000 Pi AI cluster
175 comments
· September 19, 2025
Aurornis
I thought the conclusion should have been obvious: A cluster of Raspberry Pi units is an expensive nerd indulgence for fun, not an actual pathway to high performance compute. I don’t know if anyone building a Pi cluster actually goes into it thinking it’s going to be a cost effective endeavor, do they? Maybe this is just YouTube-style headline writing spilling over to the blog for the clicks.
If your goal is to play with or learn on a cluster of Linux machines, the cost effective way to do it is to buy a desktop consumer CPU, install a hypervisor, and create a lot of VMs. It’s not as satisfying as plugging cables into different Raspberry Pi units and connecting them all together if that’s your thing, but once you’re in the terminal the desktop CPU, RAM, and flexibility of the system will be appreciated.
bunderbunder
The cost effective way to do it is in the cloud. Because there's a very good chance you'll learn everything you intended to learn and then get bored with it long before your cloud compute bill reaches the price of a desktop with even fairly modest specs for this purpose.
dukeyukey
It's good for the soul to have your cluster running in your home somewhere.
NordSteve
Bad for your power bill though.
ofrzeta
Maybe so, but even then a second-hand blade server is more cost-effective than a Raspi Cluster.
Almondsetat
I can get a Xeon E5-2690 v4 with 28 threads and 64GB of RAM for about $150. If you need cores and memory to make a lot of VMs, you can do it extremely cheaply.
Aurornis
> I can get a Xeon E5-2690V4 with 28 threads and 64GB of RAM for about $150.
If the goal is a lot of RAM and you don’t care about noise, power, or heat then these can be an okay deal.
Don’t underestimate how far CPUs have come, though. That machine will be slower than AMD’s slowest entry-level CPU. Even an AMD 5800X will double its single-core performance and walk away from it on multithreaded tasks despite having only 8 cores. It will use less electricity and be quiet, too. It's more expensive, but if this is something you plan to leave running 24/7, the electricity costs over a few years might make the power-hungry server the more expensive option over time.
semi-extrinsic
For $3000 you can get 3x used Epyc servers with a total of 144 cores and 384 GB of memory, with dual-port 25GbE networking so you can run them in a fully connected cluster without a switch. It will have >20x better perf/$ and ~3x better perf/W.
That combo gives you the better part of a gigabyte of L3 cache and an aggregate memory bandwidth of 600 GB/s, while still staying below 1000W total running at full speed. Plus your NICs are the fancy kind that let you play around with RoCEv2 and other nifty stuff.
It would also be a good opportunity to learn how to do things properly with SLURM, Warewulf, etc., instead of a poor man's solution with Ansible playbooks like in these blog posts.
mattbillenstein
Power and noise - old server hardware is not something you want in your home.
Commodity desktop cpus with 32 or 64GB RAM can do all of this in a low-power and quiet way without a lot more expense.
nine_k
It will probably consume $150 worth of electricity in less than a month, even sitting idle :-\
sebastiansm
On Aliexpress those Xeon+mobo+ram kits are really cheap.
kbenson
Source? That seems like something I would want to take advantage of at the moment...
montebicyclelo
Yeah... Looks like you can get about $1/hr for 10 small VMs ($0.10 per VM).
So for $3000, that's 3000 hours, or 125 days (if you just wastefully leave them on all the time instead of turning them on when needed).
Say you wanted to play around for a couple of hours; that's like... $3.
(That's assuming there's no bonus for joining / free tier, too.)
wongarsu
The VMs quickly get expensive if you leave them running though.
The desktop equivalent of your 10 T3 Micro instances is about $600 if you buy new. For example, a Lenovo ThinkCentre M75q Gen 2 Tiny 11JN009QGE has an 8-core 3.2GHz processor with hyperthreading. That's 16 virtual cores compared to the 20 vCPUs of the T3 instances, but with much faster cores. And 16GB of RAM allows you to match the 1GB per instance.
If you don't have anything and feel generous throw in another $200 for a good monitor and keyboard plus mouse. But you can get a used crap monitor for $20. I'd give you one for free just to be rid of it.
That's a total of $800, or 33 days of forgetting to shut down the 10 VMs. Maybe half that if you buy used.
Granted, not everyone has $800 or even $400 to drop on hobby projects, so renting VMs often does make sense.
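A rough back-of-the-envelope check of that break-even point, using the figures in this thread (~$1/hr for the 10 small VMs, $800 new or roughly $400 used for the desktop); a sketch, not exact cloud pricing:

```python
# Break-even: desktop purchase vs. leaving 10 small VMs running.
# Figures are the rough ones quoted in this thread, not exact cloud pricing.
vm_cost_per_hour = 1.00      # ~$0.10/hr per small VM * 10 VMs
desktop_new = 800            # mini PC + monitor/keyboard/mouse
desktop_used = 400

for label, price in [("new", desktop_new), ("used", desktop_used)]:
    hours = price / vm_cost_per_hour
    print(f"{label}: break-even after {hours:.0f} hours (~{hours / 24:.0f} days left running)")

# new: break-even after 800 hours (~33 days left running)
# used: break-even after 400 hours (~17 days left running)
```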
verdverm
You can rent a beefy VM with an H100 for $1.50/hr.
I regularly rent this for a few hours at a time for learning and prototyping.
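For scale, a quick sketch of how many rental hours an outright purchase would buy. The H100 purchase price below is an assumption (street prices vary widely); only the $1.50/hr figure comes from the comment above:

```python
# How many rental hours does an outright H100 purchase buy?
h100_purchase_price = 25_000   # USD, rough assumption; actual pricing varies widely
rental_rate = 1.50             # USD/hr, figure quoted above

hours = h100_purchase_price / rental_rate
print(f"{hours:,.0f} rental hours ≈ {hours / (24 * 365):.1f} years of 24/7 use")
# ~16,667 hours ≈ 1.9 years of continuous use
```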
aprdm
That really depends on what you want to learn and how deep. If you're automating things before the hypervisor comes online or before there's an OS running (e.g., working on datacenter automation or bare metal as a service), you will have many gaps.
leoc
If you want to run something like GNS3 network simulation on a hosting service's hardware, you'll either have to deal with hiring a bare-metal server or deal with nested virtualisation on other people's VM setups. Network simulation absolutely drinks RAM, too, so just filling an old Xeon with RAM starts to look very attractive compared to cloud providers who treat it as an expensive upsell.
nsxwolf
That isn’t fun. I have a TI-99/4A in my office hooked up to a raspberry pi so it can use the internet. Why? Because it’s fun. I like to touch and see the things even though it’s all so silly.
bakugo
It heavily depends on the use case. For these AI setups, you're completely correct, because the people who talk about how amazing it is to run a <100B model at home almost never actually end up using it for anything real (mostly because these small models aren't actually very good) and are doing it purely for the novelty.
But if you're someone like me who intends to actively use the hardware for real-world purposes, the cloud often simply can't compete on price. At home, I have a mini PC with a 5600G, 32GB of RAM, and a few TBs of NVME storage. The entire thing cost less than $600 a few years ago, and consumes around 20W of power on average.
Even on the cheapest cloud providers available, an equivalent setup would exceed that price in less than half a year. SSD storage in particular is disproportionately expensive on the cloud. For small VMs that don't need much storage, it does make sense, but as soon as you scale up, cloud prices quickly start ballooning.
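A minimal sketch of that comparison with electricity folded in. Only the $600 and ~20W figures come from the comment above; the electricity rate and the cloud monthly price are assumptions that vary a lot by region and provider:

```python
# Rough total cost of ownership for the mini PC described above vs. renting
# an equivalent VM. Cloud price and electricity rate are assumptions.
hardware_cost = 600          # USD, from the comment above
avg_power_w = 20             # watts, from the comment above
electricity_per_kwh = 0.30   # USD, assumed; varies by region
cloud_monthly = 120          # USD/month for comparable vCPU/RAM/NVMe, assumed

def home_cost(months):
    kwh = avg_power_w / 1000 * 24 * 30 * months
    return hardware_cost + kwh * electricity_per_kwh

for months in (6, 12, 36):
    print(f"{months:>2} months: home ${home_cost(months):5.0f}  cloud ${cloud_monthly * months:5.0f}")
```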
mattbillenstein
LOL, no
newsclues
Text and reference books are free at the library.
You don’t need hardware to learn. Sure, it helps, but you can learn from a book and pen-and-paper exercises.
trenchpilgrim
I disagree. Most of what I've learned about systems comes from debugging the weird issues that only happen on real systems, especially real hardware. The book knowledge is like, 20-30% of it.
glitchc
I did some calculations on this. Procuring a Mac Studio with the latest M-series Ultra processor and maxing out the memory seems to be the most cost-effective way to break into 100B+ parameter model space.
teleforce
Not quite. As it stands now, the most cost-effective way is most likely a Framework Desktop or a similar system, for example the HP G1a laptop/PC [1], [2].
[1] The Framework Desktop is a beast:
https://news.ycombinator.com/item?id=44841262
[2] HP ZBook Ultra:
GeekyBear
Now that we know Apple has added tensor units to the GPU cores that the M5 series of chips will be using, I might be asking myself whether I couldn't wait a bit.
randomgermanguy
Depends on how heavy one wants to go with the quants (for Q6-Q4, the AMD Ryzen AI MAX chips seem like a better/cheaper way to get started).
Also, the Mac Studio is a bit hampered by its low compute power, meaning you can't really use a 100B+ dense model; only MoE is feasible without getting multi-minute prompt-processing times (assuming 500+ token prompts, etc.).
GeekyBear
Given the RAM limitations of the first gen Ryzen AI MAX, you have no choice but to go heavy on the quantization of the larger LLMs on that hardware.
mercutio2
Huh? My maxed out Mac Studio gets 60-100 tokens per second on 120B models, with latency on the order of 2 seconds.
It was expensive, but slow it is not for small queries.
Now, if I want to bump the context window to something huge, it does take 10-20 seconds to respond for agent tasks, but it’s only 2-3x slower than paid cloud models, in my experience.
Still a little annoying, and the models aren’t as good, but the gap isn’t nearly as big as you imply, at least for me.
the8472
You could try getting a DGX Thor devkit with 128GB unified memory. Cheaper than the 96GB mac studio and more FLOPs.
eesmith
Geerling links to last month's essay on a Framework cluster, at https://www.jeffgeerling.com/blog/2025/i-clustered-four-fram... . In it he writes 'An M3 Ultra Mac Studio with 512 gigs of RAM will set you back just under $10,000, and it's way faster, at 16 tokens per second.' for 671B parameters; that is, the M3 is at least 3x the performance of the other three systems.
Palomides
even a single new mac mini will beat this cluster on any metric, including cost
llm_nerd
The next generation M5 should bring the matmul functionality seen on the A19 Pro to the desktop SoC's GPU -- "tensor" cores, in essence -- and will dramatically improve the running of most AI models on those machines.
Right now the Macs are viable purely because you can get massive amounts of unified memory. It'll be pretty great when they have the massive matrix FMA performance to complement it.
vlovich123
I’d say it’s inconclusive. For traditional compute it wins on power and cost (it’ll always lose on space). For inference, the post notes it can't use the GPU because of llama.cpp’s Vulkan backend, and that llama.cpp's clustering support is bad. I’d say it’s probably still going to be worse for AI, but it’s inconclusive because that could be down to software immaturity (i.e., not worth it today, but it could be with better software).
llm_nerd
If you assume that the author did this to have content for his blog and his YouTube channel, it makes much more sense. Going back to the well with an "I regret" entry allows him to squeeze extra mileage out of a pretty dubious venture.
YouTube is absolutely jam-packed with people pitching home "lab" sort of AI buildouts that are just catastrophically ill-advised, but it yields content that seems to be a big draw. For instance, Alex Ziskind's content. I worry that people are actually dumping thousands to have poor-performing, ultra-quantized local AIs that will have zero comparative value.
philipwhiuk
I doubt anyone does this seriously.
nerdsniper
I sure hope no one does this seriously expecting to save some money. I enjoy the videos on "catastrophically ill-advised" build-outs. My primary curiosities that get satisfied by them are:
1) How much worse / more expensive are they than a conventional solution?
2) What kinds of weird esoteric issues pop up, and how do they get solved (e.g. the resizable BAR issue for GPUs attached to the RPi's PCIe slot)?
kolbe
The author, Jeff Geerling, is a very intelligent person. He has more experience with using niche hardware than almost anyone on earth. If he does something, there's usually a good a priori rationale for it.
buildbot
Jeff is a good person/blogger and does interesting projects, but more experience with niche hardware than literally anyone is a stretch.
Like, what about the people who maintain the Alpha/SPARC/PA-RISC Linux kernels? Or the designers behind, I don't know, Tilera or Tenstorrent hardware?
phatfish
Youtubers have armies of sycophants (check their video comments if you dare). I'm not saying they even court them; I think it's something to do with video building a stronger parasocial relationship than a text blog.
wccrawford
Geerling's titles have been increasingly click-bait for a while now. It's pretty sad, because I like his content, but hate the click-bait BS.
moduspol
Also cost effective is to buy used rack mount servers from Amazon. They may be out of warranty but you get a lot more horsepower for your buck, and now your VMs don’t have to be small.
Aurornis
Putting a retired datacenter rack mount server in your house is a great way to learn how unbearably loud a real rack mount datacenter server is.
Tsiklon
To quote @swiftonsecurity - https://x.com/swiftonsecurity/status/1650223598903382016 ;
> DO NOT TAKE HOME THE FREE 1U SERVER YOU DO NOT WANT THAT ANYWHERE A CLOSET DOOR WILL NOT STOP ITS BANSHEE WAIL TO THE DARK LORD AN UNHOLY CONDUIT TO THE DEPTHS OF INSOMNIA BINDING DARKNESS TO EVEN THE DAY
tempest_
Haha, and pricey power-wise.
Currently the cloud providers are dumping second-gen Xeon Scalables, and those things are pigs when it comes to power use.
Sound-wise it's like someone running a hair dryer at full speed all the time, and it can be louder under load.
Y_Y
If you're following this path, make sure to use the finest traditional server rack that money can buy: https://www.ikea.com/ie/en/p/lack-side-table-white-30449908/
allanrbo
No, again, just run VMs on your desktop/laptop. The software doesn't know or care if it's a rack mounted machine.
TZubiri
Fun fact: a Raspberry Pi does not have a built-in real-time clock with its own battery, so it relies on network clocks to keep the time.
Another fun fact: the network module of the Pi is actually connected to the USB bus, so there's some overhead as well as a throughput limitation.
Fun fact: the Pi does not have a power button, relying on software to shut down cleanly. If you lose access to the machine, it's not possible to avoid corrupted state on the disk.
Despite all of this, if you want to self-host some website, the Raspberry Pi is still an amazingly cost-effective choice; anywhere from 2 to 20,000 monthly users, one Pi will be overprovisioned. You can even get an absolutely overkill redundant Pi as a failover, but even a single Pi can reach 365 days of uptime with no problem, and as long as you don't reboot or lose power or internet, you can achieve more than a couple of nines of reliability.
But if you are thinking of a third, much less a tenth, Raspberry Pi, you are probably scaling the wrong way; well before you reach the point where quantity matters (a third machine), it becomes cost-effective to upgrade the quality of your one or two machines.
On the embedded side it's the same story: these are great for prototyping, but you are not going to order 10k and sell them in production; maybe a small 100-unit test batch? But you will optimize and make your own PCB before a mass batch.
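For reference, a quick sketch of what "a couple of nines" works out to as a downtime budget, relevant to the single-Pi uptime argument above:

```python
# Downtime budget per year for a given number of nines of availability.
minutes_per_year = 365 * 24 * 60

for nines in (2, 3, 4):
    availability = 1 - 10 ** -nines          # e.g. 2 nines -> 99%
    downtime_min = minutes_per_year * (1 - availability)
    print(f"{nines} nines ({availability:.2%}): ~{downtime_min:,.0f} minutes/year of downtime")

# 2 nines (99.00%): ~5,256 minutes/year of downtime
# 3 nines (99.90%): ~526 minutes/year of downtime
# 4 nines (99.99%): ~53 minutes/year of downtime
```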
alias_neo
> the raspberry pi is still an amazingly cost effective choice
It's really not though. I've been a Pi user and fan since it was first announced, and I have dozens of them, so I'm not hating on RPi here; we did the maths some time back here on HN when something else Pi related came up.
If you go for a Pi 5 with, say, 8GB RAM, by the time you factor in an SSD + HAT + PSU + case + cooler (+ maybe a microSD), you're actually already in mini-PC price territory, and you can get something much more capable and feature-complete for about the same price. Or, for a few £ more, something significantly more capable: a better CPU, an iGPU, an RTC, proper networking, faster storage, more RAM, better cooling, etc., and you won't be using much more electricity either.
I went this route myself and have figuratively and literally shelved a bunch of Pis by replacing them with a mini PC.
My conclusion, for my own use, after a decade of RPi use, is that a cheap mini PC is the better option these days for hosting/services/server duty, and Pis are better for making/tinkering/GPIO-related stuff. Even size isn't a winner for the Pi any more, given how small some of the mini PCs on the market are.
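To put rough numbers on the "add-ons push you into mini-PC territory" point, here's a sketch; every price below is a ballpark assumption, not a quote:

```python
# Ballpark build cost for a Pi 5 8GB set up as a small server.
# All prices are rough assumptions in GBP; substitute your own.
parts = {
    "Pi 5 8GB":        80,
    "NVMe HAT":        12,
    "NVMe SSD":        35,
    "Official PSU":    12,
    "Case":            10,
    "Active cooler":    5,
    "microSD":          8,
}
total = sum(parts.values())
print(f"Total: ~£{total}")   # lands in the same range as an entry-level mini PC
```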
stuxnet79
> Fun fact, a raspberry pi does not have a built in Real Time Clock with its own battery, so it relies on network clocks to keep the time.
> Another fun fact, the network module of the pi is actually connected to the USB bus, so there's some overhead as well as a throughput limitation.
> Fun fact, the Pi does not have a power button, relying on software to shut down cleanly. If you lose access to the machine, it's not possible to avoid corrupted states on the disk.
With all these caveats in mind, a Raspberry Pi seems like an incredibly poor choice for distributed computing.
bee_rider
> The first benchmark I ran was my top500 High Performance Linpack cluster benchmark. This is my favorite cluster benchmark, because it's the traditional benchmark they'd run on massive supercomputers to get on the top500 supercomputer list. […]
> After fixing the thermals, the cluster did not throttle, and used around 130W. At full power, I got 325 Gflops
I was sort of surprised to find that the top500 list on their website only goes back to 1993. I was hoping to find some ancient 70’s version of the list where his ridiculous Pi cluster could sneak on. Oh well, might as well take a look… I’ll pull from the sub-lists of
https://www.top500.org/lists/top500/
They give the top 10 immediately.
First list (June 1993):

  placement  name             RPEAK (GFlop/s)
  1          CM-5/1024        131.00
  10         Y-MP C916/16256  15.24

Last list he wins, I think (June 1996):

  1          SR2201/1024      307.20
  10         SX-4/32          64.00

First list he’s bumped out of the top 10 (November 1997):

  1          ASCI Red         1,830.40
  10         T3E              326.40

I think he gets bumped off the full top500 list around 2002-2003. Unfortunately I made the mistake of going by Rpeak here, but they sort by Rmax, and I don’t want to go through the whole list. Apologies for any transcription errors.
Actually, pretty good showing for such a silly cluster. I think I’ve been primed by stuff like “your watch has more compute power than the Apollo guidance computer” or whatever to expect this sort of thing to go way, way back, instead of just to the 90’s.
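A tiny sanity check of where ~325 Gflops would land against the Rpeak numbers transcribed above (same caveat as the comment: the real lists rank by Rmax, so this is only approximate):

```python
# Where would ~325 Gflops (Rpeak) land against the entries quoted above?
cluster_gflops = 325

lists = {
    "June 1993":     {"#1 CM-5/1024": 131.00,   "#10 Y-MP C916/16256": 15.24},
    "June 1996":     {"#1 SR2201/1024": 307.20, "#10 SX-4/32": 64.00},
    "November 1997": {"#1 ASCI Red": 1830.40,   "#10 T3E": 326.40},
}

for date, entries in lists.items():
    beats = [name for name, rpeak in entries.items() if cluster_gflops > rpeak]
    print(f"{date}: beats {beats or 'none of the quoted entries'}")

# 325 Gflops clears both quoted entries on the 1993 and 1996 lists,
# and neither entry on the 1997 list.
```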
bunderbunder
Reminds me a bit of one of my favorite NormConf sessions, "Just use one big machine for model training and inference." https://youtu.be/9BXMWDXiugg?si=4MnGtOSwx45KQqoP
Or the oldie-but-goodie paper "Scalability! But at what COST?": https://www.usenix.org/system/files/conference/hotos15/hotos...
Long story short, performance considerations with parallelism go way beyond Amdahl's Law, because supporting scale-out also introduces a bunch of additional work that simply doesn't exist in a single node implementation. (And, for that matter, multithreading also introduces work that doesn't exist for a sequential implementation.) And the real deep down black art secret to computing performance is that the fastest operations are the ones you don't perform.
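A minimal sketch of that point: Amdahl's Law alone predicts monotonic (if diminishing) gains, but adding even a small assumed per-node coordination cost makes the curve roll over. The 5% serial fraction and 1%-per-node overhead below are arbitrary illustrative numbers, not measurements:

```python
# Amdahl's Law plus a simple per-node coordination cost, to illustrate why
# scale-out overhead (absent in a single-node run) can dominate.
def speedup(n, serial_fraction=0.05, overhead_per_node=0.01):
    amdahl = 1 / (serial_fraction + (1 - serial_fraction) / n)
    coordination = 1 + overhead_per_node * (n - 1)   # grows with cluster size
    return amdahl / coordination

for n in (1, 4, 10, 32, 100):
    print(f"{n:>3} nodes: {speedup(n):5.2f}x")
# speedup peaks somewhere in the tens of nodes, then declines as
# coordination costs catch up with the shrinking parallel gains
```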
Coffeewine
It's a pretty rough headline; clearly the author had fun performing the test and constructing the thing.
I would be pretty regretful of just the first sentence in the article, though:
> I ordered a set of 10 Compute Blades in April 2023 (two years ago), and they just arrived a few weeks ago.
That's rough.
geerlingguy
That's the biggest regret; but I've backed 6 Kickstarter projects over the years. Median time to deliver is 1 year.
Somehow I've actually gotten every item I backed shipped at some point (which is unexpected).
Hardware startups are _hard_, and after interacting with a number of them (usually one or two people with a neat idea in an underserved market), it seems like more than half fail before delivering their first retail product. Some at least make it through delivering prototypes/crowdfunded boards, but they're already in complete disarray by the end of the shipping/logistics nightmares.
maartin0
Not completely related, but do you know if hardware Kickstarters typically have any IP protection? I'm surprised there haven't been any cases of large companies patenting ideas taken from Kickstarter, at least that I've seen.
andy99
I don't know what the intent was but when I read the headline it reminds me of one of those clickbait YouTube headlines, like "I did X and instantly regretted it".
nromiun
There is a reason all the big supercomputers have started using GPUs in the last decade. They are much more efficient. If you want 32bit parallel performance just buy some consumer GPUs and hook them up. If you need 64bit buy some prosumer GPUs like the RTX 6000 Pro and you are done.
Nobody is really building CPU clusters these days.
fidotron
If Pi Clusters were actually cost competitive for performance there would be data centres full of them.
shermantanktop
Like the joke about the economists not picking up the $20 bill on the ground?
Faith in the perfect efficiency of the free market only works out over the long term. In the short term we have a lot of habits that serve as heuristics for doing a good job most of the time.
uncircle
> Like the joke about the economists not picking up the $20 bill on the ground?
For those like me that don't know the joke:
Two economists are walking down the street. One of them says “Look, there’s a twenty-dollar bill on the sidewalk!” The other economist says “No there’s not. If there was, someone would have picked it up already.”
ThrowawayR2
There's been so much investigation into alternative architectures for datacenters and cloud providers, including FAANG resorting to designing their own ARM processors and accelerator chips (e.g. AWS Graviton, Google TPUs) and having them fabbed, that this comes off not as warranted cynicism but as silly cynicism.
infecto
Sure, but for commodities like server hardware, we can say it's usually directionally correct. If there are no Pi cloud offerings, there is probably a good economic reason for it.
IAmBroom
> Faith in the perfect efficiency of the free market only works out over the long term
... and even then it doesn't always prove true.
phoronixrly
If they were cost competitive for ... anything at all really...
jacobr1
They are competitive for hobbyist use cases. Limited home servers, or embedded applications that overlap with arduino.
ACCount37
Prototyping and low volume.
They're good for as long as the development costs dominate the total costs.
wltr
Well, I have a Pi as a home server, and it’s very energy-efficient while doing what I want. Since I don’t need the latest and greatest (I don’t see any difference from a modern PC for my use case), it’s very competitive for me. No need for any cooling is a bonus.
Waraqa
>very energy efficient
If your server has a lot of idle time, ARM will always win.
cosarara
> Compared to the $8,000 Framework Cluster I benchmarked last month, this cluster is about 4 times faster:
Slower. 4 times slower.
teleforce
That's definitely a typo; I had to read the sentence from the article three times and still couldn't make sense of it until I saw the figure.
TL;DR: just buy one Framework Desktop and it's better than the OP's Pi AI cluster in every single metric, including cost, performance, efficiency, headache, etc.
geerlingguy
Oops, fixed the typo! Thanks.
And regarding efficiency, in CPU-bound tasks, the Pi cluster is slightly more efficient. (Even A76 cores on a 16nm node still do well there, depending on the code being run).
lumost
I don’t really get why anyone would be buying AI compute unless A) your goal is to rent out the compute, B) no vendor can rent you enough compute when you need it, or C) you have an exotic funding arrangement that makes compute capex cheap and opex expensive.
Unless you can keep your compute at 70% average utilization for 5 years, you will never save money purchasing your hardware compared to renting it.
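A sketch of that break-even arithmetic; the ownership cost and rental rates below are assumptions for illustration, and the threshold depends entirely on the ratio between them:

```python
# Break-even utilization for buying vs. renting compute: below this average
# utilization over the hardware's lifetime, renting is cheaper.
# All dollar figures are assumptions for illustration only.
HOURS_5Y = 5 * 365 * 24  # 43,800 hours

def break_even_utilization(ownership_cost, rental_per_hour, hours=HOURS_5Y):
    """Fraction of the lifetime the hardware must be busy to beat renting."""
    return (ownership_cost / rental_per_hour) / hours

for rate in (2.00, 1.50, 1.00):
    u = break_even_utilization(ownership_cost=33_000, rental_per_hour=rate)
    print(f"rental at ${rate:.2f}/hr -> need ~{u:.0%} utilization over 5 years")
# $2.00/hr -> ~38%, $1.50/hr -> ~50%, $1.00/hr -> ~75%
```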
horsawlarway
There are an absolutely stunning number of ways to lose a whole bunch of money very quickly if you're not careful renting compute.
$3,000 is well under many "oopsie billsies" from cloud providers.
And that's outside of the whole "I own it" side of the conversation, where things like latency, control, flexibility, & privacy are all compelling reasons to be willing to spend slightly more.
I still run quite a number of LLM services locally on hardware I bought mid-COVID (right around $3k for a dual RTX 3090 + 124GB system RAM machine).
It's not that much more than you'd spend if you're building a gaming machine anyways, and the nifty thing about hardware I own is that it usually doesn't stop working at the 5 year mark. I have desktops from pre-2008 still running in my basement. 5 year amortization might have the cloud win, but the cloud stops winning long before most hardware dies. Just be careful about watts.
Personally - I don't think pi clusters really make much sense. I love them individually for certain things, and with a management plane like k8s, they're useful little devices to have around. But I definitely wouldn't plan to get good performance from 10 of them in a box. Much better off spending roughly the same money for a single large machine unless you're intentionally trying to learn.
a2128
Why do people buy gaming PCs when it's much cheaper to use streaming platforms? I think the two cases are practically parallel in terms of reliability, availability, restrictions, flexibility, sovereignty, privacy, etc.
But also, when it comes to Vast/RunPod, it can be annoying and genuinely become more expensive if you end up renting 2x the number of hours because you constantly have to upload and download data and checkpoints, pay continuous storage costs, transfer data to another server because the GPU is no longer available, etc. It's just less of a headache if you have an always-available GPU with a hard drive plugged into the machine, and that's it.
HenryMulligan
Data privacy and security don't matter? My secondhand RTX 3060 would buy a lot of cloud credits, but I don't want tons of highly personal data sent to the cloud. I can't imagine how it would be for healthcare and finance, at least if they properly shepherded their data.
tern
For most people, no, privacy does not matter in this sense, and "security" would only be a relevant term if there was a pre-existing adversarial situation
causal
1) Data proximity (if you have a lot of data, egress fees add up)
2) Hardware optimization (the exact GPU you want may not always be available for some providers)
3) Not subject to price changes
4) Not subject to sudden Terms of Use changes
5) Know exactly who is responsible if something isn't working.
6) Sense of pride and accomplishment + Heating in the winter
justinrubek
At some point, the work has to actually be done rather than shuffling the details off to someone else.
2OEH8eoCRo0
I don't get why anyone would hack on and have fun with unique hardware either /s
seanw444
It's also not always just about fun or cost effectiveness. Taking the infrastructure into your own hands is a nice way to know that you're not being taken advantage of, and you only have yourself to rely on to make the thing work. Freedom and self-reliance, in short.
markx2
> "But if you're on the blog, you're probably not the type to sit through a video anyway. So moving on..."
Thank you!
xnx
Fun project. Was the author hoping for cost effective performance?!
I assumed this was a novelty, like building a RAID array out of floppy drives.
LTL_FTC
The author is a YouTuber and projects like these pay for themselves with the views they garner. Even the title is designed for engagement.
leptons
A lot of people don't understand the performance limits of the Raspberry Pi. It's a great little platform for some things, but it isn't really fit for half the use cases I've seen.
Our_Benefactors
This was my impression as well; the bit about GPU incompatibility with llama.cpp made me think he was in over his head.
hamonrye
Strange quirks for language syntax. I read about a parsing system that was distilled towards being-towards this particular species of silver-tail that existed below Golden Gate bridge. Godwin's Law?