A Tour Inside the IBM Z17
April 24, 2025
mrandish
I recently learned that the IBM Z series mainframes are generally compatible with software written for the legendary IBM 360, launched in 1964. While I'm sure there are caveats, maintaining any backward compatibility with a platform from over 60 years ago is impressive.
Having started in 8-bit microcomputers and progressed through various desktop platforms and servers, mainframes were esoteric hulking beasts to me, fascinating but mysterious. In recent years I've started expanding my appreciation of classic mainframes and minis through blogs and retro history. This IEEE retrospective on the creation of the IBM 360 was eye-opening: https://spectrum.ieee.org/building-the-system360-mainframe-n...
Having read pretty deeply on the evolution of early computers from the ENIAC era through Whirlwind, CDC, early Cray, and DEC, I was familiar with the broad strokes, but I never fully appreciated how much the IBM 360 was a major step change in both software and hardware architecture. It's also a dramatic story, because it's rare for a decades-old company as successful and massive as IBM to take such a huge "bet the company" risk. The sheer size and breadth of the 360 effort, as well as its long-term success, profoundly shaped the future of computing. It's interesting to see how architectural concepts from the 360 (as well as DEC's popular PDP-8, -10, and -11) went on to influence the design of early CPUs and microcomputers. The engineers and hobbyists creating early micros had mostly learned computing in the late '60s and early '70s on the 360s and PDPs that were ubiquitous in universities.
vb-8448
I encountered assembly programs written and compiled in '88 and still running.
There are several drawbacks to maintaining this kind of compatibility but, nevertheless, it's impressive.
KerrAvon
Book recommendation: this one goes in-depth on the people, processes, and technology, with incredible detail on all aspects.
https://direct.mit.edu/books/monograph/4262/IBM-s-360-and-Ea...
mrandish
Thanks for the recommendation! I've ordered it.
After reading the IEEE article I linked above, I got the book the article was based on ("IBM: The Rise and Fall and Reinvention of a Global Icon"). While it's a thorough recounting of IBM's storied history, it wasn't what I was looking for. The author specifically says his focus was not the technical details, as he felt too much had already been written from that perspective. Instead, the book is a more traditional historian's analysis, which I found kind of dry.
strongpigeon
I don't mean this in a condescending way at all, but really out of sheer curiosity: Who uses mainframes nowadays and what for?
I find mainframes fascinating, but I'm so unfamiliar with them that I don't know what or why I'd ever use one for (as opposed to "traditional" hardware or cloud services).
pjmlp
Besides all the answers given already, one of the reasons Unisys keeps selling Burroughs, a.k.a. ClearPath MCP, is its security model.
ESPOL/NEWP was one of the very first systems programming languages, safe by default with explicit unsafe code blocks.
The whole OS was designed security-first (think Rust in 1961), so their customers are companies that take this very seriously, and they're not only running COBOL.
The motto is unsurpassed security.
https://www.unisys.com/product-info-sheet/ecs/clearpath-mast...
dragonwriter
> Who uses mainframes nowadays and what for?
Large institutions (corporations, governments) that have existed for more than a couple of decades and have large-scale, mission-critical batch processes that run on them, where the core function is relatively consistent over time. Very few, if any, new processes are automated on mainframes at most of these places, and even new requirements for the processes that depend on the mainframe may be built in other systems that process data before or after the mainframe workflows. But the cost and risk of replacing well-known, battle-tested systems, finely tuned by years of ironing out misbehavior, often isn't warranted without some large-scale requirements change that invalidates the basic premises of the system. So they stay around.
strongpigeon
Thanks for that, and yeah, that fits with what I've found: mostly continuation of legacy, critical systems that were built on mainframes. It just seems shocking how much investment IBM still puts into developing these machines, given that no one seems to want to use them anymore.
It feels like I must be missing something, or maybe I'm just underestimating how much money is involved in this legacy business.
wmf
IBM mainframes are extremely profitable. There are ~1,000 customers who cannot migrate off mainframes and are willing to spend almost any amount to keep their apps working. Mainframe profits subsidize all of IBM's other money-losing divisions like cloud.
reaperducer
> given that no one seems to want to use them anymore
According to a 2024 Forrester Research report, mainframe use is as large as it's ever been and is expected to continue growing.
Reasons include (not from the report) hardware reliability and scalability, and the ability to churn through crypto-style math in a fraction of the time/cost of cloud computing.
Report is paywalled, but someone on HN might have a free link.
ianbicking
All these legacy answers don't really make sense for this Z17: it's a new mainframe supporting up to 64 TB of memory, with specialized cores for matrix math/AI applications. I have a hard time believing that legacy workloads call for this kind of hardware.
It also has a ton of high availability features and modularity that _does_ fit with legacy workloads and mainframe expectations, so I'm a little unclear who wants to pay for both sets of features in the same package.
wmf
I've heard that the AI features are used by banks for fraud detection. I guess some banks are also growing their transaction volume.
I agree that many mainframe workloads are probably not growing, so what used to require a whole machine probably fits in a few cores today.
rbanffy
You won't see mainframes doing AI training, but there is a lot of value in being able to do extremely low-latency inference during online transaction processing, which is why the NPU sits on the same chip as the CPUs, just a few clock cycles from the cores. Less timing-critical inference runs on the dedicated cards, which are a few more nanoseconds away.
flomo
Additionally, IBM marketing likes the implication that mainframe CPUs are 'the best'. If you can buy a phone with an AI processor, it only makes sense that your mainframe must have one too. (And IBM will probably charge you to use it.)
bayindirh
If I were a bank, I'd order one of those and put all the financial modelling and prediction load on it: real-time analysis to approve or deny loans, projections for deciding what to do with slower-moving financial products, future-looking scenarios across wider markets, etc., all simultaneously.
That thing is a dreadnought of a matmul machine with some serious uptime; it can crunch numbers without slowing down or losing availability.
Or, possibly, you could implement a massively parallel version of WOPR/Joshua on it and let it rip through scenarios for you. Just don't connect it to live systems (not entirely joking, though).
P.S.: I'd name that version of Joshua JS2/WARP.
mhh__
The funny thing is that if they spun half of the mainframe business out into something that could compete in the open market, people might actually buy it.
Most firms have so-so software in need of ultra-reliable hardware; not everyone is Google.
sho_hn
> I don't mean this in a condescending way at all, but really out of sheer curiosity: Who uses mainframes nowadays and what for?
There's probably some minor strategic relevance here. E.g., for the government, which has some workloads (research labs, etc.) that suit these systems, it's probably a decent idea not to migrate to differently-shaped compute, just to keep this CPU IP and its dev teams alive at IBM and make sure the US stays capable of whipping up high-performance CPU core designs even if Intel/AMD/Apple falter.
dralley
Those customers don't use mainframes; they use POWER. There have been a handful of POWER supercomputers in the past decade built for essentially that reason.
rbanffy
POWER is not uncommon in HPC, but IBM i (which is very enterprisey) is also based on POWER. You won't find IBM mainframes in HPC, but that's because HPC is not as sensitive to latency and reliability as online transaction processing, and with mainframes you are paying for those, not for TFLOPS.
bronco21016
I understand that a company I work with uses a few and is migrating away from them.
It seems clear to me that, prior to robust systems for orchestrating across multiple servers, you would install a mainframe to handle massive transactional workloads.
What I can never wrap my head around is whether there are still applications out there in typical business settings where a massive machine like this is a technical requirement, or whether it's just that the costs of switching are monumental.
I'd love to understand as well!
rbanffy
Bank payment processing is the primary example: being able to tell whether a specific transaction is fraudulent in less than 100 milliseconds. There are other businesses with similar requirements (healthcare is one of them), and fraud detection is getting a lot more sophisticated with the on-chip NPUs while staying within the same latency constraints.
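To make that budget concrete, the authorization path has roughly this shape. A toy Python sketch, not anyone's real stack; the function names, threshold, and pool size are all invented:

    import concurrent.futures

    BUDGET = 0.100  # hard ~100 ms budget on the authorization path

    def score_with_model(txn):
        # Stand-in for the real model call (on a z17, this is where the
        # on-chip NPU would be invoked). Returns a fraud probability.
        return 0.02

    def rules_fallback(txn):
        # Cheap static rules used when the model can't answer in time.
        return txn["amount"] > 10_000

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

    def looks_fraudulent(txn):
        future = pool.submit(score_with_model, txn)
        try:
            return future.result(timeout=BUDGET) > 0.9
        except concurrent.futures.TimeoutError:
            # Budget blown: degrade to rules rather than stall the payment.
            return rules_fallback(txn)

    print(looks_fraudulent({"amount": 125.00}))  # False

The interesting property is the hard deadline: the answer degrades rather than arrives late.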
beAbU
Cloud is basically an infinitely scalable mainframe. You have dedicated compute optimised for specific workloads, and you string these together in a configuration that makes sense for your requirements.
If you understand the benefits of cloud over generic x86 compute, then you understand mainframes.
Cloud is mainframes gone full circle.
rbanffy
> Cloud is mainframes gone full circle.
Except that now you need to develop the software that gives mainframes their famed reliability yourself. The applications are very different: software developed for the cloud always needs to assume part of the system might become unavailable and work around that. A lot of the stack needs to be there if you aim for highly reliable apps, from the cluster manager ensuring a failed node gets replaced and its processes are spun up on another node, all the way up to your own code retrying failed operations. With mainframes, you just pretend the computer is perfect and never fails (some exaggeration here, but not much).
Reliability is just one aspect; their observability is impressive too. Mainframes were the cloud of their day, and you can trace resource usage in exquisite detail because clients used to be billed by the CPU cycle. Add to that the built-in hardware reliability features (for instance, IBM mainframes have memory in RAID-like arrays).
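Even just the "retries failed operations" piece is code every cloud application ends up carrying somewhere. A minimal sketch of the idea, in generic Python with made-up defaults, nothing vendor-specific:

    import random
    import time

    def with_retries(op, attempts=5, base_delay=0.05):
        # Retry op() with exponential backoff plus jitter. On a mainframe
        # you mostly get to pretend this layer isn't needed; in the cloud,
        # every remote call wants something like it.
        for attempt in range(attempts):
            try:
                return op()
            except (ConnectionError, TimeoutError):
                if attempt == attempts - 1:
                    raise
                # Back off 50 ms, 100 ms, 200 ms, ... with jitter so that
                # a thousand clients don't all retry in lockstep.
                time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))

And that's the trivial part; the node-replacement and failover layers underneath it are far more work.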
cameron_b
But latency.
The cache design in the Z is nothing like what cloud computing offers for collaborative job processing.
ThinkBeat
Do IBM mainframes still have the pricing model where you "buy" the hardware and then still pay IBM for main processing?
(Where you can save money by buying Linux or Java accelerators to run things on for free.)
bob1029
I think what you're referring to is the "sub-capacity" pricing model, wherein a rolling average of resource consumption is used for billing. They've since transitioned to newer, cloud-style models, but it's mostly the same idea with more moving parts.
The advantage of this model from a business-operations standpoint is that you don't have to think about a single piece of hardware related to the mainframe. IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed, on purpose, to enable seamless servicing of the machine without impacting operations.
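As I recall, the classic sub-capacity metric is the rolling four-hour average of MSU consumption, with the month's software bill driven by the peak of that average rather than by instantaneous spikes (contracts vary, so treat this as the general idea). A toy illustration of the arithmetic, with made-up numbers:

    # Hourly MSU samples and a four-sample rolling window.
    msu = [40, 45, 300, 60, 50, 48, 52, 47]
    window = 4

    r4ha = [sum(msu[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(msu))]

    print(max(msu))   # 300   -- the instantaneous spike
    print(max(r4ha))  # 114.5 -- the peak rolling average that drives the bill

Which is why a whole discipline exists around smoothing workloads so the peak rolling average stays low.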
DaiPlusPlus
> IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed, on purpose, to enable seamless servicing of the machine without impacting operations.
I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
rbanffy
> I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
But then you'd have to develop it yourself. IBM has been doing just that for 60 years (on the 360 and its descendants).
bob1029
> distributed software system
What if the business demands a level of serialized transaction throughput that's incompatible with ideas like Paxos?
You will never beat one fast machine at maintaining a serialized narrative, and it just so happens that most serious businesses require exactly those semantics.
How much does downtime cost you per hour? What are the consequences if your services become unavailable?
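Back-of-the-envelope, with illustrative latencies rather than measurements, and ignoring batching and pipelining (which help both sides):

    quorum_rtt = 1e-3     # ~1 ms round trip to reach a cross-AZ consensus quorum
    local_commit = 20e-6  # ~20 us to harden a record on one fast local machine

    print(1 / quorum_rtt)    # ~1,000 strictly ordered commits/sec via consensus
    print(1 / local_commit)  # ~50,000 on a single box

If every transaction must observe the one before it, that per-commit latency is your throughput ceiling.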
anthk
That's like a toy drone company trying to compete with DARPA. Not even close.
These kinds of monsters run in critical environments such as airports, with AS/400 or similar terminals used by secretaries. Workloads, reliability, security, and testing at this level are no joke. At all. This is not your general-purpose Unix machine.
charlieo88
How much does it cost? I'm just curious. No, I don't want to book a meeting to "discuss" it.
pickle-wizard
I haven't worked with mainframes since the z10, but back then you could get into an entry model for about $100k.
Though the sky is the limit. The typical machine I would order had a list price of about $1 million. Of course, no one pays list; discounts can be pretty substantial, depending on how much business you do with IBM or how badly they want your business.
bigbuppo
The big problem is that everything in the IBM Z world is negotiated and often covered by NDAs. Pricing is complicated by which operating systems and what sort of workloads you'll be running, and what service-level guarantees you need. The only published pricing in the entire life of the IBM 360/370/390/z-series line was the LinuxONE when it was first released: hardware plus OS, excluding storage, was $70k on the low end.
Previous-generation machines that came off lease used to be listed on IBM's website. You could have a fully maxed-out previous-generation machine for under $250k. Fifteen years ago I was able to get ballpark pricing for a fully maxed-out new machine, and it was "over a million, but less than two million, and closer to the low end". That said, the machines are often leased.
If you go with z/VM or z/VSE, the OS and software are typically sold under terms that are pretty much like normal software, except pricing varies with the capacity level of the machine, which may be less than the physical number of CPUs in it, since that is a thing in IBM-land.
If you go for z/OS, welcome to the world of metered billing. You're looking at tens of thousands of dollars in monthly recurring charges just to get started, and if you're running exactly the wrong mix of everything, you'll be spending millions on software each month. There's a whole industry that revolves around managing these expenses. Still less complicated than The Cloud.
NelsonMinar
You can get a software emulator for free and run it on a PC! It's quite robust and used by IBM's own developers. https://en.wikipedia.org/wiki/Hercules_(emulator)
racingmars
Hercules is _not_ used by IBM's own developers. Being found with Hercules on your computer at IBM gets you in trouble. I know people who work on mainframe-related stuff inside IBM and they steer well clear of Hercules. And I've heard that IBM's computer monitoring stuff (antivirus, asset protection, etc.) looks for Hercules and flags it.
But IBM _does_ have their own mainframe emulator, zPDT (z Personal Development Tool), sold to their customers for dev and testing (under the name zD&T -- z Development and Test), and to ISVs under their ISV program. That's what IBM's own developers would be using if they're doing stuff under emulation instead of LPARs on real hardware.
(And IBM's emulator is significantly faster than Hercules, FWIW, but it's overall less featureful and lacks all of Hercules's support for older architectures, additional device types, etc.)
rbanffy
> looks for Hercules and flags it.
There was something of a legal fight between IBM and TurboHercules SAS, a company that tried to force IBM to license z/OS to its users. IBM has been holding a grudge ever since (probably on the advice of their legal department).
tw04
You don't buy a mainframe; it's consumption-based pricing. They aren't just going to list a price, because they need to size the hardware to what they think the workload will be.
Could they just list prices? Sure. Will they ever do it? No.
rbanffy
A Rockhopper 4 Express starts at $135,000. While technically a mainframe, it won't run z/OS.
timewizard
It depends on how full those drawers are. $250k to $1m would be the typical price range.
It's easier and harder at the same time to buy older hardware. That's only half the challenge, though, because the software is strictly licensed and you pay per MIPS.
Here's a kid who bought a mainframe and then brought it up:
zcdziura
If you have to ask, you can't afford it!
mrweasel
I'm completely fascinated by the diagram. In a four-rack system, two and a half racks are dedicated to I/O, half a rack is just empty, and the remainder is the actual processing and memory.
The I/O probably isn't endless networking adapters, so what is it?
Someone
https://www.redbooks.ibm.com/abstracts/sg248579.html:
“The IBM z17 supports a PCIe I/O infrastructure. PCIe features are installed in PCIe+ I/O drawers. Up to 12 I/O drawers per IBM z17 can be ordered, which allows for up to 192 PCIe I/O and special purpose features.
For a four CPC drawer system, up to 48 PCIe+ fan-out slots can be populated with fan-out cards for data communications between the CPC drawers and the I/O infrastructure, and for coupling. The multiple channel subsystem (CSS) architecture allows up to six CSSs, each with 256 channels.
The IBM z17 implements PCIe Generation 5 (PCIe+ Gen5), which is used to connect the PCIe Generation 4 (PCIe+ Gen4) dual port fan-out features in the CPC drawers. The I/O infrastructure is designed to reduce processor usage and I/O latency, and provide increased throughput and availability.”
bigbuppo
There's also the problem that they need to take floor loading into account. They're not going to tell a customer upgrading from an older machine to a new one, "oh, by the way, the rack weighs twice what it used to, so you'll need to upgrade your floor while you're at it." Especially important for raised floors.
bitwize
Probably channels. In an IBM mainframe, each I/O device is connected on its own channel, which is actually a separate computer that handles transfers to and from main memory on behalf of the CPU. This has been the case going back to the System/360, and it's part of why mainframes are legendary for their transaction throughput and reliability. There's probably a lot of redundancy in the I/O hardware too, as it has to be rock solid and hot-swappable while the system is running.
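Loosely, it's an offload model: the CPU hands the channel a small program of I/O commands and gets an interrupt when the whole chain completes, instead of babysitting each transfer. A toy Python sketch of the shape only; real channels execute CCW chains, and nothing here is actual CCW semantics:

    import queue
    import threading

    def channel(work):
        # The "channel": an independent processor that walks a command
        # chain on its own, then signals completion back to the CPU.
        while True:
            program, on_complete = work.get()
            for command in program:
                command()
            on_complete()  # the "I/O interrupt"

    work = queue.Queue()
    threading.Thread(target=channel, args=(work,), daemon=True).start()

    done = threading.Event()
    work.put(([lambda: print("seek"), lambda: print("read block")], done.set))

    # The CPU is free to keep computing here instead of polling the device...
    done.wait()  # ...until the channel signals completion.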
stonogo
Could be storage, networking, crypto HSM, or cluster interconnect. See page 28 on https://www.redbooks.ibm.com/redbooks/pdfs/sg248580.pdf
pkaye
I always enjoy reading those IBM Redbooks and learning about the technical details of these mainframe systems.
gtirloni
z Systems have always been amazing engineering feats. Too bad adopting it comes with a gargantuan amount of... IBM.
gosub100
Yeah, a few years ago there was a Talos (I think) desktop motherboard that had a POWER8 CPU in it. It was expensive due to low production runs, but I wish it had taken off. I think IBM is up to POWER9 now, but I doubt there are any personal motherboards for it.
Pet_Ant
> IBM is up to POWER9 now, but I doubt there are any personal motherboards for it.
The Talos II:
https://wiki.raptorcs.com/wiki/Talos_II
> EATX form factor
> Two POWER9-compatible CPU sockets accepting 4-, 8-, 18-, or 22-core Sforza CPUs
"Entry" level is $5,800 USD.
There won't be a POWER10 version from them because of the proprietary bits required:
https://www.talospace.com/2023/10/the-next-raptor-openpower-...
> POWER10, however, contained closed firmware for the off-chip OMI DRAM bridge and on-chip PPE I/O processor, which meant that the principled team at Raptor resolutely said no to building POWER10 workstations, even though they wanted to.
https://www.osnews.com/story/137555/ibm-hints-at-power11-hop...
alchemist1e9
What are some of the reasons to buy or use these over Intel or AMD?
9front
That's Raptor Computing Systems (www.raptorcs.com), now selling Talos II workstations.
UltraSane
I really don't understand why, in four racks, you can have only 4 CPU drawers but 12 I/O drawers. It makes their I/O seem incredibly inefficient.
rbanffy
Have you seen the size of those drawers? A single rack can only fit five of them, and you still need to add processing, power/UPS, and cooling.
UltraSane
The CPU drawers are 5U while the I/O drawers are 8U. https://higherlogicdownload.s3.amazonaws.com/IMWUC/UploadedI...