
IBM Announces the Z17 Mainframe Powered by Telum II Processors

llm_nerd

For those who wonder where this fits relative to platforms we are usually more accustomed to: they sell these to banks and financial services firms that have a long history of mainframes and are basically just upgrading in place with minimal change. With recent iterations they've added some AI processing on the silicon, offering baby steps toward imbuing solutions like fraud detection with on-chip neural nets.

But to put it in context, the 24 TOPS that they advertise -- the inference performance of the AI module on the Telum II -- doesn't even match an M4's neural engine (40 TOPS). And it's nowhere close to a dedicated device like an H100 SXM, which can hit 4000 TOPS (yes, 166x more). For both the M4 and the H100 I'm quoting quantized numbers, but presumably the Telum II figure is quantized too, given how it boasts about its quantized support.

Massive caches. Tonnes of memory support. Neat device. A "you won't get fired for leasing this" solution for a few of the Fortune 500.

But you can almost certainly build a device that's orders of magnitude faster in just about every dimension using more traditional hardware stacks, and for a fraction of the price.

claudex

It's 24 TOPS per CPU, so 192 TOPS for a full mainframe. And you can add Spyre accelerators, which have 32 AI units per card (so 32x24 TOPS), with up to 48 cards per mainframe (so 48x32x24 = 36,864 TOPS). But yeah, you could buy a lot of H100s for the price of a mainframe with such a configuration.
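
If you want the back-of-the-envelope arithmetic in one place, here's a quick Python sketch using only the per-unit figures quoted in this thread (none of them independently verified):

    # Rough aggregate AI throughput from the numbers quoted above.
    telum_tops = 24                      # per Telum II processor
    frame_cpu_tops = 192                 # full mainframe, as quoted (8 x 24)
    spyre_tops = 48 * 32 * telum_tops    # 48 cards x 32 AI units x ~24 TOPS = 36,864
    h100_tops = 4000                     # quantized figure cited upthread
    print(frame_cpu_tops, spyre_tops, round(spyre_tops / h100_tops, 1))
    # -> 192 36864 9.2, i.e. a maxed-out Spyre config lands around nine H100s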

rswail

> But you can almost certainly build a device that's orders of magnitude faster in just about every dimension using more traditional hardware stacks, and for a fraction of the price.

Using the sentence "more traditional hardware stacks" when comparing to mainframes is sorta funny.

UltraSane

IBM strictly forbids anyone from publishing performance benchmarks of their mainframes, which should tell you that they are pretty slow for the money.

hulitu

> which should tell you that they are pretty slow for the money.

It depends. Some years ago, the memory ran at half the bus speed.

UltraSane

If IBM mainframes had amazing MIPS/$, IBM would make sure the world knew it.

CuriousRose

At roughly 50% faster (an article-based guesstimate), what's the average time that a mainframe stays in the fleet? Surely significant energy and data centre floor space savings might even persuade some customers on the previous generation to upgrade? Or is it a 5-10 year cycle?

trollbridge

Mainframes are often leased, with leasing charges based on how many CPU cores are used (regardless of how many are physically present).

rbanffy

One cool feature of the z16 was that all cores could be activated during startup, and then any cores that aren't licensed (paid for) shut down and performance drops to the agreed limit. I do that with my KVM machines on the server under my desk - all are configured with more cores than needed and, when boot finishes, most cores are removed from process scheduling, leaving just the number the VM is supposed to use during normal operation. The result is a significantly faster startup and significantly lower power consumption (and resource contention).

dijit

Blog post worthy.

I wasn’t even aware you could tell the kernel not to schedule on some cores.

Yet it's as easy as:

    echo 0 > /sys/devices/system/cpu/cpu{num}/online

anonfordays

Your comment isn't clear as to why. If you assign two vCPUs to a KVM VM, why would limiting the VM's kernel to only scheduling on one vCPU increase the speed? The vCPUs are typically all on one physical CPU anyway, unlike a mainframe. You can also set affinity/processor pinning in KVM if you have more than one physical CPU/core, with the VM being none the wiser.

vb-8448

It depends on IBM's sales targets.

Some customers may replace them as often as every 3-5 years, but most I saw were between 5 and 7.

Energy and data centre floor space aren't really a factor here; a mainframe's use of physical resources is nothing compared to other systems.

rbanffy

I think that most z17s sold are already planned acquisitions, driven by decommissioning older machines, capacity expansion, or workload consolidation (say, a 4-node z13 parallel sysplex moving to a 2-node z17). I don't think anyone in their right mind would run a machine like this (for business-critical apps) beyond the extended warranty.

Spooky23

Closer to 5 usually. You’re leasing MIPS when you buy these things, so there’s a lot of margin and room for finance games.

It was a crazy business. You'd buy services and other crap in your mainframe deal, but the consultants were really sales guys who had their nose in your tent. Fully captured businesses, in insurance and government especially, would even use IBM's brand alignment for computers.

I had to close out a deal with them at a past employer when my peer had a medical emergency. It was a very eye opening experience.

AnimalMuppet

> would even use IBM's brand alignment for computers.

For those of us not in that world, could you explain what that means?

And, could you give some details on your last paragraph?

rbanffy

Based on the Telum II chip shown at last year's Hot Chips conference.

https://chipsandcheese.com/p/telum-ii-at-hot-chips-2024-main...

theandrewbailey

> IBM’s solution is to reduce data duplication within its caches, and re-use the chip’s massive L2 capacity to create a virtual L3 cache.

I was kind of fascinated with the original Telum CPUs because of this feature. Do we know if other CPU designers are planning a similar virtual cache feature? Or is this feature implemented as a workaround due to some other deficiency that others don't have?

UltraSane

It's needed on the Telum chips because they have so much L2 cache per core (36 MB) that there isn't room for any L3 cache.

rbanffy

They could reduce the core count to make space for on-chip L3, and add a massive amount of cache to the drawer controller, like there was on the z15, but this is way more efficient. They also keep the in-drawer and cross-socket latencies very low, forming a virtual L4 with all chips in the drawer. In the end, it's a more efficient way to allocate silicon.

memset

I interned at IBM writing mainframe software in 2008 or so. One thing I remember them saying - there used to be TV commercials - was that a single mainframe could replace a room's worth of commodity hardware.

I would have assumed that someone would have started a cloud provider with Linux VMs running on mainframes instead of racks of pizza boxes. What was missing - are the economics of mainframes really that bad?

ASalazarMX

Mainframes are the polar opposite of commodity hardware. Those pizza boxes are commodity because they're plentifully available and you can mix and match if needed; there's nothing cheaper for you to run your VMs on. Running them on a mainframe would put IBM as a middleman between you and your business.

Also, mainframes/midranges are all about stability, security, and robustness, not performance. For example, IBM i (the OS, not the machine) has a hardware-dependent layer and a hardware-independent one. This allows for drastic hardware changes without affecting high-level applications. A monolith would arguably be more efficient, but it matters more that the hardware-independent layer stays rock-solid.

https://en.wikipedia.org/wiki/IBM_i#Architecture

pabs3

IBM offers a cloud to open source folks already; not sure if they have a commercial one though.

https://developer.ibm.com/articles/get-started-with-ibm-linu...

https://wiki.debian.org/Hardware/Wanted#Other

pabs3

Where does IBM get their CPUs fabbed? I assume they don't have their own fabs these days?

Edit: it's "Samsung’s leading edge 5 nm process node" according to the Telum II article linked from one of the other comments here.

https://chipsandcheese.com/p/telum-ii-at-hot-chips-2024-main...

Surprising they aren't using TSMC like pretty much everyone else does.

sillywalk

A link to the IBM Redbook Technical Introduction [PDF]:

https://www.redbooks.ibm.com/redbooks/pdfs/sg248580.pdf

a012

It’s funny that they managed to slap AI buzz into the mainframe

myself248

It's a real thing -- one of the oft-cited usage examples is running fraud detection while processing credit-card transactions. Customers want to add this powerful capability to that bedrock business process.

xattt

What are the heuristics that AI models would use for a given transaction? Is it essentially just a “vibe check”?

DougN7

I’ve been learning a bit about this since coming across RRCF (Robust Random Cut Forests). The ISOTree library is really well documented if you want to play with it.
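
If anyone wants to see the general shape of this, here's a toy sketch using scikit-learn's IsolationForest (a close relative of RRCF; the ISOTree library mentioned above exposes a similar fit/predict workflow). The transaction features are entirely made up for illustration:

    # Toy anomaly scoring for transactions with an isolation forest.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    # invented features: amount, hour of day, distance from home (km)
    normal = np.column_stack([
        rng.gamma(2.0, 30.0, 5000),   # typical purchase amounts
        rng.integers(8, 22, 5000),    # daytime activity
        rng.exponential(5.0, 5000),   # mostly local
    ])
    model = IsolationForest(n_estimators=200, random_state=0).fit(normal)
    suspicious = np.array([[4800.0, 3, 900.0]])  # big amount, 3 a.m., far from home
    print(model.decision_function(suspicious))   # lower = more anomalous
    print(model.predict(suspicious))             # -1 means flagged as an outlier

No claim that this is what IBM ships on Telum - it's just the "learn what normal looks like, score deviations" idea in a few lines.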

spicybbq

They use machine learning: pattern matching on what "good" and "bad" requests look like, based on experience.

UltraSane

Complex Bayesian statistical models updated in real time.
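
For a tiny, deliberately oversimplified flavour of "updated in real time": a conjugate Beta-Bernoulli update of a per-merchant fraud rate as confirmed outcomes stream in. This is a textbook toy, not a description of IBM's actual models:

    # Streaming Bayesian update of a fraud rate with a Beta-Bernoulli model.
    a, b = 1.0, 999.0                    # prior: roughly 0.1% of transactions are fraud
    stream = [0, 0, 1, 0, 0, 0, 1, 0]    # made-up confirmed outcomes (1 = fraud)
    for outcome in stream:
        a += outcome                     # conjugacy makes the update O(1) per event
        b += 1 - outcome
        print(f"fraud={outcome} -> posterior mean rate {a / (a + b):.4f}")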

skyyler

AI feels really fuzzy compared to the types of work I normally associate with big iron.

gosub100

You can detect fraud using non-AI solutions, but this way if it does something racist or shuts off an innocent person's access to money, they can shrug and blame it on the AI, then announce immediate "comprehensive measures" to prevent it in the future.

UltraSane

They started this back in 2020 at least, so it predates the LLM hype. The intent was to apply compute-expensive fraud detection algorithms to every transaction.

NitpickLawyer

IBM - mainframe - Zxx CPUs, what decade is this?!

cess11

Most organisations never come across a problem that requires neatly clustered high-end hardware.

"[T]he IBM Telum II processors are eight cores running at 5.5GHz and with a 360MB virtual L3 cache and 2.88GB virtual L4 cache. Telum II brings a new data processing unit, I/O advancements, and other features to help accelerate AI workloads."

But if you do, if you actually have such massive data streams and such low tolerance for latency that sharding them over many machines costs a lot in overhead and throughput slows down intolerably at load peaks, then these machines are most likely a bargain. They allow you to do things very few others can do, which builds a moat around your business and locks you in with IBM.

Or, if you've been around since the seventies and your software largely consists of COBOL, Fortran and assembly for this line of architectures, and the great rewrite would cost you two decades while all your developers do very little else, then it's also a bargain to be able to stay in business.

rbanffy

The reasoning is like a choice between buying a single box for 10 million, or buying a hundred-box cluster for 2 million and then spending 10 million carefully tuning all your software to run your workload on that 2 million cluster.

rubyfan

One where legacy businesses still use the COBOL programs developed in prior decades.

I recently had to wait two quarters to launch a product because the only person who knew what some COBOL accounting program did was out on leave.

This is one of the many reasons many big corporations fail to innovate. It is very hard (near impossible) to implement new systems in an environment dominated by old ones (not talking only about software and hardware here, also organizational dynamics).

spratzt

You’re absolutely right about the organizational dynamics.

Many managers in large companies derive their status and power from knowledge of existing business processes and procedures. Any substantive change to those procedures obviously represents an existential threat to that position, and they generally resist it, often very vigorously.

Koshkin

I don’t think you necessarily need a mainframe in order to run a program written in COBOL. (There’s also emulation available, to accommodate worst case scenarios.)

jabl

There's even a COBOL frontend in GCC these days!

That being said, the problem isn't so much the COBOL language itself but rather that all the software written in it is connected to all kinds of database systems, messaging systems(?) and whatnot, making it very hard to move to a non-mainframe platform even if a customer so chooses. Or to put it another way, it's cheaper to pay IBM's very high prices to stay on the mainframe track than to migrate.

khaledh

One where banks, credit card networks, insurance companies, government, etc. need to process millions of transactions per second on a single mission critical box with unmatched redundancy, resilience, and security, and with a support army backing it.

hagbard_c

The one where the chips used in those mainframes have the edge over those used in servers in some important ways - more cache, higher cache bandwidth, lower cache latency. Those developments will eventually make their way into server chips, and with a bit of luck IBM (et al.) will have developed something else by then that will eventually enter the mainstream.

rbanffy

One where no other company has pushed out a processor that can borrow cache from neighbors.

speed_spread

It's the decade where it all comes crashing down and you're going back to COBOL.

Koshkin

COBOL? If such a time comes, we’d be lucky to even need the abacus.

speed_spread

If we're leaving civilisation altogether, an abacus is just an ammo clip for a slingshot.

Woodi

Nope.

How many times have companies been forced to pay for MS Windows, OEM or not?

The COBOL apps and OSes used there are all already paid-for, production-quality software going back to the fifties, and all that's needed, sometimes, is to drop in a new piece of hardware. That is a totally different civilisation.

OK, I may be embellishing, but that's the general idea.

Don't mention 100 years of "software modifications", because the Windows ecosystem can't win that contest anyway.
