
PCIe 8.0 announced by the PCI-SIG will double throughput again

SlightlyLeftPad

Any EEs that can comment on at what point do we just flip the architecture over so the GPU pcb is the motherboard and the cpu/memory lives on a PCIe slot? It seems like that would also have some power delivery advantages.

kvemkon

> at what point do we just flip the architecture over so the GPU pcb is the motherboard and the cpu/memory

Actually the Raspberry Pi (appeared in 2012) was based on a SoC with a big, powerful GPU and a small, weak supporting CPU. The board booted the GPU first.

verall

If you look at any of the Nvidia DGX boards, it's already pretty close.

PCIe is a standard/commodity so that multiple vendors can compete and customers can save money. But at 8.0 speeds I'm not sure how many vendors will really be supplying it; there are already only a few doing SerDes this fast...

y1n0

There are companies that specialize in memory controller IP that everyone else uses, including large semi companies like Intel.

The IP companies are the first to support new standards and make their money selling to Intel etc., allowing Intel or whoever to take their time building higher-performance IP.

bgnn

These days you can buy any standard as a soft IP from Synopsys or Cadence. They take their previous serdes and modify it to meet the new standard. They have thousands of employees across the globe just doing that.

Melatonic

Isn't it about latency as well with DGX boards, vs PCIe? You can only fit so much RAM on a board that will realistically be plugged into a slot.

snerbles

Most current DGX server assemblies are stacked and compression-fit, much higher density and more amenable to liquid cooling.

https://www.servethehome.com/micron-socamm-memory-powers-nex...

eggsome

Has the DGX actually shipped anywhere yet?

verall

Do you mean the new one? The older ones have been around for so long you can buy off-leases of them: https://www.etb-tech.com/nvidia-dgx-1-ai-gpu-server-2-x-e5-2...

vincheezel

Good to see I’m not the only person that’s been thinking about this. Wedging gargantuan GPUs onto boards and into cases, sometimes needing support struts even, and pumping hundreds of watts through a power cable makes little sense to me. The CPU, RAM, these should be modules or cards on the GPU. Imagine that! CPU cards might be back..

ksec

It's not like CPUs aren't getting higher wattage as well. Both AMD and Intel have roadmaps for 800W CPUs.

At 50-100W for IO, that only leaves about 11W per core on a 64-core CPU.
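
A rough back-of-the-envelope sketch of that arithmetic, assuming the 800 W roadmap figure above and taking 100 W for IO (both numbers illustrative, not from any datasheet):

```python
# Power budget per core, under the assumptions stated above.
package_w = 800   # assumed 800 W CPU package limit
io_w = 100        # assumed IO/uncore budget
cores = 64

per_core_w = (package_w - io_w) / cores
print(f"~{per_core_w:.1f} W per core")  # ~10.9 W, i.e. roughly 11 W
```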

linotype

An 800 watt CPU with a 600 watt GPU... I mean, at a certain point people are going to need different wiring for outlets, right?

derefr

But all of the most-ridiculous hyperscale deployments, where bandwidth + latency most matter, have multiple GPUs per CPU, with the CPU responsible for splitting/packing/scheduling models and inference workloads across its own direct-attached GPUs, providing the network the abstraction of a single GPU with more (NUMA) VRAM than is possible for any single physical GPU to have.

How do you do that, if each GPU expects to be its own backplane? One CPU daughterboard per GPU, and then the CPU daughterboards get SLIed together into one big CPU using NVLink? :P

wmf

GPU as motherboard really only makes sense for gaming PCs. Even there SXM might be easier.

db48x

No, for a gaming computer what we need is the motherboard and gpu to be side by side. That way the heat sinks for the CPU and GPU have similar amounts of space available.

For other use cases like GPU servers it is better to have many GPUs for every CPU, so plugging a CPU card into the GPU doesn’t make much sense there either.

mensetmanusman

It’s always going to be a back and forth on how you attach stuff.

Maybe the GPU becomes the motherboard and the CPU plugs into it.

avgeek23

And the memory should be an onboard module on the CPU card. Intel/AMD should replicate what Apple did with unified memory on the same ring bus. Lower latency, higher throughput.

It would push performance further. Although companies like Intel would bleed the consumer dry: a certain i5-whatever CPU with 16 GB of onboard memory could be insanely priced compared to what you'd pay for add-on memory.

0x457

That would pretty much make both Intel and AMD start segmenting the market by CPU core + memory combination. I absolutely do not want that.

0manrho

We're already there. That's what a lot of people are using DPUs for.

An example: this is storage instead of GPUs, but since the SSDs were PCIe NVMe, it's pretty nearly the same concept: https://www.servethehome.com/zfs-without-a-server-using-the-...

undersuit

To continue the ServeTheHome links, https://www.servethehome.com/microchip-adaptec-smartraid-430...

PCIe networks and CXL are the future of many platforms... like ISA backplanes.

0manrho

Yep, I have a lot of experience with CXL devices and networked PCIe/NVMe (over Eth/IB) fabrics, and with deploying "headless"/"micro-head" compute units which are essentially just a pair of DPUs on a PCIe multiplexer (basically just a bunch of PCIe slots tied to a PCIe switch or two).

That said, my experience in this field is more with storage than GPU compute, but I have done some limited hacking about in the GPGPU space with that tech as well. Really fascinating stuff (and often hard to keep up with, making sure every part in the chain supports the features you want to leverage, not to mention going down the PCIe root topology rabbit hole and dealing with latency/trace-length/SNR issues with retimers vs muxers vs etc etc).

It's still a nascent field that's very expensive to play in, but I agree it's the future of at least part of the data infrastructure field.

Really looking forward to finally getting my hands on CXL 3.x stuff (outside of a demo environment).

bgnn

EE here. There's no reason not to deliver power directly to the GPU by using cables. I'm not sure it's solving anything.

But you are right, there's no hierarchy in the systems anymore. Why do we even call something a motherboard? There's a bunch of chips interconnected.

pshirshov

Can I just have a backplane? Pretty please?

theandrewbailey

I've wondered why there hasn't been a desktop with a CPU+RAM card that slots into a PCIe x32 slot (if such a thing could exist), or maybe dual x16 slots, where the motherboard is a dumb backplane that only connects the other slots and distributes power, and could probably be much smaller.

namibj

Those exist; they are used for risers ("vertical mount GPU brackets, for dual GPU" equivalent for servers, where they make the cards flat again).

KeplerBoy

PCIe x32 actually exists, at least in the specification. I have never seen a picture of a part using it.

iszomer

Retimers.

colejohnson66

Sockets (and especially backplanes) are absolutely atrocious for signal integrity.

pshirshov

I guess if it's possible to have 30cm PCIe 5 riser cables, it should be possible to have a backplane with traces of similar length.

vFunct

VMEBus for the win! (now VPX...)

guerrilla

Yes, for fuck's sake, this is the only way forward. It gives us the ultimate freedom to do whatever we want in the future. Just make everything a card on the bus and quit with all this hierarchy nonsense.

dylan604

Wouldn't that mean a complete mobo replacement to upgrade the GPU? GPU upgrades seem much more rapid and substantial compared to CPU/RAM. Each upgrade would now mean taking out the CPU/RAM and other cards vs just replacing the GPU.

p1esk

GPUs completely dominate the cost of a server, so a GPU upgrade typically means new servers.

BobbyTables2

Agree - newer GPU likely will need faster PCIe speeds too.

Kinda like RAM - almost useless in terms of “upgrade” if one waits a few years. (Seems like DDR4 didn’t last long!)

chrismorgan

> GPU upgrades seem much more rapid and substantial compared to CPU/RAM.

For the last five years I feel like I've been hearing about people selling five-to-ten-year-old GPUs for sometimes as many dollars as they bought them for, and about people choosing to stay on 10-series NVIDIA cards (2016) because the similar-cost RTX 30-, 40- or 50-series was actually worse, since the effort and expense had gone into parts of the chips no one actually used. Dunno, I don't dGPU.

MurkyLabs

Yes, I agree, let's bring back the SECC-style CPUs from the Pentium era. I've still got my Pentium II (with MMX technology).

bhouston

I love that the PCIe standard is 3 generations ahead of what is actually released. Gen5 is the live version, but the team behind it is so well organized that they have a roadmap of 3 additional versions now. Love it.

zamadatix

"3 generations" seems like a bit of a stretch. Millions of Blackwell systems use PCIe 6.x today, PCIe 7.x was finalized last month, and this is an announcement work on PCIe 8.0 has started for release in 3 years. I.e. it has only been one month of being one generation behind the latest PCIe revision.

It'll be interesting to see if consumer devices bother trying to stay with the latest at all anymore. It's already extremely difficult to justify the cost of implementing PCIe 5.0 when it makes almost no difference for consumer use cases. The best consumer use case so far is enthusiasts who want really fast NVMe SSDs in x4 lanes, but 5.0 already gives >10 GB/s for a single drive, even with the limited lane count. It makes very little difference for x16 GPUs, even with the 5090. Things always creep up over time, but the rate at which the consumer space creeps is just so vastly different from what the DC space has been seeing that it seems unreasonable to expect the two to be in lockstep anymore.
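
A quick sanity check of that ">10 GB/s" figure, treating 128b/130b line coding as the only overhead, so these are optimistic per-direction ceilings rather than drive benchmarks:

```python
# Per-direction bandwidth of an x4 link for PCIe 3.0-5.0 (128b/130b coding).
GT_PER_LANE = {3: 8, 4: 16, 5: 32}   # transfer rate per lane, GT/s
ENCODING = 128 / 130                 # 128b/130b line coding efficiency

for gen, gt in GT_PER_LANE.items():
    gb_per_s = gt * ENCODING / 8 * 4   # x4 link, GB/s per direction
    print(f"PCIe {gen}.0 x4: ~{gb_per_s:.1f} GB/s")
# PCIe 3.0 x4: ~3.9 GB/s
# PCIe 4.0 x4: ~7.9 GB/s
# PCIe 5.0 x4: ~15.8 GB/s
```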

_zoltan_

Either I'm misinformed, in which case I'd like to see proof, or you are, but GB200 is not Gen6, just Gen5, and this will only be rectified with GB300, which, while it exists in the wild, is not in the millions.

So indeed the parent commenter would be correct that everything is Gen5 right now.

zamadatix

I'm going off of https://developer.nvidia.com/blog/nvidia-gb200-nvl72-deliver... but I have not actually deployed a GB200 system yet myself. Consumer Blackwell is definitely 5.0 though.

Melatonic

CPUs themselves only have so many PCIe lanes though, right? Wouldn't it make sense (even for consumers) to have peripherals use fewer lanes (but more speed per lane) for a multi-GPU system or something that uses a lot of drives?

zamadatix

More lanes = more cost

Faster lanes = more cost

More faster lanes = lots more cost

The chipset also strikes some of the balance for consumers, though. It has a narrow high-speed connection to the CPU but lets many lower-speed devices share that bandwidth. That way you can have your spare NVMe drive, SATA controller, wired and wireless NICs, sound hardware, most of your USB ports, your capture card, and some other random things connected over a single x4-to-x8-sized channel. This leaves the high-cost lanes for just the devices that actually use them (GPU and primary, possibly secondary, storage drive). I've got one consumer-type motherboard with 14 NVMe drives connected, for example, just not at full native speed directly to the CPU.

You're just SoL if you want to connect a bunch of really high bandwidth devices simultaneously (100 Gbps+ NICs, multiple GPUs at full connection speed, a dozen NVMe drives at native speed, or similar) because then you'll be paying for a workstation/server class platform which did make the "more faster lanes" tradeoff (plus some market segment gouging, of course).
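
An illustrative sketch of that oversubscription trade-off, assuming a Gen4 x4 chipset uplink and made-up peak figures for the attached devices (the exact numbers vary by platform and are only placeholders):

```python
# Oversubscription of a hypothetical chipset uplink (PCIe 4.0 x4).
uplink_gbs = 16 * (128 / 130) / 8 * 4   # ~7.9 GB/s per direction

device_peaks_gbs = {   # hypothetical peak demands, GB/s
    "spare NVMe drive": 7.0,
    "SATA controller": 0.6,
    "wired + wireless NICs": 1.5,
    "USB ports": 2.5,
    "capture card": 1.0,
}

total = sum(device_peaks_gbs.values())
print(f"uplink ~{uplink_gbs:.1f} GB/s, attached peaks ~{total:.1f} GB/s, "
      f"oversubscribed {total / uplink_gbs:.1f}x if everything bursts at once")
```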

michaelt

> "3 generations" seems like a bit of a stretch. Millions of Blackwell systems use PCIe 6.x today

True. And yet, if you buy an RTX 5090 today, costing $2400 and released in January this year, it's PCIe 5.0 x16.

zamadatix

Does the second half not already speak to this? Adding on to it further, GN found a few percent performance difference on a 5090 going from PCIe 3.0 to 5.0 (https://gamersnexus.net/gpus/nvidia-rtx-5090-pcie-50-vs-40-v...), meanwhile many DC use cases find PCIe 5.0 unusably slow, driving alternative (non-PCIe) interconnects.

tails4e

It takes a long time to get from standard to silicon, so I bet there are design teams working on PCIe 7 right now, which won't see products for 2 or more years.

Seattle3503

Is there an advantage to getting so far ahead of implementations? It seems like it would make it more difficult to incorporate lessons learned.

kvemkon

When AMD introduces a new desktop CPU series, IIRC they claim the next-generation design is (almost) finished (including layout?) and they are starting on the next-next-gen design. And I'm also asking the same question. But more than half a year before the CPU becomes available to the public, it is already being tested by partners (mainboard manufacturers and others?).

ThatMedicIsASpy

Gen6 is in use; look at Nvidia ConnectX-8.

drewg123

What hosts support Gen6? AFAIK, Gen5 is the most recent standard that's actually deployed. Eg, what can you plug a CX8 into that will link up at Gen6?

_zoltan_

Into a Gen6 Bianca board (the baseboard for GB300). That's it today.

How you use those today is twofold:

- Gen5 x32 via two x16 slots (this is how most people use them)

- actually use the CX8 as your PCIe switch directly to your GPUs: https://www.servethehome.com/this-is-the-nvidia-mgx-pcie-swi...

ksec

Nvidia only for now. Precisely because they got fed up with Intel and AMD (and Ampere as well) over extremely slow PCIe adoption.

I believe only next-gen Intel and AMD Zen 6 will get PCIe 6.0.

I am hoping Nvidia officially moves into the server CPU market, not only for their own CPUs but for wider web hosting as well. More competition for server hardware.

my123

Blackwell DC (B200/B300)

triknomeister

Custom Nvidia network cards I guess.

jsolson

This actually makes sense from a spec perspective if you want to give enough lead time for hardware to catch up with the specs and to support true interop.

Contrast this with the wild west that is "Ethernet" where it's extremely common for speeds to track well ahead of specs and where interop is, at best, "exciting."

Phelinofist

So we can skip 6 and 7 and go directly to 8, right?

robotnikman

I know very little about electronics design, so I always find it amazing that they keep managing to double PCIe throughput over and over. It's also probably the longest-lived expansion bus at the moment.

wmf

It's less surprising if you realize that PCIe is behind Ethernet (per lane).

extraduder_ire

Which Ethernet standard? Light online searching leads me to believe it tops out at 800Gb/s. (and not in a format I would normally call Ethernet)

toast0

Look at throughput on a single lane from the MAC to the PHY.

802.3dj is maybe finishing up soon and has 200 Gbps lanes, which is more or less what PCIe 8.x is supposed to be. The table in the article sums both directions of a lane, which leads to confusion. People want faster Ethernet in fewer lanes, so no doubt a 400 Gbps per lane standard will be starting up soon for PCIe to leverage as PCIe 9.

> not in a format I would normally call Ethernet

Why, because it doesn't use ether, or vampire taps?

Ethernet works over many media, and that's been pretty consistent throughout its life. It started from, or at least was inspired by, ALOHAnet, a wireless system; then you had thick and thin coax, then twisted pair and fiber, and now twinax and board-level interfaces are common, too.

wmf

Ethernet has 100 Gbps per lane today and soon will have 200 Gbps, while PCIe has 32 Gbps (32 GT/s) per lane today and will soon have 64 Gbps per lane.
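
For reference, a rough per-lane, per-direction side-by-side of the figures in this subthread (PCIe GT/s is roughly Gb/s before encoding overhead; the PCIe 8.0 number is the announced target):

```python
# Per-lane line rates, one direction, in Gb/s.
lane_rates_gbps = {
    "Ethernet 100G SerDes (today)":   100,
    "Ethernet 200G SerDes (802.3dj)": 200,
    "PCIe 5.0":  32,
    "PCIe 6.0":  64,
    "PCIe 7.0": 128,
    "PCIe 8.0 (planned)": 256,
}
for name, gbps in lane_rates_gbps.items():
    print(f"{name:>32}: {gbps} Gb/s per lane")
```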

rbanffy

I’m sure you can get some VMEbus boards.

zkms

My reaction to PCIe gen 8 is essentially "Huh? No, retro data buses are like ISA, PCI, and AGP, right? PCIe Gen 3 and SATA are still pretty new...".

I wonder what modulation order / RF bandwidth they'll be using on the PHY for Gen8. I think Gen7 used 32GHz, which is ridiculously high.

Dylan16807

> PCIe Gen 3 and SATA are still pretty new...

That's an interesting thought to look at. PCIe 3 was a while ago, but SATA was nearly a decade before that.

> I wonder what modulation order / RF bandwidth they'll be using on the PHY for Gen8. I think Gen7 used 32GHz, which is ridiculously high.

Wikipedia says it's planned to be PAM4 just like 6 and 7.

Gen 5 and 6 were 32 gigabaud. If 8 is PAM4 it'll be 128 gigabaud...

eqvinox

I'd highly advise against using GHz here (without further context, at least): a 32 Gbaud / 32 Gsym/s NRZ signal toggling at full rate is only a 16 GHz square wave.

baud seems out of fashion, sym/s is pretty clear & unambiguous.

(And if you're talking channel bandwidth, that needs clarification)

kvemkon

> > I think Gen7 used 32GHz, which is ridiculously high.

> 16GHz square wave

Is it for PCIe 5.0? PCIe 6.0 should operate at the same frequency, doubling the bandwidth by using PAM4. If PCIe 7.0 doubled the bandwidth and is still PAM4, what is the underlying frequency?

eqvinox

PCIe 7 = 128 GT/s = 64 Gbaud × PAM-4 = 32 "GHz" (if you alternate extremes on each symbol)

for gen6, halve all numbers
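
That arithmetic, spelled out per generation (assuming NRZ through Gen5 and PAM-4 from Gen6 on; the Gen8 line rate is the announced target, and the "GHz" figure is the fundamental of an alternating symbol pattern, not channel bandwidth):

```python
# GT/s -> symbol rate -> fundamental frequency per PCIe generation.
gens = {  # gen: (GT/s per lane, bits per symbol)
    "5.0": (32, 1),   # NRZ
    "6.0": (64, 2),   # PAM-4
    "7.0": (128, 2),  # PAM-4
    "8.0": (256, 2),  # PAM-4 (assumed, per the Wikipedia note above)
}
for gen, (gts, bits) in gens.items():
    gbaud = gts / bits          # symbols per second, in Gsym/s
    fundamental = gbaud / 2     # GHz, alternating extremes each symbol
    print(f"PCIe {gen}: {gts} GT/s = {gbaud:.0f} Gbaud -> ~{fundamental:.0f} GHz")
# PCIe 7.0: 128 GT/s = 64 Gbaud -> ~32 GHz, matching the figure above.
```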

guerrilla

> baud seems out of fashion, sym/s is pretty clear & unambiguous.

Huh? Baud is sym/s.

eqvinox

Yes, that was the implication, but I've been getting the impression that using baud is kinda unpopular compared to using sym/s.

weinzierl

Don't forget VESA Local Bus.

richwater

Meanwhile paying a premium for a Gen5 motherboard may net you somewhere in the realm of 4% improvements in gaming if you're lucky.

Obviously PCIe is not just about gaming, but...

simoncion

From what I've seen, the faster PCI-E bus is important when you need to shuffle things in and out of VRAM. In a video game, the faster bus reduces the duration of stutters caused by pushing more data into the graphics card.

If you're using a new video card with only 8GB of onboard RAM and are turning on all the heavily-advertised bells and whistles on new games, you're going to be running out of VRAM very, very frequently. The faster bus isn't really important for higher frame rate, it makes the worst-case situations less bad.

I get the impression that many reviewers aren't equipped to do the sort of review that asks questions like "What's the intensity and frequency of the stuttering in the game?" because that's a bit harder than just looking at average, peak, and 90% frame rates. The question "How often do textures load at reduced resolution, or not at all?" probably requires a human in the loop to look at the rendered output to notice those sorts of errors... which is time consuming, attention-demanding work.

Dylan16807

There's a good amount of reviewers showing 1% lows and 0.1% lows, which should capture stuttering pretty well.

I don't know how many games are even capable of using lower resolutions to avoid stutter. I'd be interested in an analysis.
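
For anyone unfamiliar with the metric, a minimal sketch of how "1% lows" are commonly computed from captured frametimes (the data here is synthetic, and some tools report the 99th-percentile frametime instead of averaging the slowest 1% of frames):

```python
import numpy as np

# Hypothetical frametime capture, in milliseconds.
frametimes_ms = np.random.lognormal(mean=2.8, sigma=0.25, size=10_000)

slowest_1pct = np.sort(frametimes_ms)[-len(frametimes_ms) // 100:]
avg_fps = 1000 / frametimes_ms.mean()
low_1pct_fps = 1000 / slowest_1pct.mean()
print(f"average: {avg_fps:.0f} fps, 1% low: {low_1pct_fps:.0f} fps")
```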

simoncion

> I don't know how many games are even capable of using lower resolutions to avoid stutter.

Some games may be doing that. I expect that in others, the lower-resolution or missing textures are a result of the texture streaming system catastrophically failing to meet its deadline to load in the relevant textures and giving up. It's my understanding that "texture pop-in" is the too-late replacement of a low-resolution "placeholder" texture with a high-resolution texture. If the high-resolution texture doesn't load in time, then you're stuck with the low-res placeholder.

Commentary on textures that fail to load is in the "Monster Hunter: Wilds" section, starting at ~11:35 in [0], and the "Space Marine 2" section starting at ~00:14:40 in [1], which also mentions "Halo: Infinite" and "Forspoken" as other games with the same sort of behavior. Missing textures are mentioned in the "Star Wars Jedi: Survivor" section starting at 21:21 in [1]. And - while not mentioned - if you look not-that-closely at the first ~5 seconds of that section, you can see the textures (most obviously the ground texture) go from "the same as the 16GB model" to "something you'd expect to see in a bad PS3 game".

Also in that first video, you can see some head-to-head demonstrations of the performance problems having a slower PCI-E link gives you when running out of VRAM starting at ~04:52 in the "The Last of Us Part II" section of [0] and also at ~17:00 in the "F1 25" section of that same video.

I expect there are a few other videos out there that do this sort of analysis, but I can't be arsed to find them.

[0] <https://www.youtube.com/watch?v=MG9mFS7lMzU>

[1] <https://www.youtube.com/watch?v=AdZoa6Gzl6s>

rbanffy

I’m sure Windows performance counters can track the volume of data going between CPU memory and VRAM over the PCIe bus.

checker659

No matter the leaps in bandwidth, the latency remains the same. Also, with the PCIe switches used in AI servers, the latency (and jitter) is even more pronounced.

jeffbee

By an overwhelming margin, most computers are not in gamers' basements.

LeoPanthera

I thought we were only just up to 5? Did we skip 6 and 7?

pkaye

Some of the newer ones are maybe more for data centers.

bpbp-mango

I wonder if this will help applications like VPP/DPDK. Not sure if the CPU or the lanes are the bottleneck there.

pshirshov

Yeah, such a shame I've just upgraded to a 7.0 motherboard for my socket AM7 CPU.

Being less sarcastic, I would ask if 6.0 mobos are on the horizon.

wmf

I guess Venice, Diamond Rapids, and Vera will have 6.0.

_zoltan_

GB300 has Gen6, so I am pretty sure GV200 would have it at least.

I'm fairly sure they are already cooking Gen7 into CX9.

top_sigrid

For all the others seeing the HN hug of death :)

https://archive.is/oa81K

0xffany

Crazy to see STH get the HN hug of death!

One would think they get sizeable traffic as-is.

iFred

I can't be the only one let down that there wasn't some new slot design. Something with pizzazz and flair.

rfl890

So I can run a full powered PCIe 4.0 16x GPU on 1 PCIe 8.0 lane???
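
On paper that is about right, ignoring encoding and protocol overhead on both sides (and the small detail that nothing actually breaks a Gen8 link out as a single-lane slot):

```python
# Raw per-direction transfer rate: Gen4 x16 vs a single Gen8 lane.
gen4_x16_gts = 16 * 16    # 16 GT/s per lane x 16 lanes = 256 GT/s
gen8_x1_gts = 256 * 1     # 256 GT/s per lane x 1 lane  = 256 GT/s
print(gen4_x16_gts == gen8_x1_gts)  # True
```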

ThatMedicIsASpy

I'll take it if my consumer motherboard chipset can give me 48 PCIe 7 lanes, should future desktops still only come with 24 Gen8 lanes.