
Intel's Battlemage Architecture

53 comments · February 11, 2025

jorvi

> Unfortunately, today’s midrange cards like the RTX 4060 and RX 7600 only come with 8 GB of VRAM

Just a nit: one step up (the RX 7600 XT) comes with 16GB of memory, although in a clamshell configuration. With the B580 falling in between the 7600 and 7600 XT in terms of pricing, it seems a bit unfair to compare it only with the former.

- RX 7600 (8GB) ~€300

- RTX 4060 (8GB) ~€310

- Intel B580 (12GB) ~€330

- RX 7600 XT (16GB) ~€350

- RTX 4060 Ti (8GB) ~€420

- RTX 4060 Ti (16GB) ~€580*

*Apparently this card is really rare and a bad value proposition, so it is hard to find

mananaysiempre

All sources I've seen say the 4060 Ti 8GB is also really bad value. Here’s GamersNexus for example: https://www.youtube.com/watch?v=Y2b0MWGwK_U.

hassleblad23

> Intel takes advantage of this by launching the Arc B580 at $250, undercutting both competitors while offering 12 GB of VRAM.

Not sure where you got that 350 EUR number for B580?

treve

I wonder if these GPUs are good options for Linux rigs and whether first-party drivers are available.

mtlmtlmtlmtl

Been running Linux on the A770 for about 2 years now. Very happy with the driver situation. It was a bit rough very early on, but it's nice and stable now. I recommend at least Linux 6.4, but preferably newer. I use a rolling release distro (Artix) to get up-to-date kernels.

ML stuff can be a pain sometimes because support in PyTorch and various other libraries is not prioritised the way CUDA is. But I've been able to get llama.cpp working via ollama, which has experimental Intel GPU support. It worked fine when I tested it, though I haven't actually used it very much, so don't quote me on it.
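
For what it's worth, a minimal sketch of checking whether the card is visible from PyTorch, assuming a recent PyTorch build with Intel XPU support (older stacks register the device via intel_extension_for_pytorch instead):

    import torch

    # Recent PyTorch builds expose Intel GPUs as the "xpu" device; older setups
    # may need `import intel_extension_for_pytorch` first to register it.
    if torch.xpu.is_available():
        device = torch.device("xpu")
        print("Using Intel GPU:", torch.xpu.get_device_name(0))
    else:
        device = torch.device("cpu")
        print("No Intel GPU visible, falling back to CPU")

    # Trivial smoke test on whichever device was found.
    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1024, device=device)
    c = a @ b
    print(c.shape, c.device)

Even when that reports the card, whether a given library actually uses it still comes down to it having an XPU code path, which is exactly the prioritisation gap mentioned above.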

For image gen, your best bet is to use sdnext (https://github.com/vladmandic/sdnext), which officially supports Intel on Linux, will automagically install the right PyTorch version, and does a bunch of trickery to get libraries that insist on CUDA to work in many cases. Some things are still unsupported, though, due to various libraries still not supporting Intel on Linux; some types of quantization are unavailable, for instance. But at least if you have the A770, quantization for image gen is not as important due to plentiful VRAM, unless you're trying to use the Flux models.

bradfa

Yes, first-party drivers are available. The upstream Linux kernel and Mesa should have good support in their latest releases. If you're running a non-bleeding-edge distro, you may need to wait or do a little legwork to get the newer versions of things, but this is not unusual for new hardware.

If you're running Ubuntu, Intel has some exact steps you can follow: https://dgpu-docs.intel.com/driver/client/overview.html

everfrustrated

Intel has also upstreamed its video encoding acceleration support into software like ffmpeg.

Intel Arc GPUs also support hardware video encoding for the AV1 codec, which even the just-released Nvidia 50 series still doesn't support.

baq

Of all the god-awful Linux GPU drivers, Intel's are the least awful IME. Unless you're talking purely compute, in which case Nvidia; have fun matching those CUDA versions, though...

bee_rider

I have always associated Intel iGPUs with good drivers but people seem to often complain about their Linux dGPU drivers in these threads. I hope it is just an issue of them trying to break into a new field, rather than a slipping of their GPU drivers in general…

ThaDood

Here are some benchmarks from a few months back. Seems promising. https://www.phoronix.com/review/intel-arc-b580-gpu-compute

Whoops - included the wrong link! https://www.phoronix.com/review/intel-arc-b580-graphics-linu...

ZeWaka

I use an Alchemist-series A380 in my nix media server, and it's absolutely fantastic for video encoding.

VTimofeenko

Same; recently built an SFF machine with a low-profile A310. It goes through video streams like a hot knife through butter.

Do you have your config posted somewhere? I'd be interested to compare notes

dingi

In fact, Intel has been a stellar contributor to the Linux kernel and associated projects, compared to all other vendors. They usually have launch day Linux support provided that you are running a bleeding edge Linux kernel.

myrmidon

Loosely related question:

What prevents manufacturers from taking some existing mid- or top-range consumer GPU design and just slapping like 256GB of VRAM onto it (enabling consumers to run big-LLM inference locally)?

Would that be useless for some reason? What am I missing?

elabajaba

The amount of memory you can put on a GPU is mainly constrained by the GPU's memory bus width (which is both expensive and power-hungry to expand) and the available GDDR chips (each chip generally requires 32 bits of the bus). We've been using 16Gbit (2GB) chips for a while, and they're just starting to roll out 24Gbit (3GB) GDDR7 modules, but those are expensive and in limited supply. You also have to account for VRAM being somewhat power-hungry (~1.5-2.5W per module under load).

Once you've filled all the slots, your only real option is a clamshell setup that doubles the VRAM capacity by putting chips on the back of the PCB in the same spots as the ones on the front (for timing reasons the traces all have to be the same length). Clamshell designs then need to figure out how to cool those chips on the back (~1.5-2.5W per module depending on speed and whether it's GDDR6/6X/7, meaning you could have up to 40W on the back).

Some basic math puts us at 16 modules for a 512-bit bus (only the 5090; you have to go back a decade-plus to find the previous 512-bit-bus GPU), 12 with 384-bit (4090, 7900 XTX), or 8 with 256-bit (5080, 4080, 7800 XT).

A clamshell 5090 with 2GB modules therefore tops out at 64GB, or 96GB with (currently expensive and limited) 3GB modules (you'll be able to buy that at some point as the RTX 6000 Blackwell, at stupid prices).
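
As a back-of-the-envelope check of that math (purely illustrative; the bus widths and per-chip capacities are the ones quoted above):

    def max_vram_gb(bus_width_bits, chip_gbit, clamshell=False):
        """Rough VRAM ceiling: one GDDR chip per 32 bits of bus, doubled for clamshell."""
        chips = bus_width_bits // 32      # e.g. a 512-bit bus -> 16 chips
        if clamshell:
            chips *= 2                    # mirrored chips on the back of the PCB
        return chips * chip_gbit / 8      # Gbit per chip -> GB total

    print(max_vram_gb(512, 16))                  # 5090-style, single-sided: 32 GB
    print(max_vram_gb(512, 16, clamshell=True))  # clamshell 2GB modules: 64 GB
    print(max_vram_gb(512, 24, clamshell=True))  # clamshell 3GB modules: 96 GB
    print(max_vram_gb(256, 16))                  # 5080/4080-class: 16 GB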

HBM can get you higher amounts, but it's extremely expensive to buy (you're competing against H100s, MI300Xs, etc), supply limited (AI hardware companies are buying all of it and want even more), requires a different memory controller (meaning you'll still have to partially redesign the GPU), and requires expensive packaging to assemble it.

reginald78

You'd need memory chips with double the capacity to slap the extra VRAM in, at least without altering the memory bus width. And indeed, some third-party modded cards like that seem to have shown up: https://www.tomshardware.com/pc-components/gpus/nvidia-gamin...

As far as official products go, I think the real reason is the one another commenter mentioned: they don't want to cannibalize sales of their more powerful cards. I know I'd be interested in a lower-powered card with a lot of VRAM just to get my foot in the door; that is why I bought an RTX 3060 12GB, which is unimpressive for gaming but actually had the second-most VRAM available in that generation. Nvidia seems to have noticed this mistake and later released a crappier 8GB version to replace it.

I think if the market produced a product like this to compete with Nvidia, they'd pretty quickly release something to fit the need, but as it is they don't have to.

ksec

Bandwidth. GDDR and HBM, the memory types GPUs use depending on the application, are high-bandwidth but comparatively low-capacity. Modern GPUs try to fit more VRAM by adding memory channels, up to a 512-bit bus, but that requires more die space and is therefore expensive.

We will need a new memory design for both GDDR and HBM, and I won't be surprised if they are working on it already. But hardware takes time, so it will be a few more years down the road.

patmorgan23

Because then they couldn't sell you the $10k enterprise GPU

protimewaster

You can actually get GPUs from Chinese markets (e.g., AliExpress) that have had their VRAM upgraded. Someone out there is doing aftermarket VRAM upgrades on cards to make them more usable for GPGPU tasks.

Which also answers your question: The manufacturers aren't doing it because they're assholes.

fulafel

Seems like that's still some years away from reaching the consumer price range.

glitchc

Double the memory for double the price and I would buy one in a heartbeat.

netbioserror

A lot of commenters have pointed out that Intel is reaching nowhere near the performance/mm² of Nvidia or AMD designs, though contrary to what I thought that might imply, it seems that power consumption is very much under control on Battlemage. So it seems the primary trade-off here is die cost.

Can anyone explain what might be going on here, especially as it relates to power consumption? I thought (bigger die ^ bigger wires -> more current -> higher consumption).

kimixa

Increasing clocks tends to have a greater-than-linear cost in power: you need transistors to switch more quickly, so you often need a higher voltage, which causes more leakage and other losses on top of the switching cost itself (all of which turns into heat). Higher clock targets also have a cost in the design itself, often needing more transistors for things like extra redrivers to ensure fast switching, or even more pipeline stages. Plus, not all area is "transistors": it's often easier to place related units that need a lot of interconnectivity with shorter interconnects if an adjacent, less interconnected unit isn't also trying to be packed into much of the same space. Routing on modern chips is really difficult (and a place where companies can really differentiate by investing more).

For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.
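
As a toy illustration of that scaling (an assumption-heavy sketch: dynamic power modeled as units × V² × f, voltage assumed to rise roughly linearly with clock, leakage and real V/f curves ignored):

    # Two hypothetical designs with the same nominal throughput (units * freq):
    # a wide, low-clocked die vs. a narrower die clocked twice as high.
    def relative_power(units, freq, base_freq=1.0):
        voltage = freq / base_freq            # crude assumption: V tracks f
        return units * voltage**2 * freq      # dynamic power ~ units * V^2 * f

    print(relative_power(units=64, freq=1.0))  # wide and slow   -> 64.0
    print(relative_power(units=32, freq=2.0))  # narrow and fast -> 256.0
    # Equal nominal throughput, but the high-clocked design burns ~4x the power.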

For GPUs generally, that's just part of the pricing and cost balance: a larger, lower-clocked die would be more efficient, but would it really sell for as much as the same die clocked even higher to chase peak results?

netbioserror

>For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.

I should've considered this; I have an RTX A5000. It's a gigantic GA102 die (3090, 3080) that's underclocked to 230W, putting it at roughly 3070 throughput. That's ~15% less performance than a 3090 for a ~35% power reduction. Absolutely nonlinear savings there. Though some of that may have to do with the power savings from using GDDR6 over GDDR6X.

(I should mention that relative performance estimates are all over the place, by some metrics the A5000 is ~3070, by others it's ~3080.)

gruez

>I thought (bigger die ^ bigger wires -> more current -> higher consumption).

All things being equal, a bigger die would result in more power consumption, but the factor you're not considering is the voltage/frequency curve. As you increase the frequency, you also need to raise the voltage. However, as you increase the voltage, there are diminishing returns to how much you can increase the frequency, so you end up massively increasing power consumption to get minor performance gains.

wmf

If it's a similar number of transistors on a larger die then I can believe the power consumption is good. Less dense layout probably requires less design effort and may reduce hotspots.

If Intel is getting similar performance from more transistors, that could be caused by extra control logic from a 16-wide core instead of a 32-wide one.

tonetegeatinst

It mainly seems to boil down to design choice and process technology.

They might be targeting a lower power density per square mm than AMD or Nvidia, focusing more on lower power levels.

The instruction set architecture and the layout of the chips and PCB factor into this as well.

bloomingkales

They are holding back the higher-VRAM models of this card. GPU makers always do some nerfing of their cards in the same product line. Oftentimes there's no good reason for this other than that they found specs they can market and sell simply by moving voltages around.

Anyway, I'm expecting good earnings throughout the year as they use Battlemage sales to hide the larger concerns about standing up their foundry (great earnings for the initial 12GB cards, and so on for the inevitable 16/24GB cards).

MisterTea

> I thought (bigger die ^ bigger wires -> more current -> higher consumption).

I am not a semi expert, but a bigger die doesn't mean bigger wires. If you are referring to cross-section, the wires would be thinner, meaning less current. Power is consumed pushing and pulling electrons to and from the transistor gates, which are all FETs (field-effect transistors). The gate is a capacitor that needs to be charged to open it and allow current to flow through the transistor; discharging the gate closes it. That current draw then gets multiplied by a few billion gates, so you can see where the load comes from.

williamDafoe

Actually, the wires don't scale down like the transistors do. I remember taking VLSI circuit complexity theory in graduate school, and the conclusion was that for two-dimensional circuits the wires will end Moore's Law. However, I've seen articles about backside power delivery, and they are already using seven-plus metal layers, so the wires are going through three dimensions now. Copper interconnects were a one-time bonus in the late 90s; after that, wires just don't scale down, or signal delay would go up too fast. Imagine a city where all the houses shrink to the size of dog houses, but you can't shrink the streets; they have to stay the same size to carry signals quickly!

williamDafoe

Battlemage B580 specs from TechPowerUp match the 4070 almost precisely: same RAM, same bus, same bus speed, same power rating, a TSMC GPU on the N4 node, and nearly identical die size (290 vs 272 mm²). $10 difference, tops.

But it was released TWO YEARS LATER than the 4070 and it performs ONE GENERATION WORSE (4060 performance). 2+2 = 4 Years behind! I am not too impressed with the "chips and cheese ant's view article" as they don't uncover the reason why performance is SO PATHETIC!

A weird thing goes on in the TSMC / GPU business. Intel prepaid for the N4 wafers, and the design is so poor it's not profitable to make the GPU and sell it at LESS THAN HALF the 4070's $550 price. Normally a mistake like this would lead to product cancellation, but the prepayment is a stranded cost, so Intel MUST sell these at a loss to get back a sliver of its wasted TSMC prepayments!

What's even worse is that the A770 was also 4 years behind, so Intel is not catching up - not one iota! The A770 was an attempt by Intel to make a 3070 clone, and they failed badly - I'll let you look up the specs and the timelines to do the comparison on your own ...

wqaatwt

> 4060 performance

That's really not true, though. It's closer to the 4060 Ti, and somewhat ahead or behind depending on the specific game.

ksec

> I am not too impressed with the "chips and cheese ant's view article" as they don't uncover the reason why performance is SO PATHETIC!

Performance on GPUs has always been about drivers. Chips and Cheese is only here to show the uArch behind it. This isn't even new; we should have learned all about it during the 3dfx Voodoo era. And 9 years have passed since a (now retired) Intel engineer said they would be competing against Nvidia by 2020, if not 2021. We are now in 2025 and they are not even close. But somehow Raja Koduri was supposed to save them, and now he's gone.

stoatstudios

Is nobody going to talk about how the architecture is called "Battlemage"? Is that just normal to GPU enthusiasts?

reginald78

The generations are all fantasy-type names in alphabetical order. The first was Alchemist (and the cards were things like the A310) and the next is Celestial. Actually, when I think about product names for GPUs and CPUs, these seem above average in clarity and only slightly dorkier than average. I'm sure they'll get more confusing and nonsensical with time, as that seems to be a constant of the universe.

spiffytech

Dorky, alphabetical codenames are a big step up from a bunch of lakes in no obvious order.

PaulHoule

Yeah, with the way Intel has been struggling I thought they should get it out of their system and name one of their chips "Shit Creek."

Workaccount2

Can't wait for Dungeon architecture.

faefox

It sounds cool and has actual personality. What would you prefer, Intel Vision Pro Max? :)

ZeWaka

It's their 2nd generation, the 'B' series. The previous was their 'A' / Alchemist.

> According to Intel, the brand is named after the concept of story arcs found in video games. Each generation of Arc is named after character classes sorted by each letter of the Latin alphabet in ascending order. (https://en.wikipedia.org/wiki/Intel_Arc)

tdb7893

It's dorky but there isn't much else to say about it. Personal GPU enthusiasts are almost always video game enthusiasts so it's not really a particularly weird name in context.

high_na_euv

Cool name, easy to remember, ain't it?

baq

A codename as good as any. Nvidia has Tesla, Turing etc.

dark-star

A well-known commercial storage vendor gives their system releases codenames taken from beer brands. We had Becks, Guinness, Longboard, Voodoo Ranger, and many others. Presumably whatever the devs drank during that release cycle, or something ;-)

It's fun for the developers and the end-users alike... So no, it's not limited to GPU enthusiasts at all. Everyone likes codenames :)

throw16180339

Are you referring to NetApp?

B1FF_PSUVM

> Everyone likes codenames :)

Except butt-headed astronomers

monocasa

I mean, using living people seems like a dick move in general for codenames.

joelthelion

That's cool and all but can you use it for deep learning?

taurknaut

I don't really care about how it performs so long as it's better than a CPU. I just want to target the GPU myself and remove the vendor from the software equation. Nvidia has taught me there isn't any value that can't be destroyed with sufficiently bad drivers.