
Intel's Battlemage Architecture

53 comments · February 11, 2025

jorvi

> Unfortunately, today’s midrange cards like the RTX 4060 and RX 7600 only come with 8 GB of VRAM

Just a nit: one step up (the RX 7600 XT) comes with 16GB of memory, although in a clamshell configuration. With the B580 falling in between the 7600 and 7600 XT in terms of pricing, it seems a bit unfair to compare it only with the former.

- RX 7600 (8GB) ~€300

- RTX 4060 (8GB) ~€310

- Intel B580 (12GB) ~€330

- RX 7600 XT (16GB) ~€350

- RTX 4060 Ti (8GB) ~€420

- RTX 4060 Ti (16GB) ~€580*

*Apparently this card is really rare and a bad value proposition, so it is hard to find

mananaysiempre

All sources I've seen say the 4060 Ti 8GB is also really bad value. Here’s GamersNexus for example: https://www.youtube.com/watch?v=Y2b0MWGwK_U.

hassleblad23

> Intel takes advantage of this by launching the Arc B580 at $250, undercutting both competitors while offering 12 GB of VRAM.

Not sure where you got that 350 EUR number for B580?

treve

I wonder if these GPUs are good options for Linux rigs and whether first-party drivers are available.

mtlmtlmtlmtl

Been running Linux on the A770 for about 2 years now. Very happy with the driver situation. It was a bit rough very early on, but it's nice and stable now. I recommend at least Linux 6.4, but preferably newer. I use a rolling release distro (Artix) to get up-to-date kernels.

ML stuff can be a pain sometimes because support in PyTorch and various other libraries is not prioritised the way CUDA is. But I've been able to get llama.cpp working via ollama, which has experimental Intel GPU support. It worked fine when I tested it, though I haven't actually used it very much, so don't quote me on it.
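
For what it's worth, a minimal sketch of checking whether the card is visible from PyTorch, assuming a recent PyTorch build with Intel XPU support (older stacks register the device via intel_extension_for_pytorch instead):

    import torch

    # Recent PyTorch builds expose Intel GPUs as the "xpu" device; older setups
    # may need `import intel_extension_for_pytorch` first to register it.
    if torch.xpu.is_available():
        device = torch.device("xpu")
        print("Using Intel GPU:", torch.xpu.get_device_name(0))
    else:
        device = torch.device("cpu")
        print("No Intel GPU visible, falling back to CPU")

    # Trivial smoke test on whichever device was found.
    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1024, device=device)
    c = a @ b
    print(c.shape, c.device)

Even when that reports the card, whether a given library actually uses it still comes down to it having an XPU code path, which is exactly the prioritisation gap mentioned above.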

For image gen, your best bet is to use sdnext (https://github.com/vladmandic/sdnext), which officially supports Intel on Linux, will automagically install the right PyTorch version, and does a bunch of trickery to get libraries that insist on CUDA to work in many cases. Some things are still unsupported, though, due to various libraries still not supporting Intel on Linux; some types of quantization are unavailable, for instance. But at least if you have the A770, quantization for image gen is not as important due to plentiful VRAM, unless you're trying to use the Flux models.

bradfa

Yes, first-party drivers are available. The upstream Linux kernel and Mesa should have good support in their latest releases. If you're running a non-bleeding-edge distro, you may need to wait or do a little legwork to get the newer versions of things, but this is not unusual for new hardware.

If you're running Ubuntu, Intel has some exact steps you can follow: https://dgpu-docs.intel.com/driver/client/overview.html

everfrustrated

Intel has also upstreamed its video encoding acceleration support into software like ffmpeg.

Intel Arc GPUs also support hardware video encoding for the AV1 codec, which even the just-released Nvidia 50 series still doesn't support.

baq

Of all the god-awful Linux GPU drivers, Intel's are the least awful IME. Unless you're talking purely compute, in which case Nvidia; have fun matching those CUDA versions, though...

bee_rider

I have always associated Intel iGPUs with good drivers but people seem to often complain about their Linux dGPU drivers in these threads. I hope it is just an issue of them trying to break into a new field, rather than a slipping of their GPU drivers in general…

ThaDood

Here are some benchmarks from a few months back. Seems promising. https://www.phoronix.com/review/intel-arc-b580-gpu-compute

Whoops - included the wrong link! https://www.phoronix.com/review/intel-arc-b580-graphics-linu...

ZeWaka

I use an Alchemist-series A380 in my nix media server, and it's absolutely fantastic for video encoding.

VTimofeenko

Same; recently built an SFF machine with a low-profile A310. It goes through video streams like a hot knife through butter.

Do you have your config posted somewhere? I'd be interested to compare notes

dingi

In fact, Intel has been a stellar contributor to the Linux kernel and associated projects, compared to all other vendors. They usually have launch day Linux support provided that you are running a bleeding edge Linux kernel.

myrmidon

Loosely related question:

What prevents manufacturers from taking some existing mid- or top-range consumer GPU design and just slapping like 256GB of VRAM onto it (enabling consumers to run big-LLM inference locally)?

Would that be useless for some reason? What am I missing?

elabajaba

The amount of memory you can put on a GPU is mainly constrained by the GPU's memory bus width (which is both expensive and power-hungry to expand) and the available GDDR chips (each chip generally requires 32 bits of the bus). We've been using 16Gbit (2GB) chips for a while, and they're just starting to roll out 24Gbit (3GB) GDDR7 modules, but those are expensive and in limited supply. You also have to account for VRAM being somewhat power-hungry (~1.5-2.5W per module under load).

Once you've filled all the slots, your only real option is a clamshell setup that doubles the VRAM capacity by putting chips on the back of the PCB in the same spots as the ones on the front (for timing reasons the traces all have to be the same length). Clamshell designs then need to figure out how to cool those chips on the back (~1.5-2.5W per module depending on speed and whether it's GDDR6/6X/7, meaning you could have up to 40W on the back).

Some basic math puts us at 16 modules for a 512-bit bus (only the 5090; you have to go back a decade-plus to find the previous 512-bit-bus GPU), 12 with 384-bit (4090, 7900 XTX), or 8 with 256-bit (5080, 4080, 7800 XT).

A clamshell 5090 with 2GB modules therefore tops out at 64GB, or 96GB with (currently expensive and limited) 3GB modules (you'll be able to buy that at some point as the RTX 6000 Blackwell, at stupid prices).
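
As a back-of-the-envelope check of that math (purely illustrative; the bus widths and per-chip capacities are the ones quoted above):

    def max_vram_gb(bus_width_bits, chip_gbit, clamshell=False):
        """Rough VRAM ceiling: one GDDR chip per 32 bits of bus, doubled for clamshell."""
        chips = bus_width_bits // 32      # e.g. a 512-bit bus -> 16 chips
        if clamshell:
            chips *= 2                    # mirrored chips on the back of the PCB
        return chips * chip_gbit / 8      # Gbit per chip -> GB total

    print(max_vram_gb(512, 16))                  # 5090-style, single-sided: 32 GB
    print(max_vram_gb(512, 16, clamshell=True))  # clamshell 2GB modules: 64 GB
    print(max_vram_gb(512, 24, clamshell=True))  # clamshell 3GB modules: 96 GB
    print(max_vram_gb(256, 16))                  # 5080/4080-class: 16 GB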

HBM can get you higher amounts, but it's extremely expensive to buy (you're competing against H100s, MI300Xs, etc), supply limited (AI hardware companies are buying all of it and want even more), requires a different memory controller (meaning you'll still have to partially redesign the GPU), and requires expensive packaging to assemble it.

reginald78

You'd need memory chips with double the capacity to slap the extra VRAM in, at least without altering the memory bus width. And indeed, some third-party modded cards like that seem to have shown up: https://www.tomshardware.com/pc-components/gpus/nvidia-gamin...

As far as official products go, I think the real reason is the one another commenter mentioned: they don't want to cannibalize sales of their more powerful cards. I know I'd be interested in a lower-powered card with a lot of VRAM just to get my foot in the door; that is why I bought an RTX 3060 12GB, which is unimpressive for gaming but actually had the second-most VRAM available in that generation. Nvidia seems to have noticed this mistake and later released a crappier 8GB version to replace it.

I think if the market produced a product like this to compete with Nvidia, they'd pretty quickly release something to fit the need, but as it is they don't have to.

ksec

Bandwidth. GDDR and HBM, the memory types GPUs use depending on the application, are high-bandwidth but comparatively low-capacity. Modern GPUs try to fit more VRAM by adding memory channels, up to a 512-bit bus, but that requires more die space and is therefore expensive.

We will need a new memory design for both GDDR and HBM, and I won't be surprised if they are working on it already. But hardware takes time, so it will be a few more years down the road.

patmorgan23

Because then they couldn't sell you the $10k enterprise GPU

protimewaster

You can actually get GPUs from Chinese markets (e.g., AliExpress) that have had their VRAM upgraded. Someone out there is doing aftermarket VRAM upgrades on cards to make them more usable for GPGPU tasks.

Which also answers your question: The manufacturers aren't doing it because they're assholes.

fulafel

Seems like that's still some years away from reaching the consumer price range.

glitchc

Double the memory for double the price and I would buy one in a heartbeat.

netbioserror

A lot of commenters have pointed out that Intel is reaching nowhere near the performance/mm² of Nvidia or AMD designs, though contrary to what I thought that might imply, it seems that power consumption is very much under control on Battlemage. So it seems the primary trade-off here is die cost.

Can anyone explain what might be going on here, especially as it relates to power consumption? I thought (bigger die ^ bigger wires -> more current -> higher consumption).

kimixa

Increasing clocks tends to have a greater-than-linear cost in power: you need transistors to switch more quickly, so you often need a higher voltage, which causes more leakage and other losses on top of the switching cost itself (all of which turns into heat). Higher clock targets also have a cost in the design itself, often needing more transistors for things like extra redrivers to ensure fast switching, or even more pipeline stages. Plus, not all area is "transistors": it's often easier to place related units that need a lot of interconnectivity with shorter interconnects if an adjacent, less interconnected unit isn't also trying to be packed into much of the same space. Routing on modern chips is really difficult (and a place where companies can really differentiate by investing more).

For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.
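
As a toy illustration of that scaling (an assumption-heavy sketch: dynamic power modeled as units × V² × f, voltage assumed to rise roughly linearly with clock, leakage and real V/f curves ignored):

    # Two hypothetical designs with the same nominal throughput (units * freq):
    # a wide, low-clocked die vs. a narrower die clocked twice as high.
    def relative_power(units, freq, base_freq=1.0):
        voltage = freq / base_freq            # crude assumption: V tracks f
        return units * voltage**2 * freq      # dynamic power ~ units * V^2 * f

    print(relative_power(units=64, freq=1.0))  # wide and slow   -> 64.0
    print(relative_power(units=32, freq=2.0))  # narrow and fast -> 256.0
    # Equal nominal throughput, but the high-clocked design burns ~4x the power.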

For GPUs generally, that's just part of the pricing and cost balance: a larger, lower-clocked die would be more efficient, but would it really sell for as much as the same die clocked even higher to chase peak results?

netbioserror

>For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.

I should've considered this; I have an RTX A5000. It's a gigantic GA102 die (3090, 3080) that's underclocked to 230W, putting it at roughly 3070 throughput. That's ~15% less performance than a 3090 for a ~35% power reduction. Absolutely nonlinear savings there. Though some of that may have to do with the power savings from using GDDR6 over GDDR6X.

(I should mention that relative performance estimates are all over the place, by some metrics the A5000 is ~3070, by others it's ~3080.)

gruez

>I thought (bigger die ^ bigger wires -> more current -> higher consumption).

All things being equal, a bigger die would result in more power consumption, but the factor you're not considering is the voltage/frequency curve. As you increase the frequency, you also need to raise the voltage. However, as you increase the voltage, there are diminishing returns to how much you can increase the frequency, so you end up massively increasing power consumption to get minor performance gains.

wmf

If it's a similar number of transistors on a larger die then I can believe the power consumption is good. Less dense layout probably requires less design effort and may reduce hotspots.

If Intel is getting similar performance from more transistors, that could be caused by extra control logic from a 16-wide core instead of a 32-wide one.

tonetegeatinst

It mainly seems to boil down to design choice and process technology.

They might be targeting a lower power density per square mm than AMD or Nvidia, focusing more on lower power levels.

The instruction set architecture and the layout of the chips and PCB factor into this as well.

bloomingkales

They are holding back the higher-VRAM models of this card. GPU makers always do some nerfing of their cards in the same product line. Oftentimes there's no good reason for this other than that they found specs they can market and sell simply by moving voltages around.

Anyway, I'm expecting good earnings throughout the year as they use Battlemage sales to hide the larger concerns about standing up their foundry (great earnings for the initial 12GB cards, and so on for the inevitable 16/24GB cards).

MisterTea

> I thought (bigger die ^ bigger wires -> more current -> higher consumption).

I am not a semi expert, but a bigger die doesn't mean bigger wires. If you are referring to cross-section, the wires would be thinner, meaning less current. Power is consumed pushing and pulling electrons to and from the transistor gates, which are all FETs (field-effect transistors). The gate is a capacitor that needs to be charged to open it and allow current to flow through the transistor; discharging the gate closes it. That current draw then gets multiplied by a few billion gates, so you can see where the load comes from.

williamDafoe

Actually, the wires don't scale down like the transistors do. I remember taking VLSI circuit complexity theory in graduate school, and the conclusion was that for two-dimensional circuits the wires will end Moore's Law. However, I've seen articles about backside power delivery, and they are already using seven-plus metal layers, so the wires are going through three dimensions now. Copper interconnects were a one-time bonus in the late 90s; after that, wires just don't scale down, or signal delay would go up too fast. Imagine a city where all the houses shrink to the size of dog houses, but you can't shrink the streets; they have to stay the same size to carry signals quickly!

williamDafoe

Battlemage B580 specs from TechPowerUp match the 4070 almost precisely: same RAM, same bus, same bus speed, same power rating, a TSMC GPU on the N4 node, and nearly identical die size (290 vs 272 mm²). $10 difference, tops.

But it was released TWO YEARS LATER than the 4070 and it performs ONE GENERATION WORSE (4060 performance). 2+2 = 4 Years behind! I am not too impressed with the "chips and cheese ant's view article" as they don't uncover the reason why performance is SO PATHETIC!

A weird thing goes on in the TSMC / GPU business. Intel prepaid for the N4 wafers, and the design is so poor it's not profitable to make the GPU and sell it at LESS THAN HALF the 4070's $550 price. Normally a mistake like this would lead to product cancellation, but the prepayment is a stranded cost, so Intel MUST sell these at a loss to get back a sliver of its wasted TSMC prepayments!

What's even worse is that the A770 was also 4 years behind, so Intel is not catching up - not one iota! The A770 was an attempt by Intel to make a 3070 clone, and they failed badly - I'll let you look up the specs and the timelines to do the comparison on your own ...

wqaatwt

> 4060 performance

That's really not true, though. It's closer to the 4060 Ti, and somewhat ahead or behind depending on the specific game.

ksec

> I am not too impressed with the "chips and cheese ant's view article" as they don't uncover the reason why performance is SO PATHETIC!

Performance on GPUs has always been about drivers. Chips and Cheese is only here to show the uArch behind it. This isn't even new; we should have learned all about it during the 3dfx Voodoo era. And 9 years have passed since a (now retired) Intel engineer said they would be competing against Nvidia by 2020, if not 2021. We are now in 2025 and they are not even close. But somehow Raja Koduri was supposed to save them, and now he's gone.

stoatstudios

Is nobody going to talk about how the architecture is called "Battlemage"? Is that just normal to GPU enthusiasts?

reginald78

The generations are all fantasy-type names in alphabetical order. The first was Alchemist (and the cards were things like the A310) and the next is Celestial. Actually, when I think about product names for GPUs and CPUs, these seem above average in clarity and only slightly dorkier than average. I'm sure they'll get more confusing and nonsensical with time, as that seems to be a constant of the universe.

spiffytech

Dorky, alphabetical codenames are a big step up from a bunch of lakes in no obvious order.

PaulHoule

Yeah, with the way Intel has been struggling I thought they should get it out of their system and name one of their chips "Shit Creek."

Workaccount2

Can't wait for Dungeon architecture.

faefox

It sounds cool and has actual personality. What would you prefer, Intel Vision Pro Max? :)

ZeWaka

It's their 2nd generation, the 'B' series. The previous was their 'A' / Alchemist.

> According to Intel, the brand is named after the concept of story arcs found in video games. Each generation of Arc is named after character classes sorted by each letter of the Latin alphabet in ascending order. (https://en.wikipedia.org/wiki/Intel_Arc)

tdb7893

It's dorky but there isn't much else to say about it. Personal GPU enthusiasts are almost always video game enthusiasts so it's not really a particularly weird name in context.

high_na_euv

Cool name, easy to remember, ain't it?

baq

A codename as good as any. Nvidia has Tesla, Turing etc.

dark-star

A well-known commercial storage vendor gives their system releases codenames taken from beer brands. We had Becks, Guinness, Longboard, Voodoo Ranger, and many others. Presumably whatever the devs drank during that release cycle, or something ;-)

It's fun for the developers and the end-users alike... So no, it's not limited to GPU enthusiasts at all. Everyone likes codenames :)

throw16180339

Are you referring to NetApp?

B1FF_PSUVM

> Everyone likes codenames :)

Except butt-headed astronomers

monocasa

I mean, using living people seems like a dick move in general for codenames.

joelthelion

That's cool and all but can you use it for deep learning?

taurknaut

I don't really care about how it performs so long as it's better than a CPU. I just want to target the GPU myself and remove the vendor from the software equation. Nvidia has taught me there isn't any value that can't be destroyed with sufficiently bad drivers.