Notes on the Pentium's microcode circuitry

33 comments

·March 31, 2025

mmastrac

I'm surprised the microcode ROM and format haven't been dumped already. Is anyone working on this?

EDIT: The later Atom processors were dumped [1][2]; are there any similarities?

[1] https://x.com/_markel___/status/1262697756805795841

[2] https://github.com/chip-red-pill/glm-ucode

EDIT 2: Some Pentium Pro disassembly work: https://pbx.sh/pentiumii-part2/

kens

There are some people working on the 386 microcode. Dumping the Pentium microcode ROM from the die photos would be straightforward (but tedious). The hard part is to figure out what all the bits mean.

mmastrac

Any ideas if the mask ROM is scrambled? Apparently the P6 doesn't have a direct mask ROM : microcode relationship.

https://github.com/peterbjornx/p6tools

kens

The Pentium's ROM appears to be slightly scrambled (see footnote 6 in my article). ROMs are often a bit permuted for electrical reasons. For example, instead of columns ordered ABABABAB..., they will be ordered ABBAABBA... and then the A and B select lines can be shared by two columns. But the columns in the Pentium appear to be permuted in an irregular way. I'm not sure if this was for obfuscation or if automated layout software decided this was better.
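To make the ABBAABBA example concrete, here is a minimal Python sketch of undoing that kind of regular column mirroring when reading bits off a die photo. The function names are made up for illustration, and the group-of-four layout is just the example above; the Pentium's actual irregular permutation would need an explicit, hand-derived table in place of `mirrored_pair_order`.

```python
def mirrored_pair_order(width):
    """Physical column index for each logical column, assuming the
    physical layout repeats A B B A while the logical order is A B A B."""
    if width % 4 != 0:
        raise ValueError("width must be a multiple of 4")
    order = []
    for base in range(0, width, 4):
        # Within each group of four, the last two columns are swapped.
        order.extend([base, base + 1, base + 3, base + 2])
    return order


def unscramble(physical_row, order):
    """Reorder one row read in physical column order into logical order."""
    return [physical_row[p] for p in order]


row = list("ABBAABBA")
print("".join(unscramble(row, mirrored_pair_order(len(row)))))
```

An irregular permutation works the same way: only the `order` list changes.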

eigenform

> are there any similarities?

Don't know about the format, but if you look thru old ITJ articles[^1], it seems like the "direct access" interface for reading out different memories exists on older Pentium parts too. Presumably, if it were possible to dump over JTAG, it would be at least a little bit similar to what Peter/Mark have already looked at on newer parts.

[^1]: https://www.intel.com/content/dam/www/public/us/en/documents...

kens

Author here for your Pentium questions :-)

mesrik

Hi Ken,

Nice article. While reading it, I remembered that some time ago I watched the Oral History of Gary Davidian, who was quite a bit involved with microcoding. If I were you, I would try asking him whether he could give you some ideas about where to get more information on microcode workings and development.

Here are links to that interview, if you have time to watch it. It's in two parts.

- https://www.youtube.com/watch?v=l_Go9D1kLNU

- https://www.youtube.com/watch?v=MVEKt_H3FsI

Cheers,

:-) riku

nxobject

This is the Gary Davidian of the Classic MacOS PPC nanokernel, no? I wish I'd had as much fun work as he's had in his career.

klelatti

Also the Gary Davidian of Intel vs NEC fame!

https://thechipletter.substack.com/p/intel-vs-nec-the-case-o...

uticus

> If you have enough time, you can extract the bits from the ROM by examining the silicon and seeing where transistors are present.

I'm curious if this is a better way than somehow scanning the ROM electronically? Asking based on my very shallow understanding of how ROM works in this situation, although I did read the bit about M1, M2, and M3 lines/contacts.

[edit: I also read about the testing circuitry, that "runs through each address," but it's unclear if this is an auto feature running without being asked at startup, or if there is some way to tap into / intercept this functionality from outside.]

kens

You could put microprobes on the die and read out the ROM contents electrically, but that would be difficult and would need specialized equipment. Reading out the ROM visually is much easier, and there is software that can interpret images if they are clear enough, e.g. maskromtool: https://github.com/travisgoodspeed/maskromtool

The Pentium's built-in self test is somewhat documented: you pull the INIT pin high while the RESET pin goes low to trigger the test. You can also execute the RUNBIST instruction through boundary scan. I don't think this helps you get the ROM data; the test just reports pass/fail.

vitalmixofntrnt

Can I add my own instruction set extensions to the original x86 isa as implemented by the 8086 without permission from Intel and / or AMD as long as I'm not copying any x86 instruction set extensions?

userbinator

Any patents on the 8086 have long expired, and so have the ones from the last century. As Ken says, the microcode is copyrighted but you don't need to use that to make a compatible version.

kens

I don't know the legal details here but I think you can do whatever you want as long as you're not violating any patents (good luck). Also, Intel claims a copyright on the mnemonics for 8080 and 8086 assembly language. Microcode is also protected by copyright.

dreamcompiler

Isn't the Pentium's microcode upgradable? Or is that only in later chips?

These fixed transistors imply no upgradability.

kens

Microcode updates were first implemented in the Pentium Pro. When the original Pentium had the infamous FDIV bug, the only fix was for Intel to replace the processors at a cost of $475 million.

sroussey

Good thing win95 came out soon after and filled everyone’s coffers from new equipment buying

CalRobert

Why wasn’t the Pentium’s successor the Sexium?

monocasa

Or the Hexium.

The CPU serial number debacle of the 90s would have been even funnier with more overt mark of the beast references.

kens

Ha ha. Internally, the successor to the Pentium (P5) had the codename P6, but it was called the Pentium Pro externally rather than anything six-related.

Instead, Intel decided to go with an incomprehensible system of naming: Pentium Overdrive, Pentium MMX, Pentium Pro, Pentium II, Pentium III, Pentium III Xeon, Pentium D, Pentium M, Pentium Extreme Edition, etc. Good luck trying to figure out the ordering of these processors.

ssl-3

Intel's bad naming is still shooting them in the foot today. For a company that butters their bread by selling new products, they're doing a spectacularly bad job of letting people know what the new hotness is.

I hear things like "What do you mean it's slow? It's an i7!" or "It can't be slow -- it's a Xeon!" from too many people in the wild.

To them, the first number is the important one. What they see is that it is still an i7 and therefore they think it must still be (relatively) fast, even if their second-gen i7-2600 is demonstrably pretty slow.

I tried once to explain how Intel's numbering system has worked to a friend. I failed pretty miserably. I even used a whiteboard. I couldn't convey what needed to be conveyed in order to explain why his computer (an i7) wasn't keeping up with the tasks he gave to it.

But I can convey the problem simply enough in this crowd, here on HN: What's faster, a "Core i3-9100" or a "Core i7-2600"?

(At least with 286, 386, 486, and Pentium, the nomenclature was much more digestible.)

chasil

Bob Colwell gave an interview on the Pentium Pro, the first "out of order" Intel x86.

His observations on the Itanium make me gasp.

https://www.sigmicro.org/media/oralhistories/colwell.pdf

https://news.ycombinator.com/item?id=38459128

'I said, wait I am sorry to derail this meeting. But how would you use a simulator if you don't have a compiler? He said, well that's true we don't have a compiler yet, so I hand assembled my simulations. I asked "How did you do thousands of line of code that way?" He said “No, I did 30 lines of code”. Flabbergasted, I said, "You're predicting the entire future of this architecture on 30 lines of hand generated code?" [chuckle], I said it just like that, I did not mean to be insulting but I was just thunderstruck. Andy Grove piped up and said "we are not here right now to reconsider the future of this effort, so let’s move on".'

Colwell is (more formally) the author of The Pentium Chronicles which I plan to read someday.

https://www.amazon.com/Pentium-Chronicles-Robert-P-Colwell/d...

kragen

E5200.

nxobject

I imagine the same reason why we had the Macintosh II and IIx, but not the SE and SEx (instead SE/30)...

dev_john15

Fascinating deep dive into the Pentium microcode ROM circuitry! It's incredible to see the clever tricks Intel used, like the pseudo-random counter, to pack so much logic into such a constrained space. Articles like this give us a rare glimpse into the unsung engineering heroics behind these landmark chips.

Aardwolf

I would love to know how multiplication and division work in modern chips to have such a low cycle count compared to addition, since in theory the complexity of addition is linear in the number of bits but multiplication and division are quadratic, or loglinear for large inputs. Part of that is solved by surface area rather than time I guess, but that's also true for the adders already with the carry logic.

kens

I'm working on the multiplication circuit in the Pentium; I've done a partial writeup: https://www.righto.com/2025/03/pentium-multiplier-adder-reve... The short answer is that multiplication uses a large tree of adders so it can add up all the long-multiplication terms (partial products) at once. It also uses base-8 for the multiplier to reduce the number of terms. The adders are 4:2 carry-save compressors that take four numbers as inputs and produce two numbers as outputs.
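As a rough illustration of the carry-save idea, here is a small Python sketch that reduces a list of partial products with 4:2 compressors until two numbers remain, then does one ordinary addition. It is a simplification: it uses plain base-2 partial products rather than the base-8 scheme mentioned above, it builds each 4:2 compressor from two 3:2 carry-save adders, and the function names are invented for this example. It is not a model of the Pentium's actual circuit.

```python
def csa_3to2(a, b, c):
    """3:2 carry-save adder: returns (s, carry) with s + carry == a + b + c.
    Every bit position is handled independently, so no carry ripples."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def compress_4to2(a, b, c, d):
    """4:2 compressor built from two 3:2 stages: four numbers in, two out."""
    s1, c1 = csa_3to2(a, b, c)
    return csa_3to2(s1, c1, d)


def tree_multiply(x, y, bits=8):
    """Multiply non-negative x by a non-negative y that fits in `bits` bits."""
    if x < 0 or y < 0 or y >= (1 << bits):
        raise ValueError("operands out of range for this sketch")
    # One partial product per multiplier bit (base 2, for simplicity).
    terms = [(x << i) if (y >> i) & 1 else 0 for i in range(bits)]
    # Each layer turns every group of four terms into two.
    while len(terms) > 2:
        layer = []
        for i in range(0, len(terms), 4):
            group = terms[i:i + 4]
            group += [0] * (4 - len(group))  # pad a short final group
            layer.extend(compress_4to2(*group))
        terms = layer
    # Only this final step needs a real carry-propagate addition.
    return sum(terms)


print(tree_multiply(13, 11))  # 143
```

With 8 partial products the tree needs two layers (8 to 4 to 2), which is why the depth grows with the logarithm of the number of terms rather than linearly.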

I also wrote about the Pentium's division circuitry and the infamous FDIV bug: https://www.righto.com/2024/12/this-die-photo-of-pentium-sho... The short answer is that the Pentium used base-4 SRT division, similar to long division but generating two bits of result per cycle. It used a lookup table to determine the two quotient bits; an error in this table resulted in the bug.
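For readers who want to see the shape of the algorithm, below is a hedged Python sketch of base-4 division with the signed digit set {-2, ..., 2} that SRT uses. One big difference from real hardware: this sketch picks each quotient digit by exact rounding of the remainder-to-divisor ratio, whereas the Pentium looks the digit up in a table indexed by a few top bits of the partial remainder and divisor (the table that had the FDIV error). It also works on integers rather than floating-point mantissas. The function name and structure are my own.

```python
def srt4_divide(x, d):
    """Integer division of x >= 0 by d > 0, two quotient bits per step.
    Returns (quotient, remainder, digits), each digit in {-2, ..., 2}."""
    if x < 0 or d <= 0:
        raise ValueError("need x >= 0 and d > 0")
    # Pick enough base-4 digit positions that |w| <= (2/3) * d * 4**n holds.
    n = 0
    while 3 * x > 2 * (d << (2 * n)):
        n += 1
    w = x            # partial remainder; it may go negative
    quotient = 0
    digits = []
    for j in range(1, n + 1):
        dj = d << (2 * (n - j))          # divisor aligned to this digit
        q = (2 * w + dj) // (2 * dj)     # nearest integer to w / dj
        q = max(-2, min(2, q))           # clamp to the allowed digit set
        w -= q * dj                      # now |w| <= (2/3) * dj
        quotient = 4 * quotient + q
        digits.append(q)
    if w < 0:                            # final fix-up for a negative remainder
        quotient -= 1
        w += d
    return quotient, w, digits


print(srt4_divide(100, 7))  # (14, 2, [1, 0, -2])
```

The negative digits are what make the selection tolerant of small estimation errors: a slightly-too-large digit can be corrected by a negative digit in a later step, which is why a small lookup table is enough in hardware.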

RiverCrochet

I remember reading somewhere--memory is hazy--that at least division uses a partial look up table, kinda like how you'd do it in 6502 assembly back in the day. E.g., if you have to multiply something by 5, and you can get the range of inputs down to something reasonable, then you can just have a table of x*5 for that range and just look it up.

Also I'm not sure multiplication/division are quadratic if your algorithm is not "add X to itself Y times." Look at this for 6502 16-bit multiply - https://www.llx.com/Neil/a2/mult.html - it's dependent on the bit width, not the value of the multiplier/cand. Of course this is for integers, not floating point.
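A minimal Python sketch of the shift-and-add approach, in the same spirit as that kind of 6502 routine (this is not a transcription of the linked code, and the function name is mine). The loop always runs `bits` times no matter how large the operands are, which is the point about the cost depending on bit width rather than value.

```python
def shift_add_multiply(multiplicand, multiplier, bits=16):
    """Unsigned multiply of two `bits`-wide values, one multiplier bit per step."""
    limit = 1 << bits
    if not (0 <= multiplicand < limit and 0 <= multiplier < limit):
        raise ValueError("operands must fit in the given bit width")
    product = 0
    for _ in range(bits):            # fixed iteration count: one per bit
        if multiplier & 1:           # low bit set: add the shifted multiplicand
            product += multiplicand
        multiplicand <<= 1           # next bit is worth twice as much
        multiplier >>= 1
    return product


print(shift_add_multiply(1234, 5678))  # 7006652
```

Each of the `bits` steps still does a full-width addition, so counted in single-bit operations the work is roughly quadratic in the width; hardware hides that by doing each addition (or the whole tree of them) in parallel.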
