Reverse engineering the 386 processor's prefetch queue circuitry
20 comments
May 10, 2025
myself248
kens
Well, the Manchester carry chain dates back to 1959. Even the 6502 uses carry skip to increment the PC. As word sizes became larger and transistors became cheaper, implementations became more complex and optimized. And mainframes have been using these tricks forever.
kens
Author here. I hope you're not tired of the 386... Let me know if you have any questions.
sitkack
I'll never tire of any analysis you do. But if you are taking requests, I'd love two chips.
The AMD 29000 series, a RISC chip with many architectural advances that eventually morphed into the K5.
And the Inmos Transputer, a Forth-like chip with built-in scheduling and networking, designed to be networked together into large systems.
sitkack
At what number of layers does it become difficult to reverse engineer a processor from die photos? I would think at some point, functionality would be too obscured to be able to understand the internal operation.
Do they ever put a solid metal top layer?
kens
I've been able to handle the Pentium with 3 metal layers. The trick is that I can remove metal layers to see what is underneath, either chemically or with sanding. Shrinking feature size is a bigger problem since an optical microscope only goes down to about 800 nm.
I haven't seen any chips with a solid metal top layer, since that wouldn't be very useful. Some chips have thick power and ground distribution on the top layer, so the top is essentially solid. Secure chips often cover the top layer with a wire that goes back and forth, so the wire will break if you try to get underneath for probing.
anyfoo
Never, the 386 is way too important.
neuroelectron
Ok, now do 486.
kens
I'm not as interested in the 486; I went straight to the Pentium: https://www.righto.com/2025/03/pentium-multiplier-adder-reve...
guerrilla
I totally agree with your methodology. Stick to the classic leaps.
neuroelectron
Fair enough. But why?
siliconunit
Very nice analysis! Personally I'm a DEC Alpha fan, but I guess that's too big an endeavor (or maybe a selected portion?)
kens
So many chips, so little time :-)
lysace
I miss those dramatic performance leaps in the 80s. 10x in 5 years, give or take.
Now we get like 2x in a decade (single core).
rasz
There was no clock-for-clock performance improvement between the 286 and 386 when running contemporary 16-bit code: https://www.vogons.org/viewtopic.php?t=46350
myself248
I remember reading about naive circuits like ripple-carry, where a signal has to propagate across the whole width of a register before it's valid. These seem like they'd only work in systems with very slow clocks relative to the logic itself.
In this writeup, something that jumps out at me is the use of the equality bus, and Manchester carry chain, and I'm sure there are more similar tricks to do things quickly.
When did the transition happen? Or were the shortcuts always used, and the naive implementations exist only in textbooks?
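For anyone following along, here is a minimal Python sketch of the ripple-carry idea described above. It is only a logic-level model, not anything taken from the 386 article: the function names and the 8-bit default width are assumptions made for illustration. The second function counts how many consecutive bit positions a carry has to travel through, using the per-bit generate and propagate signals that schemes like the Manchester carry chain are built around.

```python
def ripple_carry_add(a, b, width=8):
    """Add two unsigned values one bit at a time.

    Each stage needs the carry from the stage below it, which is why
    the result is only valid once the carry has rippled all the way up.
    Returns (sum truncated to `width` bits, final carry out).
    """
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def longest_carry_run(a, b, width=8):
    """Return the longest run of consecutive bit positions a carry crosses.

    generate:  this bit produces a carry on its own (both inputs are 1)
    propagate: this bit passes an incoming carry along (exactly one input is 1)
    """
    longest = run = carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y
        propagate = x ^ y
        if generate:
            run = 1          # a new carry starts at this position
        elif propagate and carry:
            run += 1         # an existing carry keeps travelling
        else:
            run = 0          # carry is killed or there was none
        carry = generate | (propagate & carry)
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    print(ripple_carry_add(0xFF, 0x01))   # worst case: carry crosses every bit
    print(longest_carry_run(0xFF, 0x01))
    print(longest_carry_run(0x55, 0x2A))  # no carries at all
```

The worst case (such as 0xFF + 1) is what sets the clock limit for a plain ripple adder. This model says nothing about real gate delays; it only shows which bits depend on which.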