Skip to content(if available)orjump to list(if available)

The ITTAGE indirect branch predictor

burnt-resistor

Some architectures have/had branch hint instructions.

https://arcb.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/a...

The impact of a branch miss is a particular pipeline stalls to flush the incorrect prediction. If there were resources available for the other branch to be speculatively executed concurrently and in parallel it might take less wall time.

saagarjha

If the author is around, the final link points to http://localhost:1313/post/cpython-tail-call/#further-weirdn....

jonstewart

I learned about computed goto a dozen years ago, tried it out in my interpreter, and got worse performance in that Haswell era than with a trusty switch statement. Branch predictors have made computed goto less compelling for a good long time.

Tail call is a different matter…

nynx

I must be missing something here. How would this help predict interpreter dispatch? Those won’t be a function of previous branch history or pc, which may very well be independent of the next opcode. They’d be a function of state in memory or registers.

dzaima

If your interpreter is interpreting a program with unpredictable branches, of course no predictor will magically make your interpreter get branches better predicted than an equivalent compiled program will.

The question here is about all other branching the interpreter will do. i.e. even if you have a unpredictable `if (a+b < 0)`, there's still the dispatching to the "load-variable" and "add" and "load-constant" and "less-than" and "do-branch" opcodes, that still will benefit from being predicted, and they could very well if you have it repeated in a loop (despite still having a single unpredictable branch), or potentially even if you just have a common pattern in the language (e.g. comparison opcodes being followed by a branch opcode).

saagarjha

Interpreters are just like normal programs, but splatted out a bit. In particular, they have branches and loops just like normal programs. The challenge for processors is that these high level constructs are far apart and dispatched through an interpreter loop, which obfuscates them. Being able to reach further back in history lets you recover this kind of information "through" the intervening bits.

brigade

In a hot loop, the next opcode can be predicted quite well from the history of previous opcodes executed, especially once have a couple iterations available in your history. And the opcodes executed in an interpreter are generally equivalent to the dispatch branch target.

achierius

"very well may be" but oftentimes isn't. Branch history does in practice do a very good job of predicting what target you're going to take for an indirect branch.

nynx

Sure. I can easily see that often being the case for arbitrary code but not interpreter dispatch loops.