Skip to content(if available)orjump to list(if available)

A catalog of ways to generate SSA

A catalog of ways to generate SSA

22 comments

·February 11, 2025

pizlonator

Pro tips from an SSA hacker.

- Ignore the SSA conversion algorithms that don’t use dominance frontiers. Also ignore the dominator tree generation algorithms that aren’t Langauer-Trajan. Reason: you’ll want to have a dominator tree (ie immediate dominators) anyway for a bunch of SSA optimizations. So, you’ll want Langauer-Tarjan. And if you have that, then using dominance frontiers to generate SSA is just easier.

- CPS isn’t really anything like SSA. Don’t believe that hype. Any claim in PL of two languages or forms being equivalent is not falsifiable because any two Turing complete languages are going to have some transformation between them, so really these papers are just saying “hey look I invented a translation that we all knew was going to be possible and I like my translation for aesthetic reasons”. Working with CPS is nothing like working with SSA.

- The best way I know of representing Phi in your SSA IR is a Pizlo special. I haven’t written it up except by implementing it (JavaScriptCore uses Pizlo SSA). Here’s the idea. Phi takes no arguments (no block arguments and no value arguments). Each Phi has a shadow variable (in addition to the implicit SSA variable it assigns to). Phi just loads the value out of its shadow variable and assigns it to its implicit variable. Then I have a second instruction called Upsilon that takes two args: an input variable and a Phi. It loads the input variable’s implicit value (just as any SSA use would) and stores it to the Phi’s shadow variable. The result of using this IR is that you have zero coupling between your CFG and the SSA graph. It makes CFG transforms much much easier to write. It also means much less special casing of Phi, in practice.

ebiederm

I guess that depends on what you mean by equivalence between CPS and SSA.

My prefered form is basic blocks with arguments. Where the final jump has parameters it passes to it's destination blocks.

That form has a 1-1 equivalence with SSA and I find it much easier to reason about. Every SSA algorithm I have looked at so far works just fine on the representation. Plus there is none of the annoying transforming into and out of SSA.

If you omit the mucking about with return closures in CPS what is left is basic blocks with arguments.

To see basic blocks with arguments as a form of SSA just see every basic block argument as a phi function, and the callers parameters that feed into that argument as the other end of the sources of the phi function.

Just from your description I think your pizlo special SSA is actually a form of basic blocks with arguments. What I don't see from your description is why Upsilon and phi aren't combined into a single notion (What I would call an incoming block parameter).

pizlonator

Pizlo form is not basic block with arguments, because basic blocks don't have arguments in Pizlo form.

I don't think basic block with arguments is CPS because CPS, in practice, involves return closures.

ebiederm

What is Upsilon? It sounds like something that figures out the value coming into a block, which I would call a basic block argument.

How does Upsilon know which values from other basic blocks it could receive? That information is critical for transformations like constant propagation, and register allocation.

k4st

The Pizlo special approach sounds a bit like converting out of SSA form via compensating `alloca`s in LLVM. E.g. one `alloca` per SSA variable, with a `store` into the `alloca` in the source block, and the `phi` replaced by a `load`.

If this is the case, this is an approach I've taken in the past to unify how LLVM-based taint tracking instrumentation of "normal" `alloca`s and phi nodes works, e.g.: https://github.com/lifting-bits/vmill/blob/master/tools/Tain...

pizlonator

It's not, because the Phi's in Pizlo form introduce a Static Single Use variable, which makes them easy to analyze.

Allocas have no requirement that there's a static single use. They are quite hard to analyze.

fuhsnn

>Phi just loads the value out of its shadow variable and assigns it to its implicit variable.

Isn't this similar to block arguments? These shadow variables would have been block parameters, just unordered and maybe not explicitly collected in a set.

tekknolagi

Please write up your "Pizlo special" on Phi nodes

pizlonator

Sure.

Let's first define a syntax for SSA and some terminology. Here's an example SSA node:

    A = Add(B, C)
In reality, this will be a single object in your in-memory representation, and the names are really addresses of those objects. So, this node has an "implicit variable" called A; it's the variable that is implicitly assigned to when you execute the node. If you then do:

    X = Sub(A, 1)
Then "A" is just a pointer to the Add node, and we're using the implicit variable "A".

Here's an example function:

    int foo(int a, int b)
    {
        int x;
        if (a)
            x = b + 1
        else
            x = b * 2
        return x + 42;
    }
Here's an SSA program with a Phi in Pizlo form:

    root:
        A = GetArgument(0)
        B = GetArgument(1)
        Branch(A, then, else)
    then:
        X1 = Add(B, 1)
        Upsilon(X1, ^X)
        Jump(return)
    else:
        X2 = Mul(B, 2)
        Upsilon(X2, ^X)
        Jump(return)
    return:
        X = Phi()
        R = Add(X, 42)
        Return(R)
In Pizlo form:

- Every SSA node has an implicit variable, as mentioned above.

- Every Phi node has a shadow variable in addition to the implicit variable.

Let's say that given a Phi like "X = Phi()", the implicit variable is called "X", and the shadow variable is called "^X".

Therefore, the semantics of an upsilon like "Upsilon(X1, ^X)" is just "set ^X = X1". And the semantics of a Phi like "X = Phi()" is just "set X = ^X".

In other words, you can think of Upsilon as being a side effect (a store to a shadow variable). And you can think of Phi as being a side effect (a load from a shadow variable). You can model them that way in your effect analysis to block reordering Upsilons and Phis.

But also, the shadow variables of Phis in Pizlo form are "Static Single Use" (SSU) variables. This falls out naturally from the fact that the only syntax for loading a shadow variable is the Phi itself. So you can think of Pizlo form as "SSA-SSU form".

The main benefit of this form is that basic blocks - and all CFG data structures - have zero knowledge about SSA. There are no basic block arguments. There's no requirement that Phis appear at the tops of blocks. In fact, this is a valid program in Pizlo form (albeit suboptimal):

    M = Stuff(...)
    Upsilon(M, ^N)
    N = Phi()
    MoreStuff(N)
Here, there's a Phi in them middle of a basic block, and there's an Upsilon just before it. That's fine. This is important, because it means that you can do CFG transforms that blow away control flow edges without worrying about fixing your Phis.

In any Pizlo-form compiler, you'll want to have a Phi simplification pass, which you can implement either by running Cytron or by running any other SSA converter. The simplest is just to just fixpoint the rule that if you have a Phi that has Upsilons that only use the Phi or exactly one other value, then replace the Phi with that other value.

tekknolagi

Mind if I quote you on this in the post?

null

[deleted]

pbiggar

I'll add my own way of constructing SSA, which is mostly untested but worked fine in my compiler: https://paulbiggar.com/research/fit-2009.pdf

RossBencina

This looks helpful. I would love to find a simple method of generating SSA that can deal with goto statements.

A paper that was suggested to me (not mentioned in the linked blog post) is:

"Practical Improvements to the Construction and Destruction of Static Single Assignment Form", by Preston Briggs, Keith D. Cooper, Timothy J. Harvey, I. Taylor Simpson all at Rice University SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 28(8), 1–28 (July 1998)

rayiner

That one is pretty easy to implement. The Cooper & Torczon book describes the algorithm well and has worked examples. https://shop.elsevier.com/books/engineering-a-compiler/coope...

Note also if you’re dealing with weird control flow you might get oddities in translating out of SSA form: https://www.tjhsst.edu/~rlatimer/papers/sreedharTranslatingO.... Sreedhar’s algorithm for translating out of SSA form is relatively easy to implement.

This is probably useless to you unless you know Common Lisp, but I put together a couple of toy implementations of these algorithms years ago to help myself learn them. There’s a couple of nits the papers gloss over that are noted in comments. You’ll need to compute liveness and dominance frontiers. Examples for those are in the project too. https://github.com/rayiner/portfolio/blob/master/ssa_analysi...

https://github.com/rayiner/portfolio/blob/master/ssa_analysi...

barrkel

Gotos are edges in a graph of basic blocks - runs of code with only one entry and one exit. Once you have your code in the form of basic blocks, there's lots of resources. See e.g. https://www.cs.cmu.edu/~fp/courses/15411-f13/lectures/06-ssa...

ufo

I presume their question is about "simple" techniqyes. With goto, the most famous one needs to compute dominance frontiers.

rayiner

Simple as in low time complexity? Computing dominance frontiers is pretty easy: https://www.cs.tufts.edu/~nr/cs257/archive/keith-cooper/dom1...

tekknolagi

What source language are you working with that has go-to?

swiftcoder

Doesn't every language with break-to-label (i.e. break out of multiple nested loops) effectively have goto, from an implementation standpoint?

tekknolagi

Yes but that is not what makes irreducible control flow graphs, I think. You'll still have structured control flow