Compilers
SSA Form
One name, one definition — the IR that powers modern optimization
Static Single Assignment is a compiler IR where every variable is assigned exactly once. φ-functions merge values at control-flow joins. The form that makes constant propagation, dead code elimination, and almost every modern optimization tractable.
- Defining propertyEvery variable assigned exactly once
- Merge mechanismφ-function (phi) at control-flow joins
- Construction costO(n + e + φ) per Cytron et al. 1991
- Used byLLVM IR, GCC GIMPLE, V8 TurboFan, HotSpot C2, .NET RyuJIT
- EnablesSparse Conditional Constant Propagation, DCE, GVN, scalar replacement
- VariantsMinimal, Pruned, Semi-Pruned, SSI, CPS-equivalent
Interactive visualization
Press play, or step through manually. Watch a simple if-else block get versioned and a φ-function appear at the merge.
How SSA works
Take a tiny piece of code and ask the compiler a basic question — "what's the final value of y?" The code:
// Source — before SSA
x = 1;
if (cond) {
x = 2;
} else {
x = 3;
}
y = x + 10;
The variable x is reassigned three times. Any analysis that wants to reason about y's value has to look at all three definitions and decide which one reaches the use. That's a dataflow problem; you solve it with a fixed-point iteration over the control-flow graph. Slow, awkward, and you have to redo it after every transformation.
SSA collapses the problem. Rewrite the same code so each definition gets a unique name:
// After SSA construction
x_1 = 1;
if (cond) {
x_2 = 2;
} else {
x_3 = 3;
}
x_4 = φ(x_2, x_3); // merge: x_2 if we came from then-branch, x_3 if from else
y_1 = x_4 + 10;
Now every variable has exactly one definition. When the optimizer asks "what defines x_4?" the answer is one φ-instruction, not a graph search. Constant propagation, value numbering, dead code elimination — every dataflow optimization that used to require fixed-point analysis becomes a single sparse walk over the SSA graph.
φ-functions and the merge problem
The φ-function is the price you pay for the one-definition rule. Wherever control flow merges, the variable's value depends on which predecessor block executed. The φ-function encodes that choice. It's not a real machine instruction — it's purely notational. The deconstruction pass (sometimes called "out-of-SSA") replaces each φ with copy instructions inserted on the incoming edges:
// After SSA deconstruction
if (cond) {
x_2 = 2;
x_4 = x_2; // copy inserted on then-edge
} else {
x_3 = 3;
x_4 = x_3; // copy inserted on else-edge
}
y_1 = x_4 + 10;
The register allocator coalesces those copies wherever possible — typically eliminating them entirely if x_2, x_3, and x_4 can share a register.
Where do φ-functions go?
You can't place φ-functions naively — putting one at every merge point for every variable would blow up the IR. The classical algorithm by Cytron, Ferrante, Rosen, Wegman, and Zadeck (1991) places φ-functions at the iterated dominance frontier of each variable's definition sites.
The dominance frontier of a block B is the set of blocks X where B dominates a predecessor of X but does not strictly dominate X itself. In plain terms: dominance frontiers are exactly the points where control flow merges back together after diverging. That's where φ-functions need to go. The iterated dominance frontier extends this: place φ-functions, then treat those φ-functions as new definitions, and recompute. Two or three iterations usually suffice; the algorithm runs in roughly O(|V| + |E| + |φ|) and converges quickly.
What SSA enables
| Optimization | Why SSA makes it easy |
|---|---|
| Constant propagation (SCCP) | One def per variable — store lattice values in a flat table; propagate over use-def edges. Wegman-Zadeck 1991. |
| Dead code elimination | If a variable has no uses, mark it dead. Trivially recursive. |
| Global value numbering | Hash each instruction's opcode + operand SSA names. Identical hash → redundant. |
| Common subexpression elimination | Same as GVN — SSA's single-definition rule means equal-named uses are guaranteed equal values. |
| Loop-invariant code motion | An instruction is invariant iff all its operands are defined outside the loop — easy check in SSA. |
| Scalar replacement of aggregates | Split a struct into one SSA value per field. Each field becomes independently optimizable. |
| Induction variable analysis | SSA's φ-functions at loop headers expose induction variables explicitly. |
| Range analysis | Each definition gets a range; merge ranges at φ-functions. |
Every one of these was harder, slower, and less effective in pre-SSA IRs. LLVM, GCC's GIMPLE, V8's TurboFan, HotSpot's C2 — all production optimizing compilers are built on SSA, and the bulk of their optimization machinery only works because of it.
SSA variants and alternatives
| SSA | SSI | CPS | |
|---|---|---|---|
| What it splits | Variables at merges (φ) | Variables at merges AND branches (φ + π) | Functions into continuations |
| Used by | LLVM, GCC, V8, HotSpot | Research compilers, advanced range analysis | SML/NJ, MLton, some Scheme compilers |
| Merge construct | φ-function | φ-function | Function call to merge continuation |
| Branch construct | Implicit (one definition flows through) | π-function splits on branch condition | Tail call to one of two continuations |
| Range/path analysis | Limited | Strong — every path has unique names | Strong — every continuation is its own scope |
| Implementation complexity | Medium | High | High (closure conversion etc.) |
| Equivalent to | CPS for first-order code | SSA + dual on branches | SSA + named labels (Appel 1998) |
Appel's 1998 paper "SSA is functional programming" proved that SSA and CPS are essentially the same IR with different surface syntax. φ-functions are continuations called with positional arguments. If you understand one, you understand the other.
When SSA matters for you
- Reading LLVM IR. Every value in LLVM IR is in SSA form. Reading
clang -emit-llvm -Soutput requires knowing whatphi i32 [ %a, %then ], [ %b, %else ]means. - Writing a compiler pass. Most LLVM passes consume SSA and produce SSA. Knowing what invariants you can rely on (single definition, dominance) is what makes pass writing tractable.
- Debugging optimizer behavior. When
-O2deletes "your" code, look at the SSA — the optimizer saw values, not statements. Dead store elimination operates on SSA defs that have no uses. - Understanding crash dumps in JITs. V8's TurboFan, HotSpot's C2, .NET's RyuJIT all emit SSA in debug logs. The variable subscripts you see in
--print-opt-codeoutput are SSA versions.
Worked example — SCCP on SSA
Sparse Conditional Constant Propagation in 50 lines of pseudo-code:
// SCCP — Wegman & Zadeck 1991
// Lattice: TOP > constant_k > BOTTOM
// Each SSA variable starts at TOP; updates monotonically downward.
func sccp(ssa):
work_ssa = [] // SSA edges to revisit
work_cfg = [entry] // CFG edges to revisit
executable = {} // which edges have been proven reachable
while work_cfg or work_ssa:
if work_cfg:
edge = work_cfg.pop()
if edge in executable: continue
executable.add(edge)
for instr in edge.dst.instructions:
evaluate(instr)
else:
instr = work_ssa.pop()
evaluate(instr)
func evaluate(instr):
old = instr.lattice
if instr.is_phi():
new = meet(operand.lattice for operand in instr.live_operands)
elif instr.is_branch():
c = instr.cond.lattice
if c == TOP: return // wait
elif c is const(true): work_cfg.add(instr.true_edge)
elif c is const(false): work_cfg.add(instr.false_edge)
else: work_cfg.add(both edges) // BOTTOM
return
else:
new = fold(instr.opcode, [op.lattice for op in instr.operands])
if new != old:
instr.lattice = new
for use in instr.uses: work_ssa.add(use)
Notice — no fixed-point iteration over the whole CFG. The "sparse" in SCCP means we revisit only instructions whose inputs changed, following SSA edges directly. Pre-SSA constant propagation had to iterate over every basic block until stable; SCCP converges in roughly O(|V| + |E|) work because the SSA structure tells it exactly where to look. On real LLVM benchmarks SCCP completes in under 5 ms per function.
LLVM IR example
; C source: int max(int a, int b) { return a > b ? a : b; }
; Clang -O0 emits stack loads/stores; -O1 lifts into SSA:
define i32 @max(i32 %a, i32 %b) {
entry:
%cmp = icmp sgt i32 %a, %b
br i1 %cmp, label %if.then, label %if.else
if.then:
br label %if.end
if.else:
br label %if.end
if.end:
%r = phi i32 [ %a, %if.then ], [ %b, %if.else ]
ret i32 %r
}
The phi instruction is the merge — %r is %a if we arrived from %if.then, %b if from %if.else. After register allocation this becomes a single cmovg on x86-64 — both arms collapsed to a conditional move on one register.
Compile-time cost and space overhead
SSA construction and deconstruction is not free. Measured numbers from the LLVM benchmarks:
- SSA construction cost: 1–3% of total compile time at
-O2. Cytron et al.'s algorithm is near-linear in practice (O(|V| α(|V|)) for dominance frontiers using union-find). - SSA deconstruction cost: ~0.5% of compile time. The bulk is copy coalescing during register allocation.
- IR size blow-up: roughly 1.2–1.5× the pre-SSA IR for typical code. Loops with many live variables can hit 2×.
- Memory: SSA increases peak compiler memory by 15–25% on large translation units (Clang building LLVM itself).
The cost pays for itself many times over because optimizations downstream are 5–20× faster on SSA. A pre-SSA constant propagator might take 50 ms on a single function; SCCP runs in 3 ms on the same function.
Common SSA pitfalls
- Forgetting φ-functions are not real instructions. A φ "executes" along a specific incoming edge; the value depends on which predecessor jumped. Treating it as a regular three-operand instruction breaks register allocation and codegen.
- Critical edges. An edge from a block with multiple successors to a block with multiple predecessors. You cannot put copy instructions on critical edges during SSA deconstruction — you have to split them first by inserting an empty block.
- The lost copy problem. Naive φ deconstruction can introduce wrong values when two φ-functions in the same block have interfering operands. The standard fix is Sreedhar's method or Boissinot's parallel-copy semantics.
- Memory and aliases. SSA describes register-like values. Memory (loads, stores through pointers) is not naturally in SSA; MemorySSA and gated SSA extensions handle it, but most pure-SSA optimizations skip memory ops or use side channels.
- Exception edges. A throwing call has two successors: the normal continuation and the exception handler. Forgetting to treat exception edges as part of the CFG breaks dominance computations.
Frequently asked questions
What problem does SSA solve?
In ordinary code a name like x can refer to many different values — every assignment creates a new one. Optimizations that ask "where does this value come from?" have to walk back through the control-flow graph and consider all possible definitions. SSA gives each definition its own unique name (x_1, x_2, x_3) so a use refers to exactly one definition. That single property — one name, one definition — makes constant propagation, dead code elimination, value numbering, and dozens of other passes simple to implement and fast to run.
What is a φ-function?
A pseudo-instruction inserted at control-flow merge points. φ(x_1, x_2) means "this variable's value is x_1 if we came from the predecessor block where x_1 was defined, or x_2 if we came from the predecessor where x_2 was defined." It's not a real machine instruction; it's a notational trick that lets every variable still have exactly one definition even when control flow joins. The register allocator removes φ-functions during deconstruction by inserting copies on the incoming edges.
Where are φ-functions placed?
At the dominance frontier of each definition. The dominance frontier of block B is the set of blocks where B does not strictly dominate the predecessors but does dominate at least one. The classic Cytron et al. 1991 algorithm places φ-functions for variable v at the iterated dominance frontier of v's definition sites — that's the minimal placement guaranteeing correctness. Modern compilers compute dominance frontiers in O(n α(n)) using union-find.
Does SSA make code slower?
No, because SSA is an internal compiler representation — it's destroyed before code generation. The compiler enters SSA after generating IR, runs optimization passes, then deconstructs SSA by replacing φ-functions with parallel copies on the predecessor edges. The final machine code has no φ-functions and no versioned variables — just registers. SSA's cost is in compile time (the construction and deconstruction passes), not runtime.
What is pruned SSA?
A space-optimized variant where φ-functions are only placed for variables that are actually live at the merge point. Standard "minimal" SSA places a φ wherever the dominance frontier algorithm prescribes, even for dead variables — pruned SSA combines that placement with live-variable analysis to skip φ-functions for non-live variables. Saves memory in large functions. Most production compilers (LLVM, GCC) emit pruned SSA.
How does SSA enable constant propagation?
In SSA, every variable has a single definition, so you can store a lattice value (constant, undef, top) per variable in a flat table and propagate. Sparse Conditional Constant Propagation — Wegman and Zadeck 1991 — uses SSA to find more constants than any pre-SSA dataflow algorithm because it can reason about branches as well as assignments. The algorithm runs in O(|V| + |E|) time on the SSA graph and is the cornerstone of LLVM's -O2 pipeline.
What's the alternative to SSA?
Continuation-passing style (CPS) is the functional-language alternative. Each function takes a continuation parameter representing what to do next; control flow is expressed via function calls. CPS and SSA are theoretically equivalent — Appel showed in 1998 that SSA is essentially CPS with named labels. Static Single Information (SSI) is another extension that adds π-functions on branches so split control flow has split names too — used in advanced range-analysis optimizations.