Compilers

Continuation-Passing Style

Make "what happens next" a value you can pass around

Continuation-passing style (CPS) is a code transformation where every function takes an extra argument — a continuation — that says what to do with the result, making the return point, evaluation order, and control flow explicit data.

  • IntroducedStrachey & Wadsworth, 1974
  • Every call becomesa tail call
  • Functions returnnever (they jump)
  • Transform costO(n) in program size
  • Used bySML/NJ, early Scheme compilers

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The idea: name the rest of the computation

Look at the expression f(g(x)) + 1. There is a hidden choreography you never wrote down: first evaluate g(x), then feed that to f, then take that result and add one. The order, and the "and then" between each step, lives implicitly in the machine's call stack. Each function call pushes a frame that remembers where to come back to and what to do with the answer.

Continuation-passing style takes that implicit "what to do with the answer" and turns it into an ordinary function argument. Instead of a function returning a value, you hand it a one-argument function — its continuation — and the function calls the continuation with the value it would have returned. The continuation literally is the rest of the program, reified as a closure.

Compare direct style with CPS for a tiny example:

// direct style
function add1(x) { return x + 1; }
add1(square(5));            // 26

// CPS — pass a continuation k that consumes the result
function add1(x, k) { k(x + 1); }
square(5, (s) => add1(s, (r) => report(r)));  // calls report(26)

Notice what happened to control flow. In direct style, the order is implied by nesting and the stack unwinds it for you. In CPS, the order is spelled out: square runs, then calls its continuation, which runs add1, which calls its continuation, which calls report. Nothing is waiting on a stack for a return — every function ends by calling the next thing. The "and then" is now data you can pass, store, copy, or even invoke twice.

The idea was made precise by Christopher Strachey and Christopher Wadsworth in their 1974 technical report "Continuations: A Mathematical Semantics for Handling Full Jumps," written to give a clean denotational meaning to goto. Gordon Plotkin's 1975 paper "Call-by-name, call-by-value and the λ-calculus" proved the CPS transform's key property: it makes a program's evaluation order independent of the host language, so a call-by-value source and a call-by-name source compile to CPS terms that reduce the same way. Guy Steele's 1978 "Rabbit" Scheme compiler turned the trick into a practical compilation strategy — the famous slogan being that function calls are just gotos that pass arguments.

The CPS transform, precisely

The transform is mechanical. Write [[ e ]] k to mean "the CPS of expression e with continuation k." Three rules cover the call-by-value λ-calculus, and every real language adds a handful more:

[[ x ]] k        =  k x                         -- a value: just hand it to k
[[ λx. e ]] k    =  k (λx. λk'. [[ e ]] k')     -- a function gains an extra param k'
[[ e1 e2 ]] k    =  [[ e1 ]] (λf.                -- evaluate the function...
                     [[ e2 ]] (λv.               -- ...then the argument...
                       f v k))                   -- ...then apply, threading k through

Read the application rule out loud: to evaluate e1 e2 and deliver the answer to k, first evaluate e1 giving a function f, then evaluate e2 giving a value v, then call f with the value and the continuation. The nesting fixes the evaluation order on the page — there is no longer any ambiguity about whether e1 or e2 runs first. Swap the two inner clauses and you've changed left-to-right to right-to-left evaluation, explicitly.

Three structural facts follow directly:

  • Every call is a tail call. The last action of any CPS function is to call something — either the user's function or a continuation. No frame waits for a return, because nothing returns.
  • Every intermediate value is named. The continuations bind f and v; there are no nested sub-expressions left. This is why CPS is so close to a compiler's three-address code.
  • Evaluation order is fixed and visible. What was implicit in the source is now syntactic structure in the CPS term.

The transform runs in O(n) time and produces a term O(n) in size, where n is the size of the source — each source node expands to a constant number of CPS nodes. The naïve version above introduces "administrative redexes" (continuations that are immediately applied and could be β-reduced away at compile time); a one-pass CPS transform (Danvy and Filinski, 1992) eliminates them on the fly so the output stays linear and clean.

When CPS earns its keep

  • As a compiler IR for functional languages. CPS makes control flow first-class, so tail calls, exceptions, generators, and call/cc all become ordinary function calls — the back end has one mechanism to optimize instead of many. Standard ML of New Jersey and several early Scheme compilers were built on it.
  • To implement advanced control yourself. Coroutines, green threads, backtracking search, and async I/O are all "save the rest of the computation and resume it later" — which is exactly capturing a continuation. CPS-converting your interpreter gives you these for free.
  • To get tail recursion in a language that lacks it. Trampolining a CPS program runs unbounded recursion in constant stack.
  • To make evaluation order explicit and portable when you're proving things about a program or retargeting it to a machine with a different calling convention.

When not to: don't hand-write CPS as application code. Humans read direct style far better, and modern compilers prefer A-normal form (ANF) or SSA, which keep the benefits without the closure soup. CPS is a back-end tool; async/await exists precisely so you never type a continuation by hand.

CPS vs other intermediate representations

CPSDirect styleA-normal form (ANF)SSAJS callbacks
Control flowExplicit (continuations)Implicit (call stack)Implicit (let-bound, tail position marked)Explicit (basic blocks + jumps)Explicit but ad hoc
Who returns?Nobody — all tail callsCallee returns to callerTail calls return; non-tail bindBlocks fall through / branchCallee invokes callback
Intermediate valuesAll named by continuationsAnonymous, nestedAll let-namedAll in SSA registersPartially named
Exceptions / generatorsFree (extra continuations)Needs language supportNeeds language supportNeeds landing padsHand-rolled error callbacks
Closure / allocation costHigh naïvely; optimized awayLowLowNone (register-based)High (one closure per step)
Used bySML/NJ, Rabbit Scheme, OrbitMost front endsGHC core path, Racket, FlambdaLLVM, GCC, V8, HotSpotNode.js before async/await
Human readabilityPoorExcellentModeratePoor (compiler-only)Poor (callback hell)

The headline is that CPS, ANF, and SSA all encode the same dataflow; they differ in how they name things and how loud the control flow is. Andrew Appel made the case for CPS as a compiler IR in Compiling with Continuations (1992), and later crystallized the equivalence in his note "SSA is functional programming" (1998) — a correspondence Richard Kelsey had spelled out in 1995: a CPS continuation parameter is an SSA φ-node, and a CPS jump is a basic-block branch. Pick CPS when you want control operators to be cheap; pick SSA when you're already register-based.

What CPS actually costs

  • Code size: a constant factor, typically 2–3×. Each source node becomes a small fixed cluster of CPS nodes. With administrative redexes reduced, SML/NJ's CPS IR is roughly the same order of magnitude as the source AST.
  • Naïve runtime allocation: one closure per call. A direct-style loop of 1,000,000 iterations runs in O(1) heap; the naïve CPS version allocates a continuation closure each iteration — about 32–48 bytes apiece on a 64-bit runtime, so ~32–48 MB of short-lived garbage. That's the cost that scares people away from CPS.
  • Optimized runtime: zero overhead. Appel's compiler showed that closure conversion plus a "return-continuation lives on a stack" analysis turns most continuations back into stack frames and most continuation calls into direct jumps. After this, SML/NJ's CPS-compiled code was competitive with stack-based ML compilers of the era — the abstraction is free once the back end is done.
  • Stack: constant, always. Because every call is a tail call, a correct CPS implementation needs zero stack growth regardless of recursion depth — provided the runtime does proper tail-call elimination. A direct-style recursion 100,000 frames deep overflows a typical 1–8 MB stack; the CPS version trampolined runs flat.

JavaScript implementation

Here is a complete worked transform: factorial, in direct style, then in CPS, then made stack-safe with a trampoline. The CPS factorial is naturally tail-recursive even though the direct version is not.

// 1. Direct style — NOT tail recursive: the multiply waits for the recursive call.
function fact(n) {
  if (n === 0) return 1;
  return n * fact(n - 1);
}

// 2. CPS — the multiply moves INTO the continuation, so the call is now in tail position.
function factCPS(n, k) {
  if (n === 0) return k(1);
  return factCPS(n - 1, (r) => k(n * r));
}
factCPS(5, (x) => console.log(x));   // 120

// 3. Trampoline — return a thunk instead of recursing, so the JS stack never grows.
//    Both the recursive call AND the continuation call must return thunks, or the
//    built-up continuation chain still unwinds on the stack when the base case fires.
function factTramp(n, k) {
  if (n === 0) return () => k(1);
  return () => factTramp(n - 1, (r) => () => k(n * r));  // thunks, not direct calls
}
function run(thunk) {
  let t = thunk;
  while (typeof t === 'function') t = t();   // bounce until we get a value
  return t;
}
run(factTramp(100000, (x) => x));    // no stack overflow

Two things to internalize. First, every recursive step in factCPS ends with a call and nothing else — the pending multiplication has been captured inside the continuation (r) => k(n * r), which builds up a chain of closures. Crucially, those closures are still a recursive structure: invoking the outermost one when the base case fires would call the next, and the next, unwinding O(n) deep on the native stack. Second, JavaScript does not reliably eliminate tail calls (only Safari/JavaScriptCore implements the ES2015 requirement), so for deep recursion you wrap CPS in a trampoline — and the trampoline must bounce both the recursive descent and the continuation calls (note both arms of factTramp return thunks) to turn the whole closure chain into an iterative loop.

A famous-problem flavor: CPS gives you escape/early-return for free with a second continuation. Here is a list product that aborts the instant it sees a zero, skipping every remaining multiply.

// productCPS short-circuits on 0 by calling the ESCAPE continuation, not k.
function productCPS(xs, k, escape) {
  if (xs.length === 0) return k(1);
  const [head, ...tail] = xs;
  if (head === 0) return escape(0);          // bail out — the rest never runs
  return productCPS(tail, (r) => k(head * r), escape);
}
productCPS([4, 7, 0, 9, 2],
  (r) => console.log('product', r),           // never reached
  (z) => console.log('short-circuit', z));     // logs: short-circuit 0

Python implementation

Python has no tail-call optimization at all and a default recursion limit near 1,000, so the trampoline isn't optional — it's the only way deep CPS runs. The shape mirrors the JavaScript exactly.

import sys

# Direct style — overflows around n = 1000 (default recursion limit).
def fact(n):
    return 1 if n == 0 else n * fact(n - 1)

# CPS — tail-recursive in shape, but Python still grows the stack without a trampoline.
def fact_cps(n, k):
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda r: k(n * r))

# Trampoline — return a zero-arg thunk; bounce in a loop so the C stack stays flat.
# Both the recursive call AND the continuation call must return thunks, or the
# built-up continuation chain still unwinds on the stack when the base case fires.
def fact_tramp(n, k):
    if n == 0:
        return lambda: k(1)
    return lambda: fact_tramp(n - 1, lambda r: (lambda: k(n * r)))

def run(thunk):
    t = thunk
    while callable(t):
        t = t()
    return t

print(run(fact_tramp(100000, lambda x: x)).bit_length())   # completes flat; would otherwise blow the stack

# Two continuations = exceptions. ok() is the success path, fail() is the error path.
def safe_div_cps(a, b, ok, fail):
    if b == 0:
        return fail("division by zero")
    return ok(a / b)

safe_div_cps(10, 0,
             ok=lambda r: print("result", r),
             fail=lambda msg: print("error:", msg))   # error: division by zero

The CPS factorial above will still hit Python's recursion limit because each fact_cps call is a real Python call frame — CPS makes the call a tail call, but CPython doesn't act on that. The trampoline is what cashes in the tail-call property: fact_tramp returns a thunk instead of recursing, and run drives it with a flat while loop.

Variants and close relatives

A-normal form (ANF). Sabry and Felleisen showed in 1993 that you can get CPS's "every intermediate value is named, evaluation order is fixed" property without the continuation closures, by let-binding every non-trivial sub-expression and requiring the operands of a call to be trivial. ANF is direct style with the same dataflow discipline, and it's what GHC, Racket, and OCaml's Flambda actually use because it's easier to read and optimize.

Double-barrelled CPS. Carry two continuations — a success k and a failure kerr — to model exceptions and early exit. Add more "barrels" for generators (a yield continuation) or backtracking (a next-choice continuation).

Delimited continuations. Plain CPS captures the entire rest of the program. shift/reset (Danvy and Filinski, 1990) capture only the slice up to a delimiter, giving composable continuations — the basis of effect handlers in OCaml 5 and modern algebraic-effects libraries.

Defunctionalization. Reynolds' technique replaces the continuation closures with a first-order data structure — a stack of small records, one per continuation shape — plus an apply function. CPS plus defunctionalization is exactly how you derive an explicit-stack abstract machine (a CEK machine) from a recursive interpreter.

Monadic style. The continuation monad Cont r a = (a -> r) -> r is CPS wearing a type. do-notation over Cont, or async/await over promises, is the compiler quietly doing the CPS transform so you don't have to.

Common bugs and edge cases

  • Forgetting to thread the continuation through a branch. Every path through a CPS function must end by calling a continuation. An if whose else branch forgets to call k silently drops the rest of the program — no error, just a computation that stops.
  • Calling the continuation more than once by accident. A continuation is a first-class value; invoking it twice runs the rest of the program twice. That's a feature for backtracking and generators, and a heisenbug everywhere else (doubled side effects, duplicated network calls).
  • Relying on tail-call elimination that isn't there. CPS only saves stack if the runtime actually does TCO. V8 (Node, Chrome) and CPython do not. Without a trampoline, a deep CPS recursion overflows the stack just like direct style — sometimes worse, because the closures are bigger.
  • Administrative-redex blowup. The naïve transform emits continuations that are immediately applied; if you don't β-reduce them at compile time, the IR balloons and downstream passes choke. Use a one-pass (Danvy–Filinski) transform.
  • Variable capture / wrong evaluation order. Hand-CPSing by hand, it's easy to evaluate the argument before the function, or to shadow a continuation variable named k in a nested lambda. The transform's whole point is to fix order — get the nesting wrong and you've silently changed semantics.
  • Treating callbacks as full CPS. JavaScript callbacks CPS only the async boundaries, not every call, so error handling has to be wired manually on each callback — the origin of "callback hell." Promises and async/await restore the missing structure by hiding the continuation again.

Frequently asked questions

What exactly is a continuation?

A continuation is the rest of the computation, reified as a function. At any point in a program there is an implicit answer to "what happens once this expression produces a value?" — CPS makes that answer an explicit one-argument function you can pass around, store, or call more than once.

Why do all CPS calls become tail calls?

In CPS, a function never returns to its caller — it hands its result to the continuation it was given. So the very last thing every function does is a call, and nothing waits on the stack for it to come back. That is the definition of a tail call, which is why a CPS program needs proper tail-call elimination to run in constant stack space.

Is continuation-passing style the same as callback hell?

They share a shape — both thread a "what to do next" function through calls — but they're not the same. CPS is a total, mechanical transform applied to every call by a compiler; JavaScript callbacks are a hand-written, partial version of the same idea, which is exactly why nested callbacks become unreadable and why async/await was invented to hide the CPS again.

How does CPS implement exceptions and early return?

Give each function two continuations instead of one: a success continuation k and a failure continuation kerr. A normal result calls k; an error calls kerr, skipping the rest of the success path entirely. try/catch, generators, coroutines, and Scheme's call/cc all fall out of carrying extra continuations.

Does CPS make programs slower?

Naively yes — every call allocates a closure for its continuation, so a hot loop can allocate once per iteration. Real CPS compilers avoid this: closure conversion plus a return-flag analysis turns most continuations back into a stack and into direct jumps, so optimized CPS code matches direct-style code. CPS is an intermediate representation, not the final machine code.

What's the difference between CPS and SSA form?

They're close cousins. Richard Kelsey's 1995 result, echoed by Appel's "SSA is functional programming" (1998), showed CPS and SSA are essentially the same intermediate representation: a CPS continuation parameter corresponds to an SSA φ-node, and a CPS call corresponds to a basic-block jump. Functional compilers tend to use CPS or ANF; imperative compilers like LLVM use SSA, but the dataflow they capture is equivalent.