Game Theory

Subgame-Perfect Equilibrium

Selten's refinement of Nash for sequential games — rationality has to hold at every node, not just on the path you happen to walk

A subgame-perfect equilibrium is a strategy profile that induces a Nash equilibrium in every subgame, not only along the equilibrium path. The refinement rules out Nash equilibria propped up by threats no rational player would actually carry out, and in finite games of perfect information it is the unique outcome of backward induction. Reinhard Selten introduced it in 1965 and shared the 1994 Nobel for the idea.

  • IntroducedSelten, 1965
  • Nobel1994 (with Nash, Harsanyi)
  • RefinesNash equilibrium
  • AlgorithmBackward induction
  • Generalised bySequential equilibrium (1982)

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why Nash equilibrium isn't enough

Nash's 1951 solution concept is the bedrock of non-cooperative game theory. A strategy profile is a Nash equilibrium if no player can improve their payoff by unilaterally switching strategies, taking everyone else's strategy as given. In simultaneous-move games — the prisoner's dilemma, a sealed-bid auction, choosing which side of the road to drive on — Nash is essentially the only thing you want from a solution concept, and the existence theorem (every finite game has at least one Nash equilibrium in mixed strategies) does a lot of work.

Sequential games are a different animal. Players move in order and observe what has happened so far. A strategy in such a game is not a single action but a complete contingent plan: an instruction for what to do at every decision node the player might ever be called upon to act at, including nodes that the player herself plans never to reach. Nash equilibrium only requires that the prescribed actions be optimal averaged over the equilibrium path — it places no constraint at all on what the strategy says to do at off-path nodes. Reinhard Selten noticed that this leaves room for equilibria that no one would actually play if they had to follow them through.

The motivating example: entry deterrence

The cleanest illustration is the entry-deterrence game. An incumbent monopolist earns 5. An entrant decides whether to enter the market. If the entrant stays out, the incumbent keeps 5 and the entrant gets 0. If the entrant enters, the incumbent then chooses between accommodating (each firm earns 2) and fighting a price war (each firm earns -1).

                    Entrant
                   /        \
                  Out        In
                  |        Incumbent
                  |         /     \
                  |     Accom    Fight
                  |       |        |
              (0, 5)   (2, 2)   (-1, -1)        (Entrant, Incumbent)

There are two strategy profiles that are Nash equilibria:

  1. (In, Accommodate). The entrant enters; the incumbent accommodates. Neither has a profitable deviation: entering yields 2 versus 0; accommodating yields 2 versus -1.
  2. (Out, Fight). The entrant stays out; the incumbent's strategy says, "if you enter, I will fight." This is Nash because, given that the entrant stays out, the incumbent's plan is never tested and there is no payoff difference. And given the incumbent's announced plan to fight, the entrant prefers 0 to -1, so staying out is best.

The second equilibrium is obviously suspect. The "fight" plan is a non-credible threat: if the entrant were actually to walk in, the incumbent — facing a choice between -1 and 2 — would accommodate. Treating the threat as binding is what props up the no-entry outcome, but the threat only works if the entrant believes the incumbent is committed to acting irrationally off-path.

Selten's definition

Selten's 1965 paper proposed that we should require strategies to be optimal at every node of the tree, not just on the equilibrium path. He made this precise via the notion of a subgame. A subgame of an extensive-form game is a subtree that (i) begins at a singleton information set (so the moving player knows where they are), (ii) contains all successors of that node, and (iii) respects information sets (no information set is split by the subgame boundary).

A strategy profile is a subgame-perfect equilibrium if the strategies, when restricted to any subgame, form a Nash equilibrium of that subgame. In the entry-deterrence game, the post-entry decision is itself a subgame — the incumbent moving alone between accommodate and fight. Within that subgame, "fight" is not optimal. So (Out, Fight) fails subgame perfection; only (In, Accommodate) survives.

Backward induction

For finite games of perfect information — every information set is a singleton — subgame perfection has a constructive algorithm. Start at every terminal decision node (a node whose moves lead immediately to leaves). For each, pick the action that maximises the moving player's payoff. Fold that node into a leaf whose payoff is the maximised payoff. Repeat from the new "terminal" nodes, working upward, until you have reached the root. The action you chose at each node, taken together, is the subgame-perfect equilibrium strategy profile.

def backward_induction(node):
    if node.is_leaf():
        return node.payoffs
    best_payoffs = None
    best_action  = None
    for action, child in node.children.items():
        child_payoffs = backward_induction(child)
        if best_payoffs is None \
           or child_payoffs[node.mover] > best_payoffs[node.mover]:
            best_payoffs = child_payoffs
            best_action  = action
    node.spe_action = best_action
    return best_payoffs

The procedure runs in time linear in the size of the tree. When payoffs are generic (no ties) the SPE is unique. Kuhn's theorem (1953) — every finite game of perfect information has a pure-strategy equilibrium obtained by backward induction — predates Selten by more than a decade and is in effect a constructive existence proof for SPE in this class of games.

The Stackelberg leader

The earliest economic application of backward induction is Heinrich von Stackelberg's 1934 model of duopoly leadership. Two firms produce a homogeneous good with constant marginal cost c. Inverse market demand is P(Q) = a − Q. The leader chooses its quantity q_L first; the follower observes q_L and chooses q_F. The follower's reaction function is found by maximising π_F = (a − q_L − q_F − c) q_F, giving

R(q_L) = (a − c − q_L) / 2

The leader substitutes R into its own profit and maximises over q_L:

π_L = (a − q_L − R(q_L) − c) q_L
    = (a − c − q_L) q_L / 2

q_L* = (a − c) / 2
q_F* = (a − c) / 4
Q*   = 3(a − c) / 4

The leader produces twice as much as the follower and earns higher profit than under simultaneous (Cournot) play. The pair (q_L*, R(·)) is subgame-perfect because the follower's reaction function specifies a best response at every q_L the leader could have chosen, not only at the equilibrium quantity. Substituting just the equilibrium response would be a Nash equilibrium — but only the full reaction function is SPE.

The centipede game and the limits of backward induction

Rosenthal's 1981 centipede game throws the gentleness of backward induction into sharp relief. Two players alternate. At each move, the active player chooses Take (end the game with a payoff close to the current pot) or Pass (let the pot grow and hand it to the other player). Continuing always grows the joint pot, but the player who takes earns slightly more than they would by passing and being taken from next round.

Backward induction from the final node: the last mover prefers Take. The penultimate mover, anticipating that, prefers Take. Unwinding all the way to move 1: the unique SPE has player 1 taking immediately, and the pot stays tiny. With even a four-leg centipede, both players walk away with vastly less than they would by passing for a few rounds. Common-knowledge rationality, taken seriously, eats its own children.

The empirical literature is unkind to the prediction. McKelvey and Palfrey (1992) ran four- and six-leg centipedes in the lab. Fewer than 10% of player-1s took on the first move; most pairs passed for at least two rounds. The result is one of the most cited departures from textbook backward induction and has motivated three broad responses:

  • Quantal response equilibrium (QRE). McKelvey and Palfrey themselves proposed a softer rationality concept in which players' actions are noisy best responses, with noise parameterised by a "rationality temperature".
  • Level-k reasoning. Players are modelled as iterating a finite number of steps of strategic thinking, not infinitely many. Empirically, average k is around 1.5 to 2.
  • Reputation / incomplete information. If there is a small probability the opponent is "cooperative", a rational player rationally passes for some rounds to learn the opponent's type — Kreps, Milgrom, Roberts and Wilson's "gang of four" (1982) used this trick on the finitely repeated prisoner's dilemma.

The chain-store paradox

Selten returned to backward induction in 1978 with a result he called the chain-store paradox. A monopolist operates one store in each of N small cities. In each city, a single potential entrant arrives, decides to enter or stay out, and the monopolist (if entry occurs) decides to fight or accommodate. Entry costs the entrant a small amount but is profitable if accommodated; fighting is costly for the monopolist and unprofitable for the entrant. The cities are played in fixed sequence and all moves are public.

Backward induction at city N is the one-shot entry-deterrence game: the monopolist accommodates. With that established, city N − 1's monopolist also accommodates — fighting in N − 1 cannot influence behaviour in N, which is already a foregone conclusion. By induction, the monopolist accommodates in every city. The unique SPE has every entrant entering and the monopolist never fighting.

Selten wrote that he found this prediction unconvincing — that, as a businessman, he would in fact fight early to build a reputation, and that he believed most readers would agree. The paradox is that one of the canonical equilibrium concepts in game theory predicts behaviour at odds with the intuition of its inventor. The resolution emerged in 1982 (the Kreps-Milgrom-Roberts-Wilson "gang of four" papers): if there is a small probability the monopolist is a "crazy" type who always fights, the rational monopolist's optimal strategy in the incomplete-information game is to fight in early cities to keep the entrant uncertain about its type. The reputation effect is large enough that, even with a 1% prior on craziness, fighting in N − k cities is optimal for k roughly logarithmic in N. The chain-store paradox is now seen as motivation for the modern theory of reputation rather than a flaw in SPE.

Imperfect information and the next refinements

SPE has bite only when a game has many proper subgames. A subgame must begin at a singleton information set; in games with simultaneous moves or hidden actions, information sets span multiple nodes and many would-be subgames disappear. In the extreme, a game in which the first move is simultaneous has no proper subgames at all — and SPE collapses to Nash.

Two refinements extend the SPE idea to imperfect-information games:

  • Perfect Bayesian equilibrium (PBE). Augments the strategy profile with a system of beliefs — a probability distribution over the nodes in each information set. Each player at each information set must play a best response given the belief; beliefs on the equilibrium path must be consistent with Bayes' rule applied to the strategies. PBE is the standard tool in signalling games, information design, and most of mechanism design.
  • Sequential equilibrium (Kreps-Wilson 1982). Strengthens PBE by also constraining off-path beliefs. Beliefs are required to be limits of beliefs computed from totally mixed perturbations of the equilibrium strategies (so even probability-zero events have well-defined Bayes-updates). Every sequential equilibrium is a PBE; in finite games, every sequential equilibrium is also subgame-perfect.

Further down the refinement lattice sit trembling-hand perfection (Selten 1975, the precursor to sequential equilibrium), proper equilibrium (Myerson 1978), strategic stability (Kohlberg-Mertens 1986) and forward induction. Each is a tighter sieve; each is justified by a different small-mistakes story about how players might tremble off the prescribed path.

When to use which

Game classTightest sensible refinementReason
Finite, perfect informationSubgame-perfect (= backward induction)Unique with generic payoffs; no information sets to worry about
Sequential with hidden type / moveSequential equilibrium or PBEInformation sets break the supply of proper subgames
Signalling gamesPBE plus an intuitive-criterion testMultiple PBE; need to discipline off-path beliefs about types
Repeated games (folk theorem)SPEFolk theorem uses SPE specifically — Nash is too permissive
Simultaneous, one-shotNash (with trembling-hand if multiplicity bites)No proper subgames; SPE adds nothing over Nash

SPE and the folk theorem

One of the most important uses of SPE is in repeated games. The folk theorem (in its subgame-perfect form, due to Fudenberg-Maskin 1986) says that in an infinitely repeated game with sufficiently patient players, any feasible payoff profile that strictly dominates the minimax payoffs can be sustained as a subgame-perfect equilibrium. This is the formal foundation for cooperative outcomes in long-run business relationships, between nations, and in evolutionary models of altruism.

The "subgame-perfect" qualifier matters. A Nash folk theorem (Friedman 1971) sustains cooperation via grim-trigger punishments — any deviation triggers permanent defection. But playing "permanent defection" is itself a non-credible threat in some specifications. The Fudenberg-Maskin theorem replaces grim trigger with finite, calibrated punishments that are themselves SPE off-path. The punishments are credible because the punishing players are best-responding when called to punish, given that everyone else will return to cooperation after the punishment phase.

Worked example: a three-period bargaining game

Two players, A and B, bargain over splitting a pie of size 1. The game has three periods. In period 1, A proposes a split (x, 1 − x); B accepts or rejects. If rejected, period 2 starts with the pie shrunk by discount factor δ ∈ (0,1) (so the pie is now δ). In period 2, B proposes a split; A accepts or rejects. If rejected, period 3 begins with the pie now δ², and A is the proposer; if B rejects again, the game ends with payoffs (0, 0).

Backward induction:

  • Period 3. A proposes any split; B accepts if their share ≥ 0. A offers B = 0 and keeps δ². Payoff: (δ², 0).
  • Period 2. B proposes. A knows that rejecting yields δ² in period 3, so A accepts any offer giving ≥ δ². B offers A = δ², keeps δ − δ² = δ(1 − δ). Payoff: (δ², δ(1 − δ)).
  • Period 1. A proposes. B knows rejecting yields δ(1 − δ) in period 2 (discounted to today, multiply by δ — but careful, the discount has already been baked in: B's period-2 payoff in today's units is δ × δ(1 − δ) = δ²(1 − δ) if we use today's units; in the standard formulation each player simply maximises the period payoffs and the discount is implicit). Solving the standard Rubinstein three-period model: A offers B = δ(1 − δ), keeps 1 − δ(1 − δ).

In the infinite-horizon Rubinstein bargaining game (Rubinstein 1982), backward induction generalises to a stationary fixed-point argument: the proposer keeps 1/(1 + δ), the responder gets δ/(1 + δ). This is the canonical SPE of bargaining, and the source of nearly all economic intuitions about bargaining power and patience.

Common pitfalls

  • Confusing SPE with Nash off-path. Nash equilibrium does not say a strategy must be optimal at every node — only on the path. The "if you enter I'll fight" plan is fine as a Nash strategy; it is only ruled out by SPE.
  • Using SPE in games with imperfect information. SPE quietly degrades to Nash when there are few proper subgames. In signalling games and most mechanism design, reach for PBE or sequential equilibrium instead.
  • Treating backward induction as a behavioural prediction. The centipede game is the cautionary tale. Backward induction is a normative concept; humans do not, in general, perform unlimited recursion. Quantal response, level-k, and cursed equilibrium are the working alternatives in behavioural game theory.
  • Forgetting that SPE constrains the full strategy, not just the path. When stating an SPE, you must specify the action at every node — including those reached with probability zero. A common gradeable mistake is to write only the equilibrium-path actions, which leaves the off-path strategy unspecified.
  • Backward induction with ties. Genericity is not innocuous. If payoffs are tied at some node, the moving player has multiple best responses, and the upstream computation depends on the tiebreaker. The result is a continuum of SPE rather than a unique one. Real-world payoffs are rarely exactly tied, but textbook puzzles sometimes are.

Where SPE actually gets used

  • Industrial organisation. Entry deterrence, limit pricing, capacity precommitment (Dixit 1980), and the analysis of dynamic oligopoly all hinge on SPE. The Stackelberg model is the workhorse first lecture; capacity-then-quantity Cournot games and Bain-Sylos-Modigliani limit-pricing models are direct extensions.
  • Auction theory. Multi-stage auctions (ascending, descending, multi-unit) are extensive-form games solved by backward induction on the bidding rules. The revenue-equivalence theorem holds in SPE of independent-private-value benchmarks.
  • Bargaining and contracts. Rubinstein bargaining, Stahl finite-horizon bargaining, hold-up models, and the Hart-Moore property-rights theory of the firm — all are dynamic games whose predictions come from SPE / sequential equilibrium.
  • Macroeconomics with strategic agents. Time inconsistency of monetary policy (Kydland-Prescott 1977; Barro-Gordon 1983), sovereign debt crises, fiscal commitment problems — modelled as repeated games where SPE characterises which policies can be credibly sustained.
  • Mechanism design and political economy. Voting models with sequential elimination (single-transferable vote, agenda manipulation), legislative bargaining (Baron-Ferejohn 1989), and the design of dynamic auctions all rely on SPE as the equilibrium concept.

Frequently asked questions

Why isn't Nash equilibrium enough for sequential games?

Nash equilibrium only requires that no player wants to deviate, given the others' strategies. In sequential games, players move in order and observe history, so a strategy is a complete plan covering every node — including nodes that may never be reached on the equilibrium path. Nash places no constraint on off-path behaviour, which lets implausible equilibria survive: a player can threaten an action that would hurt themselves to carry out, and that threat is treated as binding even though, faced with the actual choice, they would rather not follow through. Selten's subgame-perfect refinement closes this loophole by requiring rationality at every node, on or off the path.

What is a non-credible threat?

A non-credible threat is an off-equilibrium action that a player promises to take but would not actually choose if the relevant node were reached. The textbook example is an incumbent monopolist warning a potential entrant, "enter and I will start a price war". Fighting is costly to both firms, so if the entrant actually enters, accommodating is the incumbent's best response. The threat to fight is therefore not credible. A Nash equilibrium can support the "no entry" outcome by treating the threat as binding; subgame-perfect equilibrium does not, because it requires the threat to be optimal in the subgame that begins after entry.

How does backward induction find a subgame-perfect equilibrium?

In a finite game of perfect information, backward induction works from the leaves of the game tree to the root. At each terminal decision node, identify the moving player's best action given the payoffs. Replace the node with that action's payoff and fold the tree back one level. Repeat until you reach the root. The resulting strategy profile — the chosen action at every node — is the unique subgame-perfect equilibrium when payoffs are generic. The procedure is constructive, computes in time linear in the size of the tree, and is the algorithm players are implicitly assumed to run when we apply SPE.

What does the centipede game say about backward induction?

The centipede game (Rosenthal 1981) has two players alternately choosing to "take" a small immediate payoff or "pass" the (slightly enlarged) pot to the other player. Backward induction at the last node says the mover should take. Anticipating that, the prior mover takes too. Unwinding, the unique subgame-perfect equilibrium has player 1 taking immediately on move 1, even though continued passing would leave both vastly better off. Experimental subjects routinely pass for several rounds before defecting — McKelvey and Palfrey (1992) is the canonical study. The empirical gap is one of the most cited limits of common-knowledge rationality.

What is the chain-store paradox?

Selten introduced the chain-store paradox in 1978. A monopolist runs N stores in N cities, facing one potential entrant per city in sequence. In each one-shot stage game, accommodating the entrant is the unique SPE — fighting is costly. Backward induction from the last city says the monopolist accommodates there; by induction, accommodates in every city. The paradox is that this prediction strikes most readers as wrong — surely a chain-store should fight early to build a tough reputation. The resolution is that perfect-information backward induction is fragile to small departures from common knowledge, motivating reputation models with incomplete information about the incumbent's type (Kreps-Wilson, Milgrom-Roberts 1982).

How does Stackelberg use backward induction?

In Stackelberg duopoly, a leader firm chooses its quantity q_L first; the follower observes and best-responds with q_F = R(q_L). Solving by backward induction, the leader substitutes the follower's reaction function into its own profit and maximises. The result is q_L higher than the Cournot quantity and q_F lower, giving the leader higher profit than under simultaneous play — the first-mover advantage. The pair (q_L, R(·)) is subgame-perfect because the follower's choice is optimal at every q_L the leader could have picked, not only at the equilibrium one.

How does subgame perfection extend to imperfect information?

Subgame-perfect equilibrium is defined only on proper subgames — subtrees whose root is a singleton information set. In games with information sets that span multiple nodes (so a player does not know exactly where they are), there can be very few proper subgames, and SPE imposes little discipline off-path. Perfect Bayesian equilibrium requires each player at each information set to play a best response given a belief over the nodes in that set, with beliefs consistent with Bayes' rule on the equilibrium path. Sequential equilibrium, due to David Kreps and Robert Wilson (1982), additionally requires that off-path beliefs are limits of beliefs derived from totally mixed perturbations of the equilibrium strategies — a consistency requirement that rules out arbitrary off-path beliefs.