Theory
Pushdown Automaton
A finite-state machine that finally learned to count
A pushdown automaton is a finite-state machine augmented with a stack, giving it exactly the power to recognize context-free languages — the strings that pure finite automata can never count or match.
- RecognizesContext-free languages
- Memory modelSingle LIFO stack
- Power vs DFAStrictly greater
- Power vs TuringStrictly less
- AcceptanceFinal state or empty stack
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
A finite automaton plus a stack
Hand a finite-state machine the string ((())) and ask whether the parentheses balance. It cannot answer reliably. A finite automaton has a fixed number of states baked in at design time, so it can remember "I have seen at most k open brackets" for some constant k, but never an unbounded count. Feed it more nested brackets than it has states and it forgets. The finite-state machine is provably blind to anything that requires counting without limit.
The pushdown automaton (PDA) fixes this with the smallest possible upgrade: a single stack. The machine still has a finite set of control states, but on each step it may also push a symbol onto the stack or pop the top symbol off. The stack is unbounded, so the machine can now count arbitrarily high — push a marker for every (, pop one for every ), and accept only if the stack is empty at the end.
Formally a PDA is a seven-tuple (Q, Σ, Γ, δ, q₀, Z₀, F): states Q, input alphabet Σ, stack alphabet Γ, a start state q₀, an initial stack symbol Z₀, accepting states F, and a transition function δ. The transition is where the power lives. A move depends on three things — the current state, the next input symbol (or ε, no input at all), and the symbol currently on top of the stack — and produces a new state together with a string of symbols to replace that stack top. Reading the stack top is the new ingredient; a finite automaton's transition sees only state and input.
That single LIFO discipline is also the PDA's hard ceiling. The machine can only ever look at the top of its stack. It cannot peek underneath, cannot index into the middle, and once it pops a symbol that information is gone. This is exactly why a PDA can match two counts (aⁿbⁿ) but not three (aⁿbⁿcⁿ) — matching the bs drains the stack the as built, leaving nothing to check the cs against.
Worked example: matching aⁿbⁿ
The canonical non-regular language is L = { aⁿbⁿ | n ≥ 1 } — some as followed by exactly as many bs. A two-state PDA recognizes it:
State PUSH (reading a's):
(PUSH, a, Z₀) → (PUSH, A Z₀) first a: push A above the bottom marker
(PUSH, a, A) → (PUSH, A A) each later a: push another A
(PUSH, b, A) → (POP, ε) first b: pop one A, switch to POP
State POP (reading b's):
(POP, b, A) → (POP, ε) each b cancels one A
(POP, ε, Z₀) → (ACCEPT, Z₀) saw the bottom marker — counts matched
Walk through aabb. The two as leave the stack as A A Z₀ (top first). The first b pops an A and the second b pops the other; now only Z₀ remains, so the machine accepts. Try aab and one A is still sitting above Z₀ — reject. Try abb and the second b finds Z₀ on top instead of A, with no legal move — reject. The stack is the count.
Determinism: the surprising part
For finite automata, determinism is free: every NFA has an equivalent DFA (subset construction), so nondeterminism buys no extra languages. For pushdown automata this collapse does not happen, and that is the single most surprising fact in the subject.
A nondeterministic PDA (NPDA) may offer several legal moves for the same configuration and accepts if any sequence of choices reaches acceptance. A deterministic PDA (DPDA) permits at most one move per configuration. The NPDA recognizes all context-free languages. The DPDA recognizes only the deterministic context-free languages (DCFLs) — a strictly smaller class.
The witness is the language of even-length palindromes, L = { w wᴿ | w ∈ {a,b}* }. To check it a PDA pushes the first half, then pops while matching the reversed second half — but it has no way to know where the middle is. An NPDA simply guesses the midpoint on every step and one guess works. A DPDA has no guessing, cannot find the center, and provably fails. Palindromes are context-free but not deterministic context-free.
Where the PDA model earns its keep
- Parsing. Every realistic programming language is context-free, and every parser is a PDA. LR parsers push states; recursive-descent parsers use the program's own call stack as the PDA stack.
- Bracket and tag matching. Balanced parentheses, nested HTML/XML tags, and JSON nesting are textbook context-free problems — and the reason a plain regular expression cannot validate them without a recursion extension.
- Proving a language is too hard for regex. If you can show a language needs unbounded counting, you have shown a finite-state machine (and therefore a classic regex) cannot do it — push it up to a PDA.
- Drawing the boundary the other way. The moment you need to match three independent counts, or compare arbitrary substrings, the PDA fails and you must reach for a Turing machine.
PDA in the Chomsky hierarchy
| Finite automaton | Deterministic PDA | Nondeterministic PDA | Linear-bounded automaton | Turing machine | |
|---|---|---|---|---|---|
| Extra memory | none | one stack (LIFO) | one stack (LIFO) | bounded tape | unbounded tape |
| Language class | Regular | Deterministic CF | Context-free | Context-sensitive | Recursively enumerable |
| Chomsky type | Type 3 | (subset of Type 2) | Type 2 | Type 1 | Type 0 |
| Recognizes aⁿbⁿ | no | yes | yes | yes | yes |
| Recognizes palindromes | no | no | yes | yes | yes |
| Recognizes aⁿbⁿcⁿ | no | no | no | yes | yes |
| Determinism = nondeterminism? | yes | no — strict gap | open | open / yes | |
The PDA occupies exactly one rung of the Chomsky hierarchy: more than a finite automaton, less than a Turing machine. Add a second stack and you leap straight to Turing-complete — two stacks simulate a tape, since you can move the "head" by popping one stack and pushing the other.
What the numbers actually say
- One stack, +1 in the hierarchy. Zero stacks gives regular languages; one stack gives context-free; two stacks gives the full power of a Turing machine. There is no half-step in between.
- NPDA ⊋ DPDA, strictly. The deterministic class is a proper subset. Palindromes (
wwᴿ) sit in the gap — context-free but provably not deterministic context-free. - Parsing cost. A general context-free language can be parsed in O(n³) time (CYK or Earley). But the deterministic subset — exactly what a DPDA recognizes — parses in O(n), which is why real compilers restrict their grammars to LL or LR and run linear-time parsers.
- CFG ↔ PDA conversion. A context-free grammar in Chomsky normal form with g productions converts to an NPDA with a single control state and O(g) transitions — the PDA and the grammar are interconvertible with no blowup in language.
- Membership is decidable, equivalence is not. "Does this PDA accept this string?" is decidable. "Do these two NPDAs accept the same language?" is undecidable. (For DPDAs, equivalence was a famous open problem until Sénizergues proved it decidable in 1997.)
JavaScript implementation
A nondeterministic PDA is most cleanly implemented by exploring all branches with backtracking. Here is an NPDA for balanced parentheses, accepting by empty stack:
// Transition table: state × input × stackTop -> [ [nextState, pushString], ... ]
// pushString replaces the popped top; '' means pop with nothing pushed.
// 'ε' as input means a move that consumes no input symbol.
const Z = '$'; // bottom-of-stack marker
function accepts(transitions, input, start = 'q', startStack = Z) {
const seen = new Set(); // avoid infinite ε-loops
function step(state, pos, stack) {
const key = state + '|' + pos + '|' + stack.join('');
if (seen.has(key)) return false;
seen.add(key);
// Accept by empty stack once all input is consumed.
if (pos === input.length && stack.length === 0) return true;
const top = stack.length ? stack[stack.length - 1] : 'ε';
const sym = pos < input.length ? input[pos] : null;
// Try input-consuming moves, then ε-moves.
for (const [consume, advance] of [[sym, 1], ['ε', 0]]) {
if (consume === null) continue;
const moves = (transitions[state]?.[consume]?.[top]) || [];
for (const [next, push] of moves) {
const ns = stack.slice(0, -1); // pop the top
for (const c of [...push].reverse()) ns.push(c); // push replacement
if (step(next, pos + advance, ns)) return true;
}
}
return false;
}
return step(start, 0, [startStack]);
}
// L = balanced parens over '(' ')'
const paren = {
q: {
'(': { '$': [['q', '($']], '(': [['q', '((']] }, // push ( on top of top
')': { '(': [['q', '']] }, // pop a matching (
'ε': { '$': [['q', '']] }, // empty the bottom marker
},
};
console.log(accepts(paren, '(())')); // true
console.log(accepts(paren, '(()')); // false
console.log(accepts(paren, '())(')); // false
Two details matter. First, pushString replaces the popped top, so pushing '((' grows the stack by one net symbol while pushing '' is a pure pop. Second, the seen set is essential: ε-moves can loop forever, and memoizing the full configuration (state, position, stack) turns the exponential branch search into a terminating one.
Python implementation
The same machine, here recognizing aⁿbⁿ with explicit deterministic transitions — a DPDA, since this language happens to be deterministic context-free:
def dpda_anbn(s: str) -> bool:
"""Deterministic PDA for { a^n b^n | n >= 1 }, accept by final state."""
stack = ['$'] # bottom marker
state = 'PUSH'
for ch in s:
top = stack[-1]
if state == 'PUSH' and ch == 'a':
stack.append('A') # count each a
elif state == 'PUSH' and ch == 'b' and top == 'A':
stack.pop() # first b: start cancelling
state = 'POP'
elif state == 'POP' and ch == 'b' and top == 'A':
stack.pop() # each b cancels one a
else:
return False # no legal move -> reject
# Accept iff we cancelled into the POP state and the stack is back to '$'.
return state == 'POP' and stack == ['$']
for w, want in [('ab', True), ('aabb', True), ('aaabbb', True),
('aab', False), ('abb', False), ('ba', False), ('', False)]:
assert dpda_anbn(w) == want, w
print('all cases pass')
Notice this version never branches — every configuration has at most one move, which is exactly what makes it deterministic and lets it run in a single O(n) left-to-right pass. The nondeterministic JavaScript version above pays for its generality with a backtracking search; the deterministic Python version is the kind of machine a real parser generator emits.
Variants worth knowing
Acceptance by empty stack vs final state. A PDA can accept either by ending in a final state or by emptying its stack. For NPDAs the two modes recognize the identical language class and convert mechanically. For DPDAs they differ — empty-stack acceptance forces the prefix property — so final-state acceptance is the standard DPDA definition.
Two-stack PDA. Add a second stack and the model becomes Turing-complete: the two stacks together simulate a doubly-infinite tape, with the read head sitting between them. This is the cleanest proof that "one more stack" is the entire gap between context-free and computable.
Visibly pushdown automata (VPA). The input alphabet is partitioned so that the symbol itself dictates whether to push, pop, or stay. VPAs are closed under intersection and complement (ordinary PDAs are not) and are the theory behind streaming XML validation.
Counter automata. Restrict the stack alphabet to a single symbol and you get a counter machine — it can only track a number, not a structure. One counter is weaker than a full PDA; two counters, surprisingly, are already Turing-complete (Minsky machines).
Stack automata and pushdown transducers. A stack automaton may read its whole stack non-destructively, climbing the hierarchy further. A pushdown transducer emits output as it runs, which is how syntax-directed translation in a compiler is modeled.
Common bugs and edge cases
- Forgetting the bottom-of-stack marker. Without a sentinel like
Z₀/$, "is the stack empty?" and "did I pop too many?" become indistinguishable, and the machine accepts strings such as()). - Infinite ε-loops. A cycle of ε-moves that grows the stack never terminates. Any NPDA simulator must memoize the full configuration
(state, position, stack)or impose a stack-height bound. - Assuming determinization works. The reflex carried over from finite automata — "just build the deterministic equivalent" — is false for PDAs. Palindromes have no DPDA at all; the conversion simply does not exist.
- Reaching for a regex on nested input. Validating balanced brackets or nested tags with a classic regular expression is mathematically impossible. The task is context-free, so it needs a stack — use a parser, not a pattern.
- Trying to match three counts. One stack handles two correlated counts.
aⁿbⁿcⁿand "equal numbers ofa,b,c" are not context-free; the pumping lemma for CFLs rules them out, and you must move up to a context-sensitive model. - Confusing the two acceptance modes for a DPDA. Code that empties the stack to accept will silently reject any valid input whose accepting prefix is itself accepting. Use final-state acceptance for deterministic machines.
Frequently asked questions
Why can't a finite automaton recognize balanced parentheses?
A finite automaton has only a fixed number of states, so it can count opening brackets only up to that bound. The language of balanced parentheses requires matching an unbounded count of '(' against ')'. The pumping lemma for regular languages proves no finite-state machine can do this; you need unbounded memory, which the pushdown automaton supplies as a stack.
What is the difference between a deterministic and a nondeterministic PDA?
A nondeterministic PDA (NPDA) may have several legal moves on the same state, input symbol, and stack top, and it accepts if any branch leads to acceptance. A deterministic PDA (DPDA) allows at most one move per configuration. Crucially they are not equivalent: NPDAs recognize all context-free languages, but DPDAs recognize only the strictly smaller class of deterministic context-free languages.
Are pushdown automata as powerful as Turing machines?
No. A PDA can only read and write the top of its stack — last in, first out — so it cannot recognize languages that need to match three counts at once, like a^n b^n c^n. A Turing machine has a tape with random access in both directions, which is strictly more powerful. Two stacks, however, are enough to simulate a Turing tape.
Does acceptance by empty stack differ from acceptance by final state?
For nondeterministic PDAs the two acceptance modes recognize exactly the same class of languages — the context-free languages — and you can mechanically convert between them. For deterministic PDAs they differ: empty-stack acceptance forces the prefix property, so final-state DPDAs are the standard and strictly more expressive definition.
How does a PDA relate to a context-free grammar?
They are two faces of the same class. Every context-free grammar can be converted to an equivalent nondeterministic PDA (the standard construction puts the grammar's productions on the stack and simulates a leftmost derivation), and every PDA can be converted back to a grammar. So 'context-free language' can be defined either as 'generated by a CFG' or 'accepted by a PDA'.
Where do pushdown automata show up in real software?
Every parser is a PDA in disguise. LL and LR parsers maintain an explicit stack of grammar symbols or states; recursive-descent parsers use the call stack as the PDA stack. Regular-expression engines stay finite-state because regexes are regular, which is exactly why classic regex cannot match nested brackets without recursion extensions.