Theory

Context-Free Grammars

The recursion rules that turn flat text into nested trees

A context-free grammar is a set of production rules that rewrite a single nonterminal into strings of symbols, generating nested, recursive structure — the formalism behind every programming-language parser, parsed in O(n³) worst case.

Chomsky typeType-2
Recognizing machinePushdown automaton
General parse timeO(n³)
LL(1) / LR(1) parseO(n)
Ambiguity testUndecidable

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How a context-free grammar generates structure

Start with one symbol and a box of rewrite rules. Each rule says "this nonterminal can be replaced by this sequence of symbols." Keep applying rules until nothing is left but terminals — the literal tokens of your language — and you've derived a string. Run the process in reverse, recovering the rules that built a given string, and you've parsed it. That round trip is the whole job of a context-free grammar.

Formally a CFG is a 4-tuple G = (V, Σ, R, S): a set of nonterminals V (the named structures, like Expr or Stmt), a set of terminals Σ (the actual tokens, disjoint from V), a set of production rules R, and a designated start symbol S ∈ V. A rule has the shape A → α, where A is a single nonterminal and α is any (possibly empty) string of terminals and nonterminals.

The word "context-free" is the load-bearing part. The left side of every rule is exactly one nonterminal with nothing around it. You may always rewrite A as α regardless of what symbols sit to its left or right — its context doesn't matter. Contrast a context-sensitive rule like a A b → a c b, which only fires when A is flanked by a and b. Removing context is what lets the same nonterminal nest inside itself arbitrarily deep, and it's what keeps parsing tractable.

Here is the canonical example — balanced expressions with addition, multiplication, and parentheses:

Expr   → Expr "+" Term  |  Term
Term   → Term "*" Factor |  Factor
Factor → "(" Expr ")"    |  number

The pipe | is shorthand for "or another rule for the same nonterminal." The self-reference (Expr appears inside its own rule) is the recursion that gives you unbounded nesting: 1 + 2 * (3 + 4) derives a tree where Expr contains a Term that contains a Factor that contains another whole Expr.

Derivations, parse trees, and the Chomsky hierarchy

A derivation is a sequence of rewrites from S to a terminal string. A leftmost derivation always rewrites the leftmost nonterminal first; a rightmost one does the opposite. Either way, the rules you applied form a parse tree (concrete syntax tree): the root is S, each internal node is a nonterminal, each rule application is a node with its right-hand side as children, and the leaves read left-to-right spell out the input. The parse tree, not the linear derivation, is what a compiler actually consumes — it's the nested structure the grammar promised.

CFGs sit at level 2 of the Chomsky hierarchy, the 1956 classification of grammars by the restrictions on their rules:

Type 3 — Regular. Rules limited to A → aB or A → a. Recognized by a finite-state machine. No counting, no nesting.
Type 2 — Context-free. Rules A → α. Recognized by a pushdown automaton — a finite machine plus a stack. The stack is what lets it count nesting depth.
Type 1 — Context-sensitive. Rules αAβ → αγβ with γ non-empty. Recognized by a linear-bounded automaton.
Type 0 — Recursively enumerable. Any rule. Recognized by a full Turing machine.

Each level strictly contains the one below it. The jump from regular to context-free is exactly the jump from "no memory of how deep you are" to "a stack that remembers." That single stack is why aⁿbⁿ (the same number of a's then b's) and balanced brackets are context-free but not regular.

When to reach for a CFG

Anything with nesting. Arithmetic with parentheses, nested blocks, JSON, XML, S-expressions — if structures can contain structures of the same kind, you need at least a CFG.
Programming-language syntax. Every language's reference manual contains its grammar in BNF or EBNF. Parser generators (yacc/bison, ANTLR, tree-sitter) take a CFG and emit a parser.
Configuration and data formats. Even "simple" formats like TOML or a query DSL outgrow regexes the moment they allow nesting or quoting.
Natural-language and protocol parsing where a phrase structure recurses.

Don't reach for a CFG when a regular expression suffices — matching a date, an email, a fixed token. Regexes are smaller, faster, and need no stack. And don't expect a CFG to enforce semantics: it can say x = y + 1 is well-formed, but not that y was declared. That's a later, context-sensitive pass.

CFG vs other formal language classes

	Regular (regex)	Context-free	Context-sensitive	Deterministic CF (LR)
Chomsky type	Type 3	Type 2	Type 1	Type 2 (subclass)
Recognizing machine	Finite-state automaton	Pushdown automaton	Linear-bounded automaton	Deterministic PDA
Memory model	Fixed states only	One stack	Tape ≤ input length	One stack, no backtrack
Matches balanced ( )	No	Yes	Yes	Yes
Matches aⁿbⁿcⁿ	No	No	Yes	No
Membership / parse time	O(n)	O(n³) general	PSPACE-complete	O(n)
Closed under intersection	Yes	No	Yes	No
Emptiness decidable	Yes	Yes	No	Yes
Equivalence decidable	Yes	No (undecidable)	No	Yes (Sénizergues)

Two rows surprise people. CFGs are not closed under intersection — intersect aⁿbⁿcᵐ with aᵐbⁿcⁿ and you get aⁿbⁿcⁿ, which is context-sensitive, not context-free. And asking whether two CFGs generate the same language is flatly undecidable, which is why you can't have a compiler that proves your hand-written grammar matches the language spec.

What the numbers actually say

General parsing is O(n³). The CYK and Earley algorithms recognize any CFG in cubic time. For a 10,000-token file that's 10¹² operations — about 17 minutes at 10⁹ ops/sec. That's why no production compiler uses general CFG parsing on whole files.
Engineered grammars parse in O(n). Restrict to LL(1) or LR(1) and parsing is a single linear sweep with a stack. The same 10,000 tokens take ~10⁴ operations — under a microsecond of asymptotic work. Real C/Java/Python parsers run at hundreds of MB/sec.
Earley interpolates. O(n³) for arbitrarily ambiguous grammars, O(n²) for unambiguous ones, O(n) for "LR-like" ones — so you pay only for the ambiguity you actually use.
CYK needs Chomsky Normal Form. Converting a grammar to CNF (every rule A → BC or A → a) can square the number of rules in the worst case, a one-time preprocessing cost paid before the O(n³) parse.

JavaScript: parsing a CFG by recursive descent

The most direct way to implement an LL grammar is one function per nonterminal — the structure of the code mirrors the structure of the rules. Here's a parser for the expression grammar above that returns a parse tree. Note we rewrote left-recursive Expr → Expr "+" Term into a loop, because naive recursive descent loops forever on left recursion (see the bugs section).

// Tokenize: numbers, +, *, ( , )
function tokenize(src) {
  return src.match(/\d+|[+*()]/g) || [];
}

function parse(src) {
  const toks = tokenize(src);
  let i = 0;
  const peek = () => toks[i];
  const eat  = (t) => {
    if (toks[i] !== t) throw new Error(`expected ${t}, got ${toks[i] ?? 'EOF'}`);
    return toks[i++];
  };

  // Expr → Term ("+" Term)*
  function Expr() {
    let node = Term();
    while (peek() === '+') { eat('+'); node = { op: '+', left: node, right: Term() }; }
    return node;
  }
  // Term → Factor ("*" Factor)*
  function Term() {
    let node = Factor();
    while (peek() === '*') { eat('*'); node = { op: '*', left: node, right: Factor() }; }
    return node;
  }
  // Factor → "(" Expr ")" | number
  function Factor() {
    if (peek() === '(') { eat('('); const e = Expr(); eat(')'); return e; }
    const t = peek();
    if (!/^\d+$/.test(t ?? '')) throw new Error(`expected number, got ${t ?? 'EOF'}`);
    eat(t);
    return { num: Number(t) };
  }

  const tree = Expr();
  if (i !== toks.length) throw new Error(`trailing input at ${toks[i]}`);
  return tree;
}

// parse("1 + 2 * (3 + 4)") →
//   { op:'+', left:{num:1}, right:{ op:'*', left:{num:2}, right:{op:'+',...} } }

Because Term binds tighter than Expr in the call hierarchy, multiplication automatically grabs its operands before addition — the grammar's stratification is the precedence. No precedence table needed.

Python: the CYK membership test for any CFG

Recursive descent only handles nicely-behaved grammars. For an arbitrary CFG, the CYK algorithm (Cocke–Younger–Kasami, 1965–67) answers "is this string in the language?" in O(n³·|G|) time using dynamic programming, provided the grammar is in Chomsky Normal Form. It fills a triangular table where cell (i, length) holds every nonterminal that can derive the substring of that length starting at i.

def cyk(grammar, start, word):
    """grammar: dict nonterminal -> list of productions.
       Each production is ('a',) terminal  or  ('B', 'C') two nonterminals (CNF).
       Returns True if `word` (a list of terminals) is in the language."""
    n = len(word)
    if n == 0:
        return any(p == () for p in grammar.get(start, []))

    # table[length][i] = set of nonterminals deriving word[i : i+length]
    table = [[set() for _ in range(n)] for _ in range(n + 1)]

    # Length-1 spans: which nonterminals derive a single terminal?
    for i, sym in enumerate(word):
        for A, prods in grammar.items():
            if (sym,) in prods:
                table[1][i].add(A)

    # Longer spans: split into two parts and look for a rule A -> B C
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left  = table[split][i]
                right = table[length - split][i + split]
                for A, prods in grammar.items():
                    for prod in prods:
                        if len(prod) == 2 and prod[0] in left and prod[1] in right:
                            table[length][i].add(A)

    return start in table[n][0]


# S -> AB | BC ; A -> BA | a ; B -> CC | b ; C -> AB | a  (a textbook CNF grammar)
G = {
    'S': [('A', 'B'), ('B', 'C')],
    'A': [('B', 'A'), ('a',)],
    'B': [('C', 'C'), ('b',)],
    'C': [('A', 'B'), ('a',)],
}
print(cyk(G, 'S', list('baaba')))   # True

The three nested loops over length, i, and split are exactly the O(n³) cost. Each cell is computed once and reused, which is what dynamic programming buys you over the exponential blowup of trying every derivation.

Notations and variants worth knowing

BNF and EBNF. Backus–Naur Form is the standard notation for writing CFG rules, introduced by John Backus for ALGOL 60 in 1959 and refined by Peter Naur. Extended BNF adds * (zero-or-more), + (one-or-more), ? (optional), and grouping, but every EBNF construct desugars back into plain context-free productions — it's sugar, not extra power.

Chomsky Normal Form (CNF). Every rule is A → BC or A → a. Required by CYK. Any CFG can be mechanically converted, removing ε-rules, unit rules, and long right-hand sides.

Greibach Normal Form (GNF). Every rule begins with a terminal: A → aα where α is nonterminals only. Useful in proofs and for top-down parsers because it guarantees progress on every step.

PEGs (Parsing Expression Grammars). Look like CFGs but the | is ordered choice — the first alternative that matches wins, so PEGs are never ambiguous. They power packrat parsers (linear time with memoization) and many modern parser libraries, but they describe a different, incomparable class of languages.

Attribute and probabilistic grammars. Attribute grammars hang computed values on parse-tree nodes for semantic checks; probabilistic CFGs (PCFGs) attach a probability to each rule and underlie classical statistical NLP parsing.

Common bugs and edge cases

Left recursion in recursive descent. A rule like Expr → Expr "+" Term makes a top-down parser call Expr() as its first act forever — a stack overflow. Eliminate it by rewriting to a loop (as above) or to right recursion.
Unintended ambiguity. E → E + E | E * E | id gives 1 + 2 * 3 two trees. Stratify into precedence levels (Expr/Term/Factor) so each operator lives at exactly one tier.
The dangling-else. if a then if b then x else y — does the else bind to the inner or outer if? Classic ambiguity; resolved by a grammar that forbids an unmatched if before an else, or by a "match nearest" parser rule.
Forgetting the ε (empty) production. Optional and list constructs need an explicit empty alternative — e.g. Args → Expr ("," Expr)* | ε — or zero-element cases silently fail to parse.
Assuming a CFG can count three things. aⁿbⁿcⁿ is the textbook proof (via the pumping lemma for context-free languages) that one stack isn't enough. If your "language" needs to balance three quantities, it isn't context-free.
Mistaking syntactic validity for semantic validity. The grammar accepts undeclared + 1 happily; "undeclared is not in scope" is a job for a later pass over the tree, not the CFG.

Frequently asked questions

What makes a grammar context-free?

The left-hand side of every production is a single nonterminal, with no surrounding context. A → α is allowed; xAy → xβy is not. That restriction is exactly what lets a nonterminal be expanded the same way no matter where it appears, which is what makes recursive nesting — and efficient parsing — possible.

Why can't a regular expression match balanced parentheses?

Regular expressions describe regular languages, which a finite-state machine recognizes with a fixed amount of memory. Balanced parentheses require counting arbitrarily deep nesting, which needs unbounded memory — a stack. Context-free grammars get that stack (via a pushdown automaton), so S → ( S ) S | ε matches any depth of nesting that no regex can.

How fast can you parse a context-free grammar?

The general worst case is O(n³) for the CYK and Earley algorithms (Earley drops to O(n²) for unambiguous grammars and O(n) for LR-style ones). Most real languages are engineered into the LL(1) or LR(1) subclasses, which parse deterministically in O(n) — linear in the length of the input.

What is an ambiguous grammar?

A grammar is ambiguous if some string has two or more distinct parse trees. The classic case is E → E + E | E * E | id, where 1 + 2 * 3 can parse as (1+2)*3 or 1+(2*3). You fix it by stratifying the grammar into precedence levels, or by adding disambiguation rules to the parser. Determining whether an arbitrary CFG is ambiguous is undecidable.

What's the difference between BNF and a context-free grammar?

They're the same thing in two notations. Backus–Naur Form (BNF) is just a syntax for writing context-free production rules — Backus introduced it for ALGOL 60 in 1959. Extended BNF (EBNF) adds shorthand for repetition (*), option (?), and grouping, but every EBNF rule desugars back into plain context-free productions.

Are programming languages actually context-free?

Their syntax is context-free; their semantics are not. A CFG describes the shape of valid expressions and statements, but rules like "a variable must be declared before use" or "argument count must match the function signature" are context-sensitive. Real compilers parse with a CFG, then enforce those rules in a separate semantic-analysis pass over the resulting tree.