Combinatorics
Catalan Numbers
C_n = (2n choose n)/(n+1) — the count that appears in 60+ disguises, from balanced parentheses to triangulated polygons
The Catalan numbers 1, 1, 2, 5, 14, 42, 132, 429 form one of the most ubiquitous sequences in combinatorics. The same formula counts balanced parenthesisations, full binary trees, lattice paths that never cross a diagonal, and triangulations of convex polygons — Richard Stanley's monograph lists 214 of these interpretations.
- FormulaC_n = C(2n,n)/(n+1)
- First terms1, 1, 2, 5, 14, 42
- Asymptotic~ 4ⁿ / (n^(3/2)√π)
- Named afterEugène Catalan, 1838
- Stanley count214 interpretations
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The formula and the first values
The nth Catalan number is
C_n = (1/(n+1)) × C(2n, n) = (2n)! / ((n+1)! × n!)
Plugging in small n produces the sequence that anyone who has worked through Project Euler or LeetCode has seen many times:
n | 0 1 2 3 4 5 6 7 8 9
C_n | 1 1 2 5 14 42 132 429 1430 4862
The sequence grows fast — C₂₀ already exceeds 6.5 billion, and C₃₀ is over 3.8 quadrillion. Asymptotically C_n ~ 4ⁿ / (n^(3/2) √π), so the ratio C_(n+1)/C_n approaches 4 as n grows. That's the same growth rate as the central binomial coefficient C(2n, n), which the Catalan formula divides by (n+1).
An equivalent closed form drops the division entirely:
C_n = C(2n, n) − C(2n, n+1)
This identity falls out of the reflection principle: the bad lattice paths (those that cross a forbidden diagonal) are in bijection with all paths ending at a reflected endpoint, so subtracting them from the total gives the good ones.
The recurrence and the convolution
Catalan numbers satisfy two famous recurrences. The first is the convolution recurrence:
C_(n+1) = Σ_{i=0..n} C_i × C_(n-i)
= C_0·C_n + C_1·C_(n-1) + ... + C_n·C_0
This is what you get when you split a Catalan structure at a canonical position. For balanced parentheses, the first '(' must be matched by some ')' at position 2k+1; the inside is a Catalan structure of size k and the rest is a Catalan structure of size n − k. For binary trees, choose the size of the left subtree. For polygon triangulations, choose which triangle contains a fixed edge. Every Catalan interpretation has its own version of the same split.
The second recurrence is linear:
C_(n+1) = ((4n + 2) / (n + 2)) × C_n
This is the form most efficient for computation — it gives C_n in O(n) arithmetic operations using only one multiplication and one division per step, with the intermediate values staying as exact integers if you compute carefully.
The generating function
The ordinary generating function
c(x) = Σ_{n≥0} C_n × xⁿ = 1 + x + 2x² + 5x³ + 14x⁴ + ...
satisfies a quadratic functional equation that mirrors the convolution recurrence:
c(x) = 1 + x · c(x)²
Solving this quadratic for c(x) and selecting the branch that's analytic at x = 0 gives
c(x) = (1 − √(1 − 4x)) / (2x)
This closed form makes asymptotic analysis trivial: the singularity at x = 1/4 is the source of the 4ⁿ growth, and the square-root branch point gives the n^(−3/2) correction by standard singularity analysis.
Worked example: triangulating a hexagon
How many ways can you cut a convex hexagon into triangles using only non-crossing diagonals? The Catalan-number prediction is C_4 = 14. Let's verify by enumerating, using vertex labels 0, 1, 2, 3, 4, 5.
Pick the edge (0, 5) — every triangulation of a hexagon contains a unique triangle that has this edge. The third vertex of that triangle can be any of {1, 2, 3, 4}. Splitting on this choice partitions the hexagon into a sub-polygon on the left and a sub-polygon on the right, and Catalan's convolution structure becomes visible:
Triangle (0, 5, 1): left = nothing, right = pentagon (1,2,3,4,5)
triangulations = C_0 × C_3 = 1 × 5 = 5
Triangle (0, 5, 2): left = triangle (0,1,2), right = quadrilateral (2,3,4,5)
triangulations = C_1 × C_2 = 1 × 2 = 2
Triangle (0, 5, 3): left = quadrilateral (0,1,2,3), right = triangle (3,4,5)
triangulations = C_2 × C_1 = 2 × 1 = 2
Triangle (0, 5, 4): left = pentagon (0,1,2,3,4), right = nothing
triangulations = C_3 × C_0 = 5 × 1 = 5
─────────────────────────────────────────────────────────
Total: 5 + 2 + 2 + 5 = 14 ✓
This is the convolution recurrence in action: C_4 = C_0·C_3 + C_1·C_2 + C_2·C_1 + C_3·C_0 = 5 + 2 + 2 + 5 = 14. The same number 14 counts the binary trees with 4 internal nodes, the balanced 8-character parenthesisations, and the Dyck paths from (0,0) to (8,0) — all of which are bijective images of the hexagon triangulations.
Bijections — why the same number appears everywhere
Every Catalan-counted family is bijective with every other. The cleanest bridge is between balanced parentheses and Dyck paths: read '(' as a step (1, 1) and ')' as a step (1, −1). A length-2n parenthesisation is balanced iff every prefix has at least as many '(' as ')', which is exactly the condition that the corresponding lattice path stays at or above the x-axis. So balanced strings of length 2n ↔ Dyck paths of length 2n.
From Dyck paths to binary trees: each peak (an up-step immediately followed by a down-step) corresponds to a leaf; collapse peaks to merge their parents into the next-level structure. Recursively this builds a full binary tree with n internal nodes from a Dyck path of length 2n.
From binary trees to polygon triangulations: label the n+2 vertices of the polygon 0, 1, ..., n+1 and pick edge (0, n+1). The unique triangle containing that edge has a third vertex k, and (1, ..., k) form the left sub-polygon while (k, ..., n) form the right — the same recursive structure as a binary tree's left and right children.
From triangulations back to balanced parentheses: each diagonal contributes a matched pair. The depth-first traversal order of the resulting binary tree produces the parenthesisation. The cycle is closed; all the families are the same combinatorial object.
Catalan numbers vs related sequences
| Sequence | Formula | n=4 value | Counts | Growth |
|---|---|---|---|---|
| Factorial n! | n × (n−1) × ... × 1 | 24 | Permutations of n items | Super-exponential |
| Central binomial C(2n,n) | (2n)! / (n!)² | 70 | All lattice paths to (n,n) | ~ 4ⁿ / √(πn) |
| Catalan C_n | C(2n,n) / (n+1) | 14 | Non-crossing structures | ~ 4ⁿ / (n^(3/2)√π) |
| Motzkin M_n | Σ C(n,2k)·C_k | 9 | Non-crossing chord diagrams with isolated points | ~ 3ⁿ / n^(3/2) |
| Schröder S_n | Hypergeometric | 45 | Lattice paths with diagonal steps | ~ (3+2√2)ⁿ / n^(3/2) |
| Bell B_n | Stirling sum | 15 | Partitions of an n-set | Super-exponential |
| Narayana N(n,k) | (1/n) C(n,k) C(n,k−1) | 14 (sum over k) | Refines Catalan by peak count | Sums to C_n |
Catalan numbers sit between the unrestricted central binomials and the more constrained Motzkin numbers. The ratio C_n / C(2n, n) = 1/(n+1) is the fraction of lattice paths that stay non-negative — Bertrand's ballot ratio.
A sampling of Catalan interpretations
Stanley's 214 catalogue is famous, but a baker's-dozen subset gives the flavour. Each of the following is C_n:
- Balanced parenthesisations of n opening and n closing brackets — the canonical example. C_3 = 5: ((())), (()()), (())(), ()(()), ()()().
- Dyck paths from (0, 0) to (2n, 0) using up-steps (1, 1) and down-steps (1, −1) and never going below the x-axis.
- Full binary trees with n internal (non-leaf) nodes — equivalently with n+1 leaves.
- Triangulations of a convex polygon with n+2 vertices using non-crossing diagonals.
- Mountain ranges — sequences of n upstrokes and n downstrokes that stay at or above sea level.
- Ballot sequences in which candidate A is never behind candidate B during a count of n votes each.
- Stack-sortable permutations of {1, ..., n} — those avoiding the pattern 231.
- Non-crossing partitions of {1, ..., n} (when blocks are drawn as chords on a circle).
- Standard Young tableaux of shape (n, n) — fillings of a 2×n grid by 1..2n that increase along rows and columns.
- Plane (ordered) rooted trees with n+1 nodes.
- Ways to multiply n+1 matrices via the convolution recurrence — though only the count of parenthesisations, not the optimal one (which depends on dimensions).
- Non-crossing handshakes at a round table of 2n people, where everyone shakes exactly one hand and no two arms cross.
- Subdiagonal lattice paths from (0, 0) to (n, n) using right and up steps that never go above the diagonal.
Each pair of these has an explicit bijection. Discovering a new bijection between two Catalan families is still a publishable result — the field of "Catalan combinatorics" is alive and growing.
Where Catalan numbers show up in practice
- Compiler design — abstract syntax trees. The number of distinct ways to parse a binary expression with n operators is C_n. Compilers don't enumerate them — they pick one — but the count tells you that the search space for matrix-chain optimisation grows as 4ⁿ before pruning.
- Computational biology — secondary RNA structures. An RNA hairpin is a sequence of bases that pairs onto itself; valid pairings are non-crossing. The number of secondary structures on a length-n RNA without isolated bases is bounded above by C_n, and dynamic-programming algorithms like Nussinov's run in O(n³) by exploiting the convolution structure.
- Stock-market technical analysis — Dyck-path patterns. "Bracket-balanced" walks of returns over n intervals, where the cumulative return never drops below the starting value, are Dyck-path-distributed. Catalan asymptotics give the probability that a random walk of length 2n stays non-negative as ~ 1/√(πn).
- Knot theory — meanders and chord diagrams. The number of non-isotopic n-strand simple meanders is bounded by C_n. Catalan numbers also count non-crossing chord diagrams, which classify cabled knots.
- Tournament scheduling — single-elimination brackets. The number of distinct tournament shapes (full binary trees) with n+1 teams is C_n. ESPN's 64-team NCAA bracket has C_63 ≈ 9.6 × 10^36 distinct seeded structures before tiebreakers and re-seeding rules are applied.
Variants and extensions
- Super-Catalan numbers S(m, n) = (2m)!(2n)! / (m!n!(m+n)!). A two-parameter family that reduces to Catalan when m = n − 1. The fact that S(m, n) is always an integer was conjectured by Catalan and proven by combinatorial bijection in 1992.
- Fuss–Catalan numbers C_n^(k) = (1/(kn+1)) × C(kn+1, n). Generalise to k-ary trees. C_n^(2) is the standard Catalan; C_n^(3) counts ternary trees with n internal nodes.
- q-Catalan numbers. A polynomial generalisation that specialises to C_n at q = 1. They count Catalan structures weighted by a statistic (major index, number of inversions). Carlitz–Riordan q-Catalans are central in symmetric-function theory.
- Narayana numbers N(n, k) = (1/n) × C(n, k) × C(n, k−1). Refine C_n by classifying Dyck paths according to number of peaks. They sum to C_n: Σ_k N(n, k) = C_n.
- Catalan triangle (ballot numbers). A triangular array whose row sums are C_n and whose entries B(n, k) = ((n−k+1)/(n+1)) × C(n+k, k) count lattice paths from (0,0) to (n, k) staying weakly above the diagonal.
Common pitfalls
- Off-by-one indexing. Some sources index from C_1 = 1 rather than C_0 = 1, shifting all the formulas. The OEIS canonical entry is A000108 starting at C_0; double-check before plugging into a recurrence.
- Confusing C_n with C(n, k). The notation collision is brutal — C_n is the Catalan number with one subscript; C(n, k) or C(n choose k) is the binomial. Some authors write Cat(n) to disambiguate.
- Computing C_n via factorials. Direct factorial computation overflows fast — 30! already needs more than 64 bits. Use the linear recurrence
C_(n+1) = ((4n + 2) / (n + 2)) × C_nwith arbitrary-precision integers. - Forgetting the (n+1) divisor. The central binomial C(2n, n) is bigger than C_n by a factor of (n + 1). Confusing the two gives counts that are off by a polynomial in n — the asymptotic constant changes from 1/√(πn) to 1/(n^(3/2)√π).
- Assuming all "non-crossing" structures are Catalan. Non-crossing partitions and non-crossing handshakes are Catalan, but non-crossing tree-like structures with extra constraints (e.g. leaf-labelled phylogenetic trees) often satisfy different recurrences. Always check the splitting structure, not just the geometry.
Frequently asked questions
Why do so many different objects count to the same Catalan number?
Each interpretation has a bijection — a one-to-one correspondence — with one of the others. A balanced parenthesisation of length 2n maps to a Dyck path by reading '(' as up-step and ')' as down-step; a Dyck path maps to a triangulated polygon by treating each up–down pair as an ear cut; the polygon triangulations map to full binary trees by labelling internal triangles. Once you have one bijection, all the structures become the same object dressed differently.
What is the closed-form formula for C_n?
C_n = (1/(n+1)) × C(2n, n) = (2n)! / ((n+1)! × n!). Equivalently C_n = C(2n, n) − C(2n, n+1) — the difference of two adjacent central binomial coefficients. Both forms produce 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862 for n = 0..9.
How does the recursion C_n = Σ C_i × C_(n-1-i) work?
Pick any of the C_n objects and split it at a canonical position — the matching closer of the first opener for parentheses, the root of a binary tree, the diagonal containing vertex 0 for triangulations. The left and right halves are independent smaller Catalan structures of total size n − 1. Summing over all positions of the split gives the convolution. The recurrence C_(n+1) = Σ_{i=0..n} C_i × C_(n-i) starts from C_0 = 1 and reproduces the entire sequence.
Where does the (n+1) divisor come from?
The cycle lemma: in any sequence of n+1 ups and n downs, exactly one cyclic rotation is a Dyck path. So good sequences are 1/(2n+1) of all C(2n+1, n) rotations of length 2n+1, which simplifies to C_n = (1/(n+1)) × C(2n, n). Equivalently, the reflection principle shows that the bad lattice paths are in bijection with all paths to a reflected endpoint, so good paths = C(2n, n) − C(2n, n+1).
How fast do Catalan numbers grow?
Asymptotically C_n ~ 4^n / (n^(3/2) × √π) — exponential growth with base 4 and a polynomial damping factor of n^(3/2). C_30 already exceeds 3.8 × 10^15, and C_100 is roughly 8.96 × 10^57. The exponential factor makes naive enumeration of all binary trees on 30 internal nodes already infeasible.
Who is the sequence named after?
Belgian mathematician Eugène Catalan, who studied them around 1838 in connection with parenthesisations of products. Mongolian mathematician Mingantu had derived equivalent recurrences in the 1730s, and Euler counted polygon triangulations in 1751, but the modern name and the unified treatment trace to Catalan's work. Richard Stanley's 2015 monograph 'Catalan Numbers' lists 214 distinct combinatorial interpretations.