Combinatorics
Inclusion-Exclusion Principle
Add the parts, subtract the overlaps, add back the triple overlaps — counting unions of sets without double-counting
When you count a union of overlapping sets by adding their individual sizes, the overlaps get counted twice. The inclusion-exclusion principle corrects this by alternately subtracting pairwise overlaps, adding triple overlaps, and so on. The same machinery produces derangement counts, Euler's totient, surjection counts and probability identities.
- Two-set form|A∪B| = |A|+|B|−|A∩B|
- Sign of k-fold intersection(−1)^(k+1)
- Number of terms2ⁿ − 1
- Derangement limitD_n / n! → 1/e
- AliasesSieve, Möbius inversion
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The formula
For finite sets A₁, A₂, ..., A_n, the inclusion-exclusion formula gives the size of the union:
|A₁ ∪ A₂ ∪ ... ∪ A_n| = Σ_{S ⊆ {1..n}, S ≠ ∅} (−1)^(|S|+1) × |∩_{i ∈ S} A_i|
Spelled out, the small cases look like this:
n = 2: |A ∪ B| = |A| + |B| − |A ∩ B|
n = 3: |A ∪ B ∪ C| = |A| + |B| + |C|
− |A∩B| − |A∩C| − |B∩C|
+ |A∩B∩C|
n = 4: |A ∪ B ∪ C ∪ D| = (singletons)
− (six pairwise intersections)
+ (four triple intersections)
− (one quadruple intersection)
The total number of terms is 2ⁿ − 1, one for each non-empty subset of indices. The sign alternates by size: a singleton intersection enters with +, a pairwise with −, a triple with +, and so on.
Why the alternating sum is right
Pick any element x in the union. Count how often x contributes to the right-hand side. Suppose x lies in exactly k of the n sets. Then it shows up in C(k, 1) singletons (added with sign +), C(k, 2) pairs (subtracted), C(k, 3) triples (added), and so on. Its total contribution is
Σ_{j=1..k} (−1)^(j+1) × C(k, j)
= − Σ_{j=1..k} (−1)^j × C(k, j)
= − [(1 − 1)^k − C(k, 0)]
= − [0 − 1]
= 1
Every element is counted exactly once. The binomial theorem applied to (1 − 1)^k = 0 is doing all the work. The proof generalises immediately to weighted sums and probabilities, since the algebraic identity has nothing to do with whether the "size" function is cardinality or measure.
Worked example: the derangement D₅ = 44
A derangement of {1, 2, ..., n} is a permutation that fixes no element — no person ends up holding their own coat. Let's count derangements of 5 items using inclusion-exclusion.
Let A_i = {permutations that fix position i}. We want the number of permutations not in any A_i, which is
D_5 = 5! − |A_1 ∪ A_2 ∪ A_3 ∪ A_4 ∪ A_5|
Each k-fold intersection |A_{i₁} ∩ ... ∩ A_{i_k}| has (5 − k)! elements: the k positions are pinned, and the remaining 5 − k positions can permute freely. There are C(5, k) such intersections. So
|A_1 ∪ ... ∪ A_5| = Σ_{k=1..5} (−1)^(k+1) × C(5, k) × (5−k)!
= +C(5,1)×4! − C(5,2)×3! + C(5,3)×2! − C(5,4)×1! + C(5,5)×0!
= +5×24 − 10×6 + 10×2 − 5×1 + 1×1
= +120 − 60 + 20 − 5 + 1
= 76
D_5 = 120 − 76 = 44
The general formula simplifies to
D_n = n! × Σ_{k=0..n} (−1)^k / k!
= n! × (1 − 1/1! + 1/2! − 1/3! + ... ± 1/n!)
≈ n! / e
For n = 5: D_5 = 120 × (1 − 1 + 1/2 − 1/6 + 1/24 − 1/120) = 120 × 11/30 = 44. The ratio D_n / n! converges to 1/e ≈ 0.3679 very fast — already at n = 5 it equals 44/120 ≈ 0.3667.
Surjections and Stirling numbers
How many onto functions are there from an n-element set to a k-element set? Equivalently, how many ways are there to distribute n distinguishable balls into k distinguishable boxes with no box empty? Inclusion-exclusion handles this immediately.
Let A_i = {functions whose image misses element i}. We want all functions minus those that miss at least one element:
Surj(n, k) = k^n − |A_1 ∪ ... ∪ A_k|
= Σ_{j=0..k} (−1)^j × C(k, j) × (k−j)^n
Pulling out the factorial gives the connection to Stirling numbers of the second kind:
Surj(n, k) = k! × S(n, k)
S(n, k) = (1/k!) × Σ_{j=0..k} (−1)^j × C(k, j) × (k − j)^n
S(n, k) counts partitions of an n-element set into k non-empty blocks. The closed form falls out of inclusion-exclusion in two lines, which is one of the prettiest applications of the principle.
Euler's totient — sieving away multiples
Euler's totient function φ(n) counts integers in [1, n] coprime to n. Use the multiples of distinct prime divisors as the bad sets. If n = p₁^a₁ × p₂^a₂ × ... × p_k^a_k, then
φ(n) = n − Σ n/p_i + Σ n/(p_i p_j) − ... = n × ∏ (1 − 1/p_i)
Worked instance for n = 30 = 2 × 3 × 5:
φ(30) = 30 − (30/2 + 30/3 + 30/5) + (30/6 + 30/10 + 30/15) − 30/30
= 30 − (15 + 10 + 6) + (5 + 3 + 2) − 1
= 30 − 31 + 10 − 1
= 8
Verify: integers coprime to 30 in [1, 30] are 1, 7, 11, 13, 17, 19, 23, 29 — exactly 8. ✓
The product form φ(n) = n × ∏ (1 − 1/p) is just the inclusion-exclusion sum factored back into a product, which is possible because the prime sets are nicely independent.
Inclusion-exclusion vs alternative counting strategies
| Method | Best for | Operations | Failure mode | Examples |
|---|---|---|---|---|
| Bijection | When two structures match exactly | One-to-one mapping | Hard to find the bijection | Catalan structures |
| Direct enumeration | Small cases | List all elements | Exponential blow-up | n ≤ 8 typically |
| Recurrence | Sequences with self-similar structure | O(n) or O(n²) | Need to discover recurrence | Fibonacci, Catalan |
| Inclusion-exclusion | Complement of forbidden conditions | 2ⁿ subsets | Exponential in number of conditions | Derangements, totient |
| Generating function | Closed-form coefficients | Algebraic manipulation | May not have closed form | Partitions, compositions |
| Möbius inversion | Sums over divisor / poset structure | O(d(n)) per inversion | Need lattice structure | Number-theoretic identities |
| Polynomial method | Algebraic identities | Polynomial degree | Limited to specific contexts | Bonferroni truncations |
Inclusion-exclusion is the natural choice when "how many objects avoid all of these bad properties?" is the question. The 2ⁿ subsets are a real cost, but for small n or when the intersections collapse to a simple shared form (as in derangements where every k-fold intersection has the same size), the sum collapses to a tidy single-index formula.
Where inclusion-exclusion shows up
- Database query planning — count distinct. Computing |A ∪ B ∪ C ∪ ...| for sets defined by SQL predicates uses inclusion-exclusion implicitly when the optimiser converts a UNION into separate scans plus correction terms. PostgreSQL's HyperLogLog cardinality estimator effectively applies it across hash buckets.
- Probability — birthday-style problems. The probability that 23 random people share at least one birthday is computed as 1 − P(all distinct). For more nuanced versions ("at least one pair shares AND at least one triple shares") inclusion-exclusion gives the exact answer where Poisson approximations only give bounds.
- Cryptography — RSA key generation. Computing φ(n) for an RSA modulus n = p × q reduces to (p − 1)(q − 1) by inclusion-exclusion. The 2048-bit RSA key generation in OpenSSL uses this directly when computing the private exponent d = e^(−1) mod φ(n).
- Network reliability — connectivity probability. The probability that a stochastic graph stays connected given independent edge failures is computed as a sum over spanning subgraphs with inclusion-exclusion corrections. Network-Reliability software like SHARPE applies it for graphs up to ~30 edges before exponential cost becomes prohibitive.
- Survey sampling — multi-frame estimation. When two overlapping survey panels (say, landline and cell-phone respondents) both contain some respondents, inclusion-exclusion adjusts the population estimate: total = panel A + panel B − overlap. The Pew Research Center publishes guidance on the multi-frame correction in dual-frame phone surveys.
Variants and extensions
- Bonferroni inequalities. Truncating the alternating sum at odd partial sums gives an upper bound, at even partial sums a lower bound. Useful when n is too large for the full sum: |∪A_i| ≤ Σ|A_i|, |∪A_i| ≥ Σ|A_i| − Σ|A_i ∩ A_j|, and so on. These bounds are tight when intersections of higher order are small.
- Möbius inversion on posets. Inclusion-exclusion is the special case of Möbius inversion on the boolean lattice of subsets. Generalising to other posets (divisor lattice, partition lattice) gives identities like Σ_{d|n} μ(d) × f(n/d), where μ is the number-theoretic Möbius function.
- Probabilistic inclusion-exclusion. P(A₁ ∪ ... ∪ A_n) = Σ (−1)^(|S|+1) × P(∩A_S). For events that are pairwise independent but not jointly so, the formula still holds — independence is a property of intersections, not of unions.
- Stirling-number identity. The relation S(n, k) = (1/k!) × Σ (−1)^j × C(k, j) × (k − j)^n is an inclusion-exclusion derivation of Stirling numbers of the second kind from surjection counts, generalising to any "onto" counting problem.
- Permanent expansion (Ryser's formula). The permanent of an n × n matrix can be computed as an inclusion-exclusion sum over 2ⁿ subsets — Ryser's formula is the fastest known general permanent algorithm at O(n × 2ⁿ).
Common pitfalls
- Sign errors. The classic mistake: writing |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B ∩ C|. The pairwise intersections must be subtracted before the triple intersection is added back. Always write all 2ⁿ − 1 terms before simplifying.
- Forgetting the empty set. Inclusion-exclusion produces |∪A_i|, not the size of the universe. To count "elements not in any A_i", subtract from the universe: |U \ ∪A_i| = |U| − |∪A_i|. The derangement derivation uses this subtraction.
- Wrong index range in the formula. The sum runs over non-empty subsets, S ≠ ∅. The empty intersection (over no sets) is the universe itself, not zero, but it doesn't appear in this version of the formula. Some textbooks state a complementary version using 1{x ∈ ∩A_S}, where the empty subset contributes the constant 1.
- Confusing union and intersection cardinalities. The k-fold terms involve intersections, not unions. |A ∩ B ∩ C| ≤ min(|A|, |B|, |C|), not max. Drawing a Venn diagram on the first attempt forces the indices straight.
- Computing all 2ⁿ terms when only a few are non-zero. If many of the |∩A_S| are zero (because some sets are disjoint), the sum collapses to a much smaller number of non-trivial terms. Sieve methods like Mertens' formula in analytic number theory exploit this aggressively.
Frequently asked questions
Why does the alternating sum work?
Consider any element x in the union and ask how often it gets counted. If x lies in exactly k of the sets A₁, ..., A_n, then in the formula it is added in C(k, 1) singletons, subtracted in C(k, 2) pairs, added in C(k, 3) triples, and so on. The total contribution is Σ (−1)^(j+1) C(k, j) for j = 1..k, which simplifies to 1 − (1 − 1)^k = 1 by the binomial theorem. So every element is counted exactly once.
What is the three-set form of the principle?
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|. The pattern continues for n sets: alternating signs, all 2ⁿ − 1 non-empty subsets summed, with the sign of an intersection of k sets being (−1)^(k+1).
How do derangements come out of inclusion-exclusion?
A derangement is a permutation with no fixed points. Let A_i be the set of permutations that fix element i. Permutations with at least one fixed point form |A_1 ∪ ... ∪ A_n|, so the derangement count is n! − |A_1 ∪ ... ∪ A_n|. Each |∩A_i| over a k-subset is (n−k)! because k positions are pinned and the rest can permute freely. Inclusion-exclusion then gives D_n = n! Σ (−1)^k / k! ≈ n!/e.
How is Euler's totient derived from inclusion-exclusion?
φ(n) counts integers in [1, n] coprime to n. If n has distinct prime divisors p₁, ..., p_k, let A_i be the multiples of p_i in [1, n]. We want to remove their union from {1, ..., n}. Inclusion-exclusion gives n − Σ n/p_i + Σ n/(p_i p_j) − ... = n × ∏ (1 − 1/p_i). For n = 12 = 2² × 3, φ(12) = 12 × (1 − 1/2)(1 − 1/3) = 12 × 1/2 × 2/3 = 4 (the integers 1, 5, 7, 11).
Does inclusion-exclusion work for probabilities?
Yes — the same identity holds for probability measures: P(A₁ ∪ ... ∪ A_n) = Σ P(A_i) − Σ P(A_i ∩ A_j) + ... ± P(A_1 ∩ ... ∩ A_n). For independent events, intersection probabilities factor as products, which gives the standard formula P(A_1 ∪ A_2) = P(A_1) + P(A_2) − P(A_1)P(A_2).
What is the Bonferroni inequality?
Truncating the inclusion-exclusion sum at any odd index gives an upper bound on |∪A_i|, and truncating at any even index gives a lower bound. So |∪A_i| ≤ Σ |A_i| (the union bound), |∪A_i| ≥ Σ |A_i| − Σ |A_i ∩ A_j|, and so on. These Bonferroni inequalities are the workhorses of probability when the full inclusion-exclusion sum is computationally infeasible.