Type Systems

Algebraic Data Types

Build types from "and" and "or" — then watch the illegal states disappear

Algebraic data types build new types by combining "and" (products: structs and tuples) and "or" (sums: tagged unions), where the total number of values multiplies for products and adds for sums — letting you make illegal states unrepresentable.

  • Two combinatorsproduct (×) and sum (+)
  • Product cardinality|A| × |B|
  • Sum cardinality|A| + |B|
  • Match safetyexhaustiveness at compile time
  • First appearedHope / ML, late 1970s

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How algebraic data types work

Almost every type you build is one of two shapes glued together. The first shape is "this and that": a point has an x and a y, a user has a name and an age. The second shape is "this or that": a payment is cash or card or credit, a network request is loading or failed or done. Algebraic data types are just a precise name for those two combinators and the rule that you can nest them arbitrarily.

The "and" combinator is the product type — a tuple, struct, or record. The "or" combinator is the sum type — a tagged union, also called a discriminated union, a variant, or (in Rust) an enum. That's the whole vocabulary. A product holds one value from each of its fields simultaneously; a sum holds exactly one value from exactly one of its variants, plus a tag saying which.

The word "algebraic" is not metaphor. Count the number of distinct values each type can hold — its cardinality — and the combinators turn into arithmetic. If Bool has 2 values and a die face has 6, then the product (Bool, Die) has exactly 2 × 6 = 12 values, and the sum Bool | Die has 2 + 6 = 8. Products multiply, sums add. Everything else — making invalid states unrepresentable, exhaustive pattern matching, generic deriving — falls out of that one observation.

The algebra: counting inhabitants

Write |T| for the number of values of type T. The correspondence to ordinary algebra is exact:

|(A, B)|      = |A| × |B|      product   — "A and B"
|A | B|       = |A| + |B|      sum       — "A or B"
|Unit| = |()| = 1              the number 1 (one value, written () )
|Void|        = 0              the number 0 (no values, uninhabited)
|A → B|       = |B| ^ |A|      functions  — exponentiation

These satisfy the same laws as numbers. (A, Unit) is isomorphic to A because |A| × 1 = |A| — pairing a value with the single unit value adds no information. (A, Void) is uninhabited because |A| × 0 = 0 — you can never produce the second component, so you can never build the pair. The distributive law holds too: A × (B + C) is isomorphic to (A × B) + (A × C), which is exactly why a record-with-a-tagged-field can always be refactored into a tagged-union-of-records.

A familiar one: Option<A> (Rust) / Maybe a (Haskell) is None | Some(A), with cardinality 1 + |A|. That single extra inhabitant is the principled replacement for null — the "billion-dollar mistake" Tony Hoare named in 2009. The difference is that Option is a distinct type, so the compiler forces you to unwrap it before use, while null silently inhabits every reference type.

When to reach for an ADT

  • Mutually exclusive states. Anything that is "exactly one of N things" — a parser token, a UI loading state, a protocol message, a JSON value, an expression node in a compiler.
  • Modeling a domain precisely. When you can enumerate the legal shapes of your data, a sum type gives you exactly those shapes and nothing else.
  • Replacing flag soup. Three booleans that are only ever valid in three of their eight combinations should be one three-variant sum.
  • Recursive structures. Trees, linked lists, ASTs, nested config — a sum type with a self-referential variant is the canonical encoding.

When not to: if your variants share most of their fields and differ only in a behavior, open polymorphism (interfaces, traits, the visitor pattern) is more extensible — adding a new case to a sum forces you to edit every match site, whereas adding a class to an interface hierarchy doesn't. This is the classic expression problem trade-off: ADTs make adding operations cheap and adding variants expensive; OO subclassing makes the reverse cheap. Pick the axis you expect to grow.

ADTs vs other ways to model variation

Sum type (ADT)Inheritance / subclassEnum + structUntagged union (C)Map / dictionary
Illegal states representable?No — only declared shapesYes (base fields can be misused)Yes (tag can mismatch payload)Yes (no tag at all)Yes (any key/value)
Exhaustiveness checked?Yes, at compile timeNo (default branch silently swallows)NoNoNo
Add a new variantEdit every match (compiler lists them)Cheap — just subclassEdit every switch by handEdit every switch by handFree, unchecked
Add a new operationOne new functionEdit every subclassOne new switchOne new switchOne new function
Memory layoutTag + largest variant (packed)vtable pointer + fieldsTag + structLargest member, no tagHeap nodes + hashing
Runtime tag costRead 1 discriminant byte/wordVirtual dispatch (indirect call)Read tag fieldNone (and unsafe)Hash + compare
Best forClosed, known set of casesOpen, extensible set of casesC-style code by conventionBit-level interop, performanceDynamic, open-ended keys

The headline difference is the first row. An untagged C union stores the bytes but not which interpretation is live, so reading the wrong member is undefined behavior. A "tag plus union" struct fixes the missing tag but lets you set the tag to CIRCLE while the payload holds a rectangle. A true sum type bundles the tag and payload so tightly that the mismatch is unconstructible — the type system is the only one keeping the books.

What the discipline actually buys you

  • Eight states collapse to three. The textbook async state — { isLoading: bool, error: Error?, data: T? } — has 2 × 2 × 2 = 8 combinations. Only 3 are meaningful (loading; failed; succeeded). The other 5 (e.g. "loading and already has data and an error") are bugs waiting to happen. A sum type with 3 variants has exactly 3 inhabited shapes, deleting 5 classes of bug by construction.
  • Null-pointer exceptions go to zero in the type-checked path. Languages that replaced nullable references with Option/Maybe (Rust, Haskell, OCaml, F#, Swift's optionals) cannot throw a null dereference unless you explicitly opt into an unsafe unwrap. The cardinality went from "|A| + 1 hidden in every reference" to "+ 1 visible only where you wrote Option."
  • Adding a variant is a guided refactor, not a hunt. Add a fourth payment method to a sum and the exhaustiveness checker prints every match that no longer covers all cases — typically a handful of compile errors instead of a runtime crash discovered in production weeks later.
  • The tag is nearly free. A Rust enum is laid out as one discriminant plus the largest variant's storage, and niche optimization can even fold the tag into unused bit patterns: Option<&T> is the same 8 bytes as &T because the null pointer encodes None — zero space overhead for the safety.

JavaScript / TypeScript implementation

JavaScript has no native sum types, but TypeScript's discriminated unions get you exhaustiveness checking through a shared literal tag field. The product type is just an object.

// PRODUCT TYPE — "and": a point has an x AND a y
type Point = { x: number; y: number };   // |Point| = |number| × |number|

// SUM TYPE — "or": a shape is a circle OR a rectangle OR a triangle
type Shape =
  | { tag: "circle";    radius: number }
  | { tag: "rect";      w: number; h: number }
  | { tag: "triangle";  base: number; height: number };

// Pattern match by switching on the discriminant.
function area(s: Shape): number {
  switch (s.tag) {
    case "circle":   return Math.PI * s.radius ** 2;
    case "rect":     return s.w * s.h;
    case "triangle": return 0.5 * s.base * s.height;
    default:
      // Exhaustiveness: if a 4th variant is added and not handled,
      // `s` is NOT `never` here and this line fails to compile.
      const _exhaustive: never = s;
      return _exhaustive;
  }
}

// MAKING ILLEGAL STATES UNREPRESENTABLE
// Bad: 8 combinations, 5 of them nonsense.
type BadState = { loading: boolean; error?: string; data?: User };
// Good: exactly 3 inhabited shapes.
type RequestState =
  | { tag: "loading" }
  | { tag: "failure"; error: string }
  | { tag: "success"; data: User };

The const _exhaustive: never = s trick is the idiom for exhaustiveness in TypeScript: inside an exhaustive switch the compiler narrows s down to never, so the assignment type-checks. Add a variant and forget a case, and s is no longer never, turning the omission into a compile error.

Haskell, Rust, and Python

In languages with first-class ADTs the declaration is one line and the exhaustiveness check is built in. Here is the recursive list — the classic example where the cardinality equation becomes recursive — in three idioms.

-- HASKELL: sums use |, products use juxtaposition.
data Shape = Circle Double | Rect Double Double | Triangle Double Double

area :: Shape -> Double
area (Circle r)     = pi * r * r
area (Rect w h)     = w * h
area (Triangle b h) = 0.5 * b * h     -- omit a case → -Wincomplete-patterns warns

-- RECURSIVE ADT: |List a| = 1 + |a| × |List a|
data List a = Nil | Cons a (List a)
// RUST: `enum` is a sum type; struct variants carry products.
enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
    Triangle { base: f64, height: f64 },
}

fn area(s: &Shape) -> f64 {
    match s {                                  // match must be exhaustive — enforced
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { w, h } => w * h,
        Shape::Triangle { base, height } => 0.5 * base * height,
    }
}
# PYTHON 3.10+: emulate with a tagged class hierarchy + structural match.
from dataclasses import dataclass

@dataclass
class Circle:   radius: float
@dataclass
class Rect:     w: float; h: float
@dataclass
class Triangle: base: float; height: float

Shape = Circle | Rect | Triangle          # typing.Union via the | operator

def area(s: Shape) -> float:
    match s:
        case Circle(r):          return 3.14159265 * r * r
        case Rect(w, h):         return w * h
        case Triangle(b, h):     return 0.5 * b * h
    # A type checker like mypy/pyright reports the match as non-exhaustive
    # if a 4th class is added to the union and not handled here.

Note the same shape in all three: one type, several variants, and a match that the tooling can verify covers them all. Python's check is the weakest — it relies on an external static checker (mypy or pyright) rather than the runtime — but the modeling intent is identical.

Variants and refinements worth knowing

Generalized ADTs (GADTs). Ordinary ADTs give every constructor the same result type. A GADT lets each constructor refine the type parameter — e.g. an expression type where IntLit produces Expr Int and BoolLit produces Expr Bool — so a well-typed interpreter is impossible to write incorrectly. Available in Haskell and OCaml; Rust has no true GADTs, though type-state patterns built on traits and generics cover some of the same ground.

Polymorphic / open variants. OCaml's polymorphic variants and Rust's trait objects relax the "closed set" assumption, trading exhaustiveness for the ability to extend the set of cases later without editing existing code — one answer to the expression problem.

Newtypes and the unit type. A single-constructor, single-field sum is a newtype — same runtime representation as the wrapped value, but a distinct type, so you can't pass a UserId where an OrderId is expected. The empty sum, with zero constructors, is the uninhabited type (Rust's !, Haskell's Void) — useful as the error type of an operation that can never fail.

Records vs positional products. A product can label its fields (a record, { x, y }) or leave them positional (a tuple, (x, y)). Same cardinality, different ergonomics — records resist the "swapped two arguments of the same type" bug.

Smart constructors. ADTs make declared shapes legal, but can't enforce invariants like "this string is a valid email." The fix is to hide the constructor and expose a function that returns Option<Email>, so the only way to obtain an Email is through validation — pushing illegal states out at the boundary.

Common bugs and edge cases

  • Using a default/wildcard branch that hides new variants. Writing default: return 0 silences the exhaustiveness checker — the very feature you wanted. Match each case explicitly so adding a variant breaks the build.
  • Modeling a sum as a product of optionals. The flag-soup anti-pattern: { isLoading, error?, data? }. If the fields are mutually exclusive, it's a sum, not a product — convert it.
  • Forgetting the tag in hand-rolled unions. A bare C union with no discriminant is the source of countless type-confusion CVEs; reading the wrong member is undefined behavior.
  • Confusing cardinality 0 with cardinality 1. Void (0 values) is not Unit (1 value, written ()). A function returning Void can never return at all; one returning Unit returns the single trivial value.
  • Deeply nested recursive ADTs and stack depth. A naïve recursive traversal of a million-node tree-shaped ADT can blow the call stack; use an explicit work-list or a tail-recursive fold.
  • Over-using ADTs where the set is genuinely open. If new variants arrive constantly (a plugin system, a growing set of UI widgets), the "edit every match" cost dominates; an interface/trait is the better fit.

Frequently asked questions

Why are they called "algebraic" data types?

Because the number of values a type can hold follows the algebra of multiplication and addition. A product type (a struct of an A and a B) has |A| × |B| values; a sum type (an A or a B) has |A| + |B| values. Composing types is literally doing arithmetic on their cardinalities.

What's the difference between a sum type and a tagged union?

They are the same thing under different names. "Sum type" is the type-theory term; "tagged union" or "discriminated union" is the implementation term — a union plus a tag that records which variant is currently inhabited. TypeScript calls them discriminated unions, Rust calls them enums, Haskell and ML just call them data types.

How do ADTs make illegal states unrepresentable?

Instead of a struct with a boolean isLoading, a nullable error, and a nullable data field — which has 2 × 2 × 2 = 8 combinations, most of them nonsensical — you model the state as a sum type Loading | Failure(error) | Success(data) with exactly 3 inhabited shapes. The impossible combinations simply cannot be constructed, so you never have to check for them at runtime.

What is exhaustiveness checking?

When you pattern-match on a sum type, a compiler that knows all the variants can prove your match covers every case. If you add a fourth variant later and forget to handle it, the match fails to compile. Haskell, Rust, OCaml, F#, Swift, and TypeScript (with strict settings) all do this — it turns a future runtime bug into a compile error.

Do algebraic data types exist in languages without them, like Java or Python?

You can emulate them. Products are just classes or tuples. Sums are encoded with sealed class hierarchies (Java 17+ sealed interfaces, Kotlin sealed classes, Python's typing.Union plus match statements, Scala case classes). The emulation lacks the conciseness of a one-line declaration, and exhaustiveness checking is weaker, but the modeling discipline still pays off.

What is a recursive algebraic data type?

A type whose definition refers to itself, like List a = Nil | Cons a (List a) or Tree a = Leaf | Node (Tree a) a (Tree a). The cardinality equation becomes recursive — solving |List a| = 1 + |a| × |List a| gives a geometric series — which is why ADTs are the natural way to express trees, JSON, abstract syntax trees, and other nested structures.