Combinatorial Algorithms

Dancing Links

Knuth's pointer trick — remove and restore in O(1)

Knuth's dancing links is a doubly linked list trick that removes and restores nodes in O(1) by preserving each node's neighbor pointers. The engine behind Algorithm X and the fastest Sudoku solvers.

  • Remove a nodeO(1) (2 pointer writes)
  • Restore a nodeO(1) (reverse the 2 writes)
  • Cover a columnO(k) where k = rows × cells
  • Space per 1-cell~48 bytes (6 pointers)
  • Best forSparse exact-cover matrices
  • OriginKnuth 2000, "Dancing Links"

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How dancing links works

The setup: a sparse 0/1 matrix. Each 1 becomes a node with four pointers — left, right, up, down — linking it to neighboring 1s in the same row and column. Each row is a circular doubly linked list; each column is a circular doubly linked list. Plus a special header node per column tracking how many 1s remain.

The core operation is remove:

remove(node):
    node.left.right ← node.right
    node.right.left ← node.left
    // node.left and node.right are NOT changed

Two pointer writes. The neighbors now point past the node; from their perspective it's gone. But the node itself still points to its old neighbors. Crucially, the invariants node.left.right == node would be false after removal — except we explicitly didn't touch node.left or node.right, so we can detect and reverse the change:

restore(node):
    node.left.right ← node
    node.right.left ← node

Two more pointer writes, in reverse. Because node.left and node.right still hold their pre-removal values, restoration brings everything back to exactly the prior state. Total cost: 4 pointer writes for a remove + restore round trip, regardless of list length.

Covering a column

In exact-cover problems, you don't just remove single nodes — you remove an entire column (the constraint it represents is now satisfied) and all rows that intersect it (those rows can't be picked again because they would re-satisfy a satisfied constraint).

cover(col):
    // Detach the column header from the header row
    col.right.left ← col.left
    col.left.right ← col.right

    // For each row that had a 1 in this column...
    for row in column col, top to bottom (using down pointers):
        // ...remove every other 1 in that row from its column
        for n in row, left to right (using right pointers, excluding the row's column entry):
            n.up.down ← n.down
            n.down.up ← n.up
            n.column.size -= 1

Uncovering reverses everything in opposite order:

uncover(col):
    for row in column col, BOTTOM TO TOP (using up pointers):
        for n in row, RIGHT TO LEFT (using left pointers):
            n.column.size += 1
            n.up.down ← n
            n.down.up ← n
    col.right.left ← col
    col.left.right ← col

The reversed iteration order matters: it makes restoration symmetric to removal, so every pointer ends up exactly where it started.

Trace: cover and restore a tiny matrix

         C1  C2  C3  C4  C5
   R1:    1   0   1   0   0
   R2:    0   1   0   1   0
   R3:    1   0   0   0   1
   R4:    0   1   1   0   1

Choose R1. Cover columns C1 and C3.
  Covering C1:
    - Remove header C1 from header row.
    - Visit rows in C1 top-to-bottom: R1, R3.
    - Remove R1's other 1 (in C3) from C3's column → C3.size = 1
    - Remove R3's other 1 (in C5) from C5's column → C5.size = 1
  Covering C3:
    - Remove header C3 from header row.
    - Visit rows in C3 top-to-bottom: R4 (R1 already gone from C3 above).
    - Remove R4's other 1 (in C2) from C2's column → C2.size = 1
    - Remove R4's other 1 (in C5) from C5's column → C5.size = 0

  Active rows: just R2.  Active columns: C2, C4, C5 (but C5 empty → dead end).

Backtrack — uncover C3 then C1, in reverse order:
  Uncovering C3:
    - Visit R4 bottom-to-top (only R4 to visit).
    - Restore R4's 1 in C5 → C5.size = 1
    - Restore R4's 1 in C2 → C2.size = 2
    - Restore header C3.
  Uncovering C1:
    - Visit rows in C1 bottom-to-top: R3, R1.
    - Restore R3's 1 in C5 → C5.size = 2
    - Restore R1's 1 in C3 → C3.size = 2
    - Restore header C1.

State after uncover is byte-identical to state before cover.

Six pointer writes per remove + restore round trip, multiplied by the number of nodes touched. The wall-clock cost is dominated by memory access, not by structural updates — DLX gets near-bandwidth-bound speed on cache-resident matrices.

When to use dancing links

  • Exact cover problems with sparse incidence matrices. Sudoku, polyomino/pentomino tiling, n-queens (alternate formulation), Soma cube assembly, exact set cover, exact graph coloring with a small palette.
  • Constraint satisfaction with reversible state. Any backtracking search where you make a tentative commitment, recurse, and might need to undo. DLX makes the undo free.
  • Large search trees with high pruning rates. If you'd otherwise be copying the constraint state at each recursion level, DLX cuts the per-recursion cost by orders of magnitude.

If your matrix is dense (most cells are 1), bit vectors with popcount are simpler and faster. DLX shines when the average row has 4–10 ones in a column space of hundreds — Sudoku's exact-cover matrix has 729 candidate rows × 324 constraint columns with only 4 ones per row, a density of ~1.2%.

Dancing links vs alternatives

Dancing LinksBit-vector coverRecursive copyVersioned arraysTrailingRe-execute
Remove costO(1)O(n/64)O(n)O(1)O(1)
Restore costO(1)Restore from copyRestore from copySnapshot O(1), restore O(log n)O(k) per restoreO(n) replay
Allocations0 during search0O(depth) copiesO(depth) snapshots1 trail per change0
Cache localityBad (pointer-chasing)ExcellentOKBadDecentGood
Best forSparse exact coverDense cover, ≤64 elementsTiny searchBranchy backtrackConstraint solversTime travel debug
Realistic Sudoku time~5 ms~15 ms (dense pack)~500 ms~50 ms~30 msN/A

What "constant time" means in nanoseconds

A single cover or uncover operation touches roughly k nodes where k is the total cells in rows that overlap the chosen column. For Sudoku's matrix, that's ~30 nodes per cover. At ~50 ns per pointer dereference + write (uncached) or ~5 ns (cached), the structural cost per recursion level is microseconds:

ProblemMatrix sizeSearch nodesDLX timeNaïve backtrack
4×4 Sudoku64 × 64~50~50 µs~10 ms
9×9 Sudoku729 × 324~5,000~5 ms~500 ms
AI Escargot (hardest 9×9)729 × 324~200,000~30 ms~6 s
Pentomino 6×10 tiling~2,000 × 72~9 M~3 s~30 min
16×16 Sudoku4,096 × 1,024~100,000~100 ms~30 s

The 100–1000× speedup over naïve backtracking comes from never re-allocating constraint state at each recursion level. DLX makes a recursive call essentially free in terms of memory bandwidth.

JavaScript implementation

// DLX node — six pointers plus payload.
class DLXNode {
  constructor() {
    this.left = this.right = this.up = this.down = this;
    this.column = null;
    this.rowId = -1;
  }
}

class Column extends DLXNode {
  constructor(name) {
    super();
    this.name = name;
    this.size = 0;
    this.column = this;
  }
}

class DLX {
  constructor(columnNames) {
    this.root = new Column('root');
    this.columns = [];
    for (const name of columnNames) {
      const col = new Column(name);
      // Insert col into the header row, just left of root
      col.left = this.root.left;
      col.right = this.root;
      this.root.left.right = col;
      this.root.left = col;
      this.columns.push(col);
    }
  }

  addRow(rowId, columnIndices) {
    let first = null;
    for (const ci of columnIndices) {
      const col = this.columns[ci];
      const node = new DLXNode();
      node.column = col;
      node.rowId = rowId;
      // Insert at bottom of column
      node.up = col.up;
      node.down = col;
      col.up.down = node;
      col.up = node;
      col.size++;
      // Link into row
      if (first === null) { first = node; }
      else {
        node.left = first.left;
        node.right = first;
        first.left.right = node;
        first.left = node;
      }
    }
  }

  cover(col) {
    col.right.left = col.left;
    col.left.right = col.right;
    for (let row = col.down; row !== col; row = row.down) {
      for (let n = row.right; n !== row; n = n.right) {
        n.up.down = n.down;
        n.down.up = n.up;
        n.column.size--;
      }
    }
  }

  uncover(col) {
    for (let row = col.up; row !== col; row = row.up) {
      for (let n = row.left; n !== row; n = n.left) {
        n.column.size++;
        n.up.down = n;
        n.down.up = n;
      }
    }
    col.right.left = col;
    col.left.right = col;
  }

  search(solution = [], onSolution) {
    if (this.root.right === this.root) {
      onSolution(solution.slice());
      return;
    }
    // Choose column with smallest size (S heuristic)
    let col = this.root.right;
    for (let c = col.right; c !== this.root; c = c.right) {
      if (c.size < col.size) col = c;
    }
    this.cover(col);
    for (let row = col.down; row !== col; row = row.down) {
      solution.push(row.rowId);
      for (let n = row.right; n !== row; n = n.right) this.cover(n.column);
      this.search(solution, onSolution);
      for (let n = row.left; n !== row; n = n.left) this.uncover(n.column);
      solution.pop();
    }
    this.uncover(col);
  }
}

The 80 lines above are a complete DLX solver. Knuth's 2000 paper "Dancing Links" presents the same algorithm in similar size. Production solvers add MRV column ordering, watched-literal propagation, and parallelism, but the core structure is exactly this.

Famous problem: Sudoku as exact cover

A 9×9 Sudoku puzzle is an exact-cover instance with 4 constraint families:

  • Cell constraints — each of the 81 cells must contain one digit. (81 columns)
  • Row constraints — each row must contain each digit 1–9. (9 × 9 = 81 columns)
  • Column constraints — each column must contain each digit 1–9. (9 × 9 = 81 columns)
  • Box constraints — each 3×3 box must contain each digit 1–9. (9 × 9 = 81 columns)

Total: 324 constraint columns. Each candidate (row, column, digit) is a candidate row in the matrix; there are 9 × 9 × 9 = 729 candidate rows. Each candidate satisfies exactly 4 constraints (its cell, its row-digit, its col-digit, its box-digit), so each row has 4 ones — sparsity 1.2%.

function buildSudokuDLX(puzzle) {
  // puzzle: 9×9 grid, 0 for empty
  const colNames = [];
  for (let i = 0; i < 81; i++) colNames.push(`cell-${i}`);
  for (let r = 0; r < 9; r++) for (let d = 1; d <= 9; d++) colNames.push(`row${r}-d${d}`);
  for (let c = 0; c < 9; c++) for (let d = 1; d <= 9; d++) colNames.push(`col${c}-d${d}`);
  for (let b = 0; b < 9; b++) for (let d = 1; d <= 9; d++) colNames.push(`box${b}-d${d}`);

  const dlx = new DLX(colNames);
  let rowId = 0;
  for (let r = 0; r < 9; r++) {
    for (let c = 0; c < 9; c++) {
      const given = puzzle[r][c];
      for (let d = 1; d <= 9; d++) {
        if (given !== 0 && given !== d) continue;
        const b = Math.floor(r / 3) * 3 + Math.floor(c / 3);
        dlx.addRow(rowId++, [
          r * 9 + c,                    // cell
          81 + r * 9 + (d - 1),         // row-digit
          162 + c * 9 + (d - 1),        // col-digit
          243 + b * 9 + (d - 1),        // box-digit
        ]);
      }
    }
  }
  return dlx;
}

function solveSudoku(puzzle) {
  const dlx = buildSudokuDLX(puzzle);
  const grid = puzzle.map(row => [...row]);
  dlx.search([], solution => { /* decode solution rowIds → grid */ });
  return grid;
}

On the legendary "AI Escargot" puzzle — designed by Arto Inkala in 2006 as the world's hardest 9×9 Sudoku — DLX finds the unique solution in about 30 ms on a modern laptop. A purely recursive backtracker without dancing links takes 5–10 seconds on the same input.

Python implementation

# Knuth uses arrays of pointers, not objects, for cache locality and speed.
# The simpler object form mirrors the math directly.

class DLXNode:
    __slots__ = ('left', 'right', 'up', 'down', 'column', 'row_id', 'name', 'size')
    def __init__(self, name=None):
        self.left = self.right = self.up = self.down = self
        self.column = self
        self.row_id = -1
        self.name = name
        self.size = 0

class DLX:
    def __init__(self, columns):
        self.root = DLXNode('root')
        self.cols = []
        for name in columns:
            col = DLXNode(name)
            col.left = self.root.left
            col.right = self.root
            self.root.left.right = col
            self.root.left = col
            self.cols.append(col)

    def add_row(self, row_id, col_indices):
        first = None
        for ci in col_indices:
            col = self.cols[ci]
            node = DLXNode()
            node.column = col
            node.row_id = row_id
            node.up = col.up; node.down = col
            col.up.down = node; col.up = node
            col.size += 1
            if first is None:
                first = node
            else:
                node.left = first.left; node.right = first
                first.left.right = node; first.left = node

    @staticmethod
    def cover(col):
        col.right.left = col.left
        col.left.right = col.right
        row = col.down
        while row is not col:
            n = row.right
            while n is not row:
                n.up.down = n.down
                n.down.up = n.up
                n.column.size -= 1
                n = n.right
            row = row.down

    @staticmethod
    def uncover(col):
        row = col.up
        while row is not col:
            n = row.left
            while n is not row:
                n.column.size += 1
                n.up.down = n
                n.down.up = n
                n = n.left
            row = row.up
        col.right.left = col
        col.left.right = col

    def search(self, solution, on_solution):
        if self.root.right is self.root:
            on_solution(list(solution))
            return
        # Pick smallest-size column
        col = self.root.right
        c = col.right
        while c is not self.root:
            if c.size < col.size: col = c
            c = c.right
        self.cover(col)
        row = col.down
        while row is not col:
            solution.append(row.row_id)
            n = row.right
            while n is not row: self.cover(n.column); n = n.right
            self.search(solution, on_solution)
            n = row.left
            while n is not row: self.uncover(n.column); n = n.left
            solution.pop()
            row = row.down
        self.uncover(col)

Why "dancing"?

Knuth named it after the visual rhythm of the pointers during cover and uncover. Watch a row come out: left and right neighbors swing inward, the node "leaves the floor." Watch it go back in: neighbors swing outward as the node returns. With dozens of nodes covered and uncovered each step, the pointers fan out and snap back like dancers in a chorus line — hence the metaphor.

The 2000 paper "Dancing Links" introduces the term and gives the full Algorithm X presentation. Knuth had used the technique informally in earlier work, but the paper made it canonical for combinatorial search.

Common bugs and edge cases

  • Wrong restore order. Uncovering rows top-to-bottom (instead of bottom-to-top) corrupts pointers — the restoration order must mirror removal exactly. Same for left/right within a row.
  • Forgetting to update column size. Each remove and restore must adjust the column's size counter; otherwise the MRV heuristic picks wrong columns and search blows up.
  • Off-by-one in column header. The header node should never appear as a row in its own column. for (row = col.down; row !== col; row = row.down) handles this; while (row !== null) doesn't (circular lists have no null).
  • Allocating during search. If you accidentally create new nodes inside search(), you've defeated the point. Production DLX uses pre-allocated arrays of indices, not pointer chains, for cache locality and to make this bug impossible.
  • Memory blowup on dense matrices. 48 bytes per 1-cell means a 1000 × 1000 dense matrix uses 48 MB of pointers vs 125 KB for a packed bit matrix. Always check sparsity before reaching for DLX.
  • Solution recording. The solution list contains rowId values in the order they were selected — but a single exact-cover instance can have many solutions (Sudoku puzzles with multiple solutions are ill-formed). Decide upfront whether to find one, all, or count.

Frequently asked questions

Why does removing a node preserve its pointers?

Removal updates the neighbors — node.left.right = node.right and node.right.left = node.left — but doesn't touch node.left or node.right themselves. The removed node still points to its old neighbors. To restore, just write node back: node.left.right = node and node.right.left = node. The data was never destroyed, just bypassed.

Why is dancing links faster than re-allocating?

A naïve backtracking search would copy or re-create the data structure at each recursion level. DLX does zero allocation in the recursion — every remove is 2 pointer writes, every restore is 2 pointer writes. For Sudoku, that's the difference between a solver taking 10 ms (DLX) versus ~500 ms (recursive copy).

What problems fit dancing links?

Exact cover problems: given a set of constraints and a set of options where each option satisfies some constraints, find a subset of options that satisfies each constraint exactly once. Examples: Sudoku (each cell, row, column, box must contain each digit once), pentomino tiling, n-queens (each row, column, diagonal hit exactly once), polycube assembly, exact set cover.

Is the data structure always 2D?

Yes — every node sits in a row and a column, with left/right links along the row and up/down links along the column. Plus a header for each column tracking its size. The structure stays connected even as nodes are removed: removed nodes still point inward, the bypassed neighbors point around them.

Does dancing links use less memory than a bit matrix?

Only for sparse matrices. DLX stores 6 pointers per 1-cell (left, right, up, down, column header, row id). On 64-bit machines that's 48 bytes per 1. A bit matrix uses 1 bit per cell, so DLX is competitive only when fewer than 1/384 of cells are 1s. Sudoku's matrix is 729 × 324 with 2916 ones — sparsity ~1.2% — so DLX wins handily.

What's the relationship to Algorithm X?

Algorithm X is the recursive backtracking strategy: pick the column with fewest ones, try each row that covers it, cover the chosen row's columns, recurse, uncover on backtrack. Dancing Links is the data structure that makes Algorithm X fast — the cover/uncover operations are pure pointer manipulation. The two together are usually called DLX.