Algorithms

Edit Distance (Levenshtein)

The fewest keystrokes that turn one word into another

Edit distance, or Levenshtein distance, is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another — computed in O(mn) time with a dynamic-programming table.

TimeO(m·n)
Full-table spaceO(m·n)
Distance-only spaceO(min(m,n))
Operationsinsert · delete · substitute
IntroducedLevenshtein, 1965

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

What edit distance measures

Type "recieve" and your spell-checker suggests "receive." How did it know? It measured the edit distance — the minimum number of single-character edits that turn one string into the other. For "recieve" → "receive" that's just two edits: delete the i, insert it one position later. Among the thousands of dictionary words, "receive" is one of the closest by this metric, so it wins the suggestion.

The metric was defined by the Soviet mathematician Vladimir Levenshtein in a 1965 paper on error-correcting codes, which is why "edit distance" and "Levenshtein distance" are used interchangeably. Three operations are allowed, each costing 1:

Insertion — add one character (cat → cart).
Deletion — remove one character (cart → cat).
Substitution — replace one character with another (cat → cut).

The edit distance between two strings is the length of the cheapest sequence of these operations that transforms the first into the second. Because insert and delete are exact inverses, the metric is symmetric: dist(a, b) = dist(b, a). It's also a true mathematical metric — it satisfies the triangle inequality — which is what lets you index dictionaries with metric trees later on.

The recurrence and the DP table

The key insight is that edit distance has optimal substructure: the cheapest way to align two strings is built from the cheapest ways to align their prefixes. Let D[i][j] be the edit distance between the first i characters of string a and the first j characters of string b. Then:

D[0][j] = j                      (turn "" into b[:j] with j inserts)
D[i][0] = i                      (turn a[:i] into "" with i deletes)

D[i][j] = D[i-1][j-1]            if a[i-1] == b[j-1]   (free match)

        = 1 + min( D[i-1][j],    delete a[i-1]
                   D[i][j-1],    insert b[j-1]
                   D[i-1][j-1] ) substitute a[i-1] → b[j-1]
          otherwise

Read the three candidates geometrically. In a grid where a labels the rows and b labels the columns, the cell above (D[i-1][j]) is a deletion, the cell to the left (D[i][j-1]) is an insertion, and the cell diagonally up-left (D[i-1][j-1]) is either a free match or a paid substitution. You fill the table top-left to bottom-right; the answer sits in the bottom-right corner D[m][n].

This bottom-up table method is called the Wagner–Fischer algorithm, published by Robert Wagner and Michael Fischer in 1974. There are (m+1)(n+1) cells and each costs O(1), so the running time is O(mn) and the naive space is O(mn).

A worked example: "kitten" → "sitting"

The textbook example is the distance from kitten to sitting, which is 3. The cheapest edit script is:

kitten → sitten — substitute k → s
sitten → sittin — substitute e → i
sittin → sitting — insert g at the end

Here is the filled DP table. Rows are the prefixes of kitten, columns the prefixes of sitting. The first row and column are the base cases (0,1,2,3…). The bottom-right cell is the answer.

        ""  s   i   t   t   i   n   g
    ""   0   1   2   3   4   5   6   7
    k    1   1   2   3   4   5   6   7
    i    2   2   1   2   3   4   5   6
    t    3   3   2   1   2   3   4   5
    t    4   4   3   2   1   2   3   4
    e    5   5   4   3   2   2   3   4
    n    6   6   5   4   3   3   2   3   ← D[6][7] = 3

Trace the diagonal of cheap matches: i, t, t line up perfectly (the run of 1s sliding down the diagonal), so they're free. The cost accrues only at the k/s mismatch, the e/i mismatch, and the trailing g insertion — three edits total.

When to use edit distance

Spelling correction and autocomplete — rank dictionary candidates by closeness to the typed word.
Fuzzy search and record linkage — match "Jon Smith" to "John Smyth" across messy databases.
Bioinformatics — DNA and protein alignment is edit distance with custom costs (the Needleman–Wunsch and Smith–Waterman algorithms are weighted variants).
Diff and version control — line-level edit distance underlies diff, git, and patch generation (the LCS variant, specifically).
Plagiarism and near-duplicate detection — short edit distance between documents flags copying.
Speech and OCR post-processing — measure word error rate against a reference transcript.

Reach for something else when the full quadratic cost is unaffordable. For very long strings where you only care whether the distance is small, banded DP (Ukkonen) or bit-parallel methods (Myers) are far faster. For huge dictionaries, a Levenshtein automaton or SymSpell index avoids per-word DP entirely.

Edit distance vs related string metrics

	Levenshtein	Damerau–Levenshtein	Hamming	LCS (diff)	Jaro–Winkler
Operations	insert, delete, substitute	+ adjacent transposition	substitute only	insert, delete (no sub)	matches + transpositions, prefix bonus
Different lengths?	Yes	Yes	No — equal length only	Yes	Yes
Time	O(mn)	O(mn)	O(n)	O(mn)	O(mn)
True metric (triangle inequality)?	Yes	Restricted variant: no; full variant: yes	Yes	Yes (as a distance)	No (a similarity score)
"teh" → "the"	2	1	2	2	high similarity
Typical use	spell-check, fuzzy match	typo correction	error-correcting codes, fixed-width IDs	file diff, version control	name and address matching

The headline distinction: Hamming is the cheap special case when both strings are the same length and you never need to shift characters. Levenshtein generalizes it to arbitrary lengths at the cost of the quadratic DP. LCS (longest common subsequence) is Levenshtein with substitution forbidden, which is exactly what diff wants — it only inserts and deletes whole lines.

What the numbers actually say

Comparing two 1,000-character strings is one million cell updates — a few milliseconds in C, tens of milliseconds in pure Python. Comparing two 100,000-character genomes naively is 10¹⁰ cells, minutes of work and 10 GB of table if you store it all.
The two-row trick cuts memory by a factor of m. For a 100,000 × 100,000 alignment, a full int32 table is ~40 GB; two rows are ~800 KB. That single change is the difference between "won't fit in RAM" and "fits in L2 cache adjacent."
Banded DP turns O(mn) into O(n·k) when you only care about distances ≤ k. For a spell-checker with k = 2 against a 10-character word, that's ~50 cells (a band of width 2k+1 = 5) instead of the full ~120-cell table for a 12-letter dictionary entry — and the savings grow with word length: the band stays five cells wide, so at a hundred characters it covers ~500 cells versus ~10,000 for the full table, a 20× cut.
Spell-checking a typo against 300,000 dictionary words via brute-force DP is ~10⁸ cell updates per query. A SymSpell deletion index answers the same query in roughly 1 microsecond by reducing it to hash lookups — about a 10⁵× speedup.
Myers' bit-parallel algorithm packs one DP row into machine words and runs in O(mn / w) where w is the word size (64), giving a ~64× constant-factor speedup on patterns up to 64 characters — this is what powers fast approximate grep.

JavaScript implementation

The space-optimized two-row version. It only returns the distance — if you also need the alignment, keep the full table (commented variant below).

function editDistance(a, b) {
  const m = a.length, n = b.length;
  // Ensure the inner (rolling) dimension is the smaller one for less memory.
  if (n > m) return editDistance(b, a);

  let prev = new Array(n + 1);
  let curr = new Array(n + 1);
  for (let j = 0; j <= n; j++) prev[j] = j;   // base row: "" → b[:j]

  for (let i = 1; i <= m; i++) {
    curr[0] = i;                                // base col: a[:i] → ""
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i-1]
        curr[j - 1] + 1,    // insert b[j-1]
        prev[j - 1] + cost  // match or substitute
      );
    }
    [prev, curr] = [curr, prev];                // swap rows, reuse the array
  }
  return prev[n];
}

editDistance("kitten", "sitting");  // → 3
editDistance("recieve", "receive"); // → 2

Two details worth flagging. First, swapping the row arrays with destructuring ([prev, curr] = [curr, prev]) reuses both buffers and avoids reallocating n+1 entries every row. Second, the early if (n > m) swap guarantees the rolling array is the shorter dimension, so memory is O(min(m,n)) regardless of argument order.

Python implementation

The full-table version, plus backtracking to recover the actual edit operations — the part learners most often get wrong.

def edit_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): D[i][0] = i
    for j in range(n + 1): D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,        # delete
                D[i][j - 1] + 1,        # insert
                D[i - 1][j - 1] + cost  # match / substitute
            )
    return D[m][n]


def edit_script(a: str, b: str):
    """Return the distance and the list of operations, via backtracking."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): D[i][0] = i
    for j in range(n + 1): D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i-1][j] + 1, D[i][j-1] + 1, D[i-1][j-1] + cost)

    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i-1] == b[j-1] and D[i][j] == D[i-1][j-1]:
            i, j = i - 1, j - 1                         # match (free)
        elif i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + 1:
            ops.append(f"sub {a[i-1]}→{b[j-1]}"); i, j = i-1, j-1
        elif i > 0 and D[i][j] == D[i-1][j] + 1:
            ops.append(f"del {a[i-1]}"); i -= 1
        else:
            ops.append(f"ins {b[j-1]}"); j -= 1
    ops.reverse()
    return D[m][n], ops


edit_distance("kitten", "sitting")  # → 3
edit_script("kitten", "sitting")    # → (3, ['sub k→s', 'sub e→i', 'ins g'])

Backtracking walks from D[m][n] back to D[0][0], at each step asking "which neighbor did this cell come from?" Crucially you must check the free match case first — if the characters are equal and the diagonal cost matches, you took the free diagonal, not a paid substitution. Mixing that ordering up produces a valid distance but a nonsensical edit script.

Variants worth knowing

Damerau–Levenshtein. Adds adjacent transposition as a fourth unit-cost operation, so teh → the is one edit. It models the single most common typing mistake. The "restricted" (optimal string alignment) version edits each substring at most once and is not a true metric; the full version is, but costs more to compute.

Weighted / custom-cost edit distance. Let each operation carry its own cost — a substitution between adjacent QWERTY keys might cost less than between distant keys. Bioinformatics takes this furthest: Needleman–Wunsch (global) and Smith–Waterman (local) alignment use a substitution matrix (BLOSUM, PAM) and affine gap penalties so a long indel costs less than many separate ones.

Longest common subsequence (LCS). Forbid substitution and only the diagonal-match / insert / delete moves remain. The complement of LCS length is the insert-delete-only edit distance, which is exactly the model behind line-based diff.

Hirschberg's algorithm. Recovers the full alignment in O(min(m,n)) space (not just the distance) by divide-and-conquer: split string b in half, find the optimal crossing point using forward and backward two-row passes, then recurse on the two halves. Same O(mn) time, linear space.

Ukkonen's banded DP and Myers' bit-vector algorithm. When you only need distances up to a threshold k, you can restrict computation to a diagonal band of width 2k+1, giving O(nk). Myers packs a whole DP row into bit-vectors and updates it with a handful of word operations, hitting O(mn/w) — the foundation of fast approximate matching tools like agrep.

Common bugs and edge cases

Off-by-one in indexing. The table is 1-indexed by prefix length, but the strings are 0-indexed, so cell D[i][j] compares a[i-1] with b[j-1]. Dropping the -1 silently compares the wrong characters.
Forgetting to initialize the first row and column. If D[i][0] = i and D[0][j] = j aren't set, every distance is wrong because the base case (turning a string into the empty string) is missing.
Overwriting the diagonal in the one-row optimization. If you collapse to a single row, you must save the previous diagonal value before overwriting curr[j], or you'll read the already-updated cell and corrupt the substitution candidate.
Counting transpositions as one edit with plain Levenshtein. Plain Levenshtein scores ab → ba as 2 (delete + insert or two subs). If you expect 1, you want Damerau–Levenshtein — different recurrence.
Unicode and grapheme clusters. Iterating by UTF-16 code unit splits emoji and combining characters into pieces, inflating the distance. Normalize (NFC) and compare by code point or grapheme cluster, not raw bytes.
Brute-forcing a whole dictionary. Running O(mn) DP against every word for autocomplete is the classic performance trap. Bound the distance and use a trie or Levenshtein automaton — full DP per word doesn't scale past a few thousand entries.

Frequently asked questions

What is the difference between Levenshtein distance and Hamming distance?

Hamming distance only counts substitutions and requires both strings to be the same length — it answers "how many positions differ?". Levenshtein adds insertions and deletions, so it can compare strings of different lengths and align them by shifting characters. Hamming("karolin", "kathrin") is 3; Levenshtein is also 3 here, but for "cat" vs "cats" Hamming is undefined while Levenshtein is 1.

Why is edit distance O(mn)?

The dynamic-programming table has (m+1) × (n+1) cells, one per prefix pair. Each cell is filled in O(1) time by taking the minimum of three already-computed neighbors. So the total work is proportional to the number of cells, m·n. No known algorithm beats this by more than logarithmic factors in the worst case.

Can you compute edit distance in less than O(mn) space?

Yes — if you only need the distance number, keep just two rows (or one row plus a temp variable), giving O(min(m,n)) space. To also recover the actual alignment in linear space, use Hirschberg's algorithm, which combines divide-and-conquer with the two-row trick to reconstruct the edit script in O(min(m,n)) space and still O(mn) time.

What is Damerau-Levenshtein distance?

It adds a fourth operation: transposition of two adjacent characters, counted as a single edit. This models the most common human typo — "teh" → "the" is one Damerau-Levenshtein edit but two plain Levenshtein edits. It's the metric most spell-checkers actually use.

Is edit distance a true metric?

Yes. Levenshtein distance is non-negative, symmetric (insert and delete are duals), zero only for identical strings, and satisfies the triangle inequality. That lets you use it inside metric data structures like BK-trees and VP-trees for fast nearest-neighbor lookups over a dictionary.

How do spell-checkers use edit distance at scale?

They don't run full O(mn) DP against every dictionary word. Instead they bound the search — only words within edit distance 1 or 2 matter — and use a trie traversal pruned by a running DP row, or a precomputed structure like a Levenshtein automaton or SymSpell's deletion index, to find candidates in microseconds instead of scanning millions of words.