Question 1

How does a suffix automaton differ from a suffix tree?

Accepted Answer

A suffix tree is a compacted trie of all suffixes of s: a tree with O(n) leaves and internal nodes. A suffix automaton (SAM) is a DAG of states; each state is an equivalence class of substrings that share the same set of end positions in s. The two structures are closely related: the suffix links of SAM(s) form a tree, and that tree is the suffix tree of reverse(s). For n ≥ 3 the SAM has at most 2n − 1 states and 3n − 4 transitions, versus the suffix tree's 2n − 1 nodes and 2n − 2 edges, so the constants are similar. The usual SAM construction (the online algorithm of Blumer et al.) is generally considered easier to implement than suffix-tree construction (Ukkonen, McCreight).

Question 2

What is a 'state' in SAM?

Accepted Answer

A state represents an equivalence class of substrings of s. Two substrings are equivalent if they have exactly the same set of ending positions in s, formally the same 'endpos' set. For each state v we record three things: 'len(v)', the maximum length of a substring in its class; 'link(v)', the suffix link, which points to the state containing the longest suffix of v's longest substring that is not in v's class; and 'next(v, c)', the outgoing transitions. The substrings in v are exactly the suffixes of its longest substring with lengths from len(link(v)) + 1 to len(v). The initial state stands for the empty string and has no suffix link. For n ≥ 2 the number of states is at most 2n − 1, which is why the SAM has a linear number of states.
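To make the endpos definition concrete, here is a brute-force Python sketch that groups every non-empty substring by its set of end positions. It is only an illustration of the equivalence classes (quadratic in the number of substrings), not how a SAM is built; the function name `endpos_classes` and the 1-based position convention are assumptions made for this example.

```python
from collections import defaultdict


def endpos_classes(s):
    """Group the non-empty substrings of s by their set of end positions.

    Positions are 1-based: the occurrence s[i:j] ends at position j.
    """
    ends = defaultdict(set)
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            ends[s[i:j]].add(j)

    # Substrings with identical endpos sets fall into the same class,
    # which corresponds to one non-initial SAM state.
    classes = defaultdict(list)
    for sub, positions in ends.items():
        classes[frozenset(positions)].append(sub)
    return dict(classes)


classes = endpos_classes("abcbc")
for positions, subs in sorted(classes.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(positions), sorted(subs, key=len))
```

For "abcbc" this gives seven classes, so the automaton has eight states once the initial state is counted. For instance, "c" and "bc" both end at positions 3 and 5 and therefore share a state with len = 2.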

Question 3

How is online construction done in linear time?

Accepted Answer

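The online algorithm of Blumer et al. extends SAM(s) to SAM(s + c) in amortized O(σ) time per appended character, where the σ factor covers copying a transition table when a state is cloned. Maintain 'last', the state corresponding to the entire current string. To add character c: create a new state cur with len(cur) = len(last) + 1. Walk up the suffix-link chain from last, adding a c-transition to cur at every state that lacks one. If the walk falls off the chain past the initial state, set link(cur) to the initial state. Otherwise the walk stops at a state p that already has a c-transition to some q. If len(q) = len(p) + 1, set link(cur) = q. If not, clone q into a new state q' that copies q's transitions and suffix link and has len(q') = len(p) + 1; then continue up the chain from p, redirecting every c-transition that points to q so that it points to q', stopping at the first state whose c-transition goes elsewhere; finally set link(q) = link(cur) = q'. In all cases finish by setting last = cur. Total work across all insertions is O(n × σ); an aggregate argument over the suffix-link depth of 'last' bounds the length of the walks.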
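The steps above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the class and attribute names (`SuffixAutomaton`, `length`, `extend`, `contains`) are chosen for this example, and transitions are stored in dictionaries rather than fixed-size arrays, which is a design choice made here for brevity.

```python
class SuffixAutomaton:
    """Online suffix automaton; state 0 is the initial state."""

    def __init__(self):
        self.next = [{}]     # next[v][c] -> target state
        self.link = [-1]     # suffix links; -1 means "no link"
        self.length = [0]    # len(v): longest substring in v's class
        self.last = 0        # state of the whole current string

    def _new_state(self, length, transitions, link):
        self.next.append(transitions)
        self.link.append(link)
        self.length.append(length)
        return len(self.length) - 1

    def extend(self, c):
        """Append character c to the underlying string."""
        cur = self._new_state(self.length[self.last] + 1, {}, -1)
        p = self.last
        # Add c-transitions to cur along the suffix-link chain.
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]
        if p == -1:
            # Walked past the initial state: c is a new character.
            self.link[cur] = 0
        else:
            q = self.next[p][c]
            if self.length[q] == self.length[p] + 1:
                self.link[cur] = q
            else:
                # Split q: the clone keeps q's transitions and link.
                clone = self._new_state(
                    self.length[p] + 1, dict(self.next[q]), self.link[q]
                )
                while p != -1 and self.next[p].get(c) == q:
                    self.next[p][c] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, pattern):
        """Return True if pattern is a substring of the text so far."""
        v = 0
        for ch in pattern:
            if ch not in self.next[v]:
                return False
            v = self.next[v][ch]
        return True


sam = SuffixAutomaton()
for ch in "abcbc":
    sam.extend(ch)
print(len(sam.length), sam.contains("bcb"), sam.contains("cc"))
```

Because `extend` can be called at any time, `contains` can be queried between appends, which is the online property the answer relies on.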

Question 4

How do you count distinct substrings using SAM?

Accepted Answer

The number of distinct substrings of s equals the sum over all non-initial states v of (len(v) − len(link(v))), which is the number of distinct substring lengths represented at v. This sum is evaluated in O(n) once the SAM is built. Equivalently, each distinct substring corresponds to exactly one path from the initial state whose length equals the substring's length, so counting the non-empty paths gives the same total. For a string of length n the count can be as large as n(n + 1)/2, so storing all substrings explicitly would take O(n²) entries; the SAM represents them in O(n) space and yields the count in O(n) time.
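As a rough check of the formula, the following Python sketch builds the automaton with a compact version of the standard online construction and compares the sum against a brute-force set of substrings. The helper names (`build_sam`, `count_distinct_substrings`, `brute_force_count`) are assumptions for this example, and the brute-force function is only there for verification on short inputs.

```python
def build_sam(s):
    """Return (next, link, length) arrays of the suffix automaton of s."""
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for c in s:
        cur = len(length)
        nxt.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        while p != -1 and c not in nxt[p]:
            nxt[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][c]
            if length[q] == length[p] + 1:
                link[cur] = q
            else:
                clone = len(length)
                nxt.append(dict(nxt[q]))   # clone copies q's transitions
                link.append(link[q])       # and q's suffix link
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(c) == q:
                    nxt[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return nxt, link, length


def count_distinct_substrings(s):
    """Sum len(v) - len(link(v)) over all non-initial states."""
    _, link, length = build_sam(s)
    return sum(length[v] - length[link[v]] for v in range(1, len(length)))


def brute_force_count(s):
    """Quadratic reference answer, usable only for short strings."""
    n = len(s)
    return len({s[i:j] for i in range(n) for j in range(i + 1, n + 1)})


print(count_distinct_substrings("abcbc"), brute_force_count("abcbc"))
```

For "abcbc" both functions return 12: the seven non-initial states contribute 1, 1, 1, 3, 1, 3 and 2 substrings respectively.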

Question 5

When is SAM preferable to suffix array?

Accepted Answer

SAM is online: you can append characters and update the automaton in amortized O(σ) per character. A suffix array, by contrast, has to be rebuilt after each append, either by a full re-sort or by rerunning a linear-time construction such as DC3. SAM is therefore preferable for streaming inputs (text being typed, log streams) and for problems that need substring queries during construction. SAM is also more natural for problems involving counts of substrings or distinct-substring enumeration. Suffix arrays win for problems that need lexicographic ordering of suffixes, range LCP queries with sparse tables, or pattern matching where a compact, contiguous memory layout matters more than online updates. Most competitive-programming problems can be solved either way; the choice often comes down to which the contestant has memorized.

Question 6

What's the link to Directed Acyclic Word Graphs (DAWGs)?

Accepted Answer

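A Directed Acyclic Word Graph (DAWG) built for a single string is the same graph as its suffix automaton. The difference is only in what is accepted: with every state accepting it recognizes all substrings, whereas the suffix automaton marks only the states on the suffix-link chain from 'last' as accepting and so recognizes the suffixes. (The Directed Acyclic Subsequence Graph is a different structure that recognizes subsequences, not a variant name for the DAWG.) The two terms are used interchangeably in much of the literature; the 1985 paper by Blumer et al. is titled 'The smallest automaton recognizing the subwords of a text' and used 'DAWG'. Modern competitive-programming references (e-maxx, CP-Algorithms) prefer 'suffix automaton'. The DAWG name also persists in lexicon-compression libraries (e.g., dawg-rust), where it usually denotes a minimal acyclic automaton built over a word list rather than over the suffixes of one string; MARISA-trie, often mentioned alongside them, is a compressed trie rather than a DAWG.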

Suffix Automaton

Why suffix automaton matters

Common misconceptions

Construction in detail

Applications and idioms

Frequently asked questions