String Algorithms

Manacher's Algorithm

Q: Why insert separator characters?

To unify even and odd palindromes. Original 'abba' has an even palindrome with no center character. Transform to '^#a#b#b#a#$' (sentinels and separators). Now every palindrome — even or odd — has a single character center: 'abba' becomes the palindrome around center index 5 ('#'). After computing the radius array on the transformed string, the palindromic radius P[i] in the transformed coordinate divided by 2 gives the original-string radius, regardless of parity. Sentinels '^' and '$' (any character not in the alphabet) prevent boundary checks because they will never match any inner character.

Q: How is Manacher's different from suffix-tree approaches?

Suffix trees can answer palindrome queries by combining the tree of s with the tree of reverse(s) and looking for common substrings — Gusfield gives an O(n) construction this way. Manacher's algorithm is direct: it never builds a tree, uses one pass and O(n) extra memory. Constants are an order of magnitude smaller; on a 100 MB input the suffix-tree approach allocates several gigabytes and Manacher's runs in 100 MB. For most palindrome problems, Manacher's is the right primitive.

Q: Can it find all distinct palindromes?

Manacher's outputs palindromic radii at every position — by itself it does not enumerate distinct palindromes. To count or list distinct palindromic substrings you can pair Manacher's with a hash set, which is O(n²) total in the worst case (a string like 'aaaa…a' has n distinct palindromic substrings all of distinct lengths). For O(n) distinct-palindrome enumeration use the Eertree (palindromic tree) data structure, which builds a structured representation of all distinct palindromes. Manacher's is the right tool for radii; Eertree is the right tool for the substring set.

Q: Where is it applied (bioinformatics, NLP)?

DNA palindrome detection. A DNA palindrome (e.g., GAATTC, recognized by EcoRI restriction enzyme) reads the same on the complementary strand — so a sequence s contains a DNA palindrome where s[i..i + 2k] equals the reverse complement of itself. Reduce to standard palindrome by replacing each base with its complement, reversing, and applying Manacher's. RNA hairpin loop folding uses similar techniques. NLP: palindromic word detection in linguistic analysis. Competitive programming: longest palindromic substring, palindromic decomposition. Compression: identifying self-symmetric runs.

Q: Is the constant factor competitive with O(n²) for short strings?

For n < 64 or so the simple expand-around-center O(n²) approach often wins because of cache effects and lack of bookkeeping. Crossover happens around n = 100 to 200 on modern hardware. The per-iteration cost of Manacher's includes one division-by-2, two array reads, one min, and a comparison — about 4 to 6 ns per character. Naive expansion is 2 to 3 ns per character but has the n² scaling. For strings above a few thousand characters Manacher's is the only sane choice.

Compute palindromic radii at every position in linear time, not O(n²)

Manacher's algorithm finds all palindromic substrings of a string s in O(n) time. The naive approach (expand around each center) is O(n²). Manacher's trick: maintain a "rightmost reach" interval; for any new center inside that interval, mirror its palindromic radius from the symmetric position around the interval's center, then extend if possible. Discovered by Glenn Manacher in 1975 for online longest-palindrome detection. Used in linguistic analysis, DNA palindrome detection, and competitive programming.

TimeO(n)
NaiveO(n²)
OutputPalindromic radius at each center
TrickMirror within rightmost reach
InventedManacher 1975
Even palindromesInsert separators

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why Manacher matters

Interview classic. "Find the longest palindromic substring" appears in nearly every coding-interview list. The O(n²) center-expansion answer is acceptable; the O(n) Manacher answer is the polished follow-up that distinguishes a candidate. Big-tech interviewers routinely ask candidates to derive the linear-time bound.
Bioinformatics. Palindromic DNA sequences mark restriction-enzyme cut sites and are central to studying genome architecture. EcoRI cuts at GAATTC. Tools like EMBOSS palindrome and FAST-find scan multi-gigabase genomes for palindromes; Manacher's linear pass over a complement-reverse-overlay string finds all such sites in O(n).
RNA folding. RNA secondary structure prediction looks for hairpin loops where a stretch of the strand folds back on itself, which is structurally a complement-reversal palindrome. Manacher's algorithm gives a fast first-pass detection of candidate hairpin centers before more expensive thermodynamic models score them.
Compression heuristics. Some lossless compression schemes detect locally palindromic regions (e.g., text containing "abba" patterns) and apply specialized encodings. Manacher's cheap pass identifies the candidates.
NLP and linguistics. Linguistic palindromes (e.g., "racecar", "tattarrattat") and word-level palindromes in poetry analysis are detectable by Manacher's. Some morphological analyzers use palindrome detection to identify reduplicative morphology.
Competitive programming. Codeforces, AtCoder, and ICPC have hundreds of problems whose efficient solution begins with a Manacher pass. Longest palindromic substring, count of palindromic substrings, palindromic partitioning, and palindromic-period detection all reduce to Manacher.
Stepping stone to Eertree. The palindromic tree (Eertree) by Rubinchik and Shur builds a more powerful palindromic index in O(n). Many learners build intuition with Manacher first because the [C, R] window mirrors the construction step in Eertree.

Common misconceptions

It finds palindromes. It computes the palindromic radius at every center — an array P[1..n] where P[i] is the radius of the longest palindrome centered at i. Enumerating distinct palindromic substrings is a separate (potentially harder) problem.
Complex implementation. Implementations are about 20 lines of clean code: pre-process, two pointers C and R, a single for-loop with two cases. Many learners over-complicate it because the [C, R] invariant feels abstract; the canonical CP-Algorithms write-up fits on a single page.
The transformed string makes the algorithm slower. The transform doubles the input length, so wall-clock time roughly doubles versus a hypothetical odd-only Manacher. Still O(n). The simplification is worth the 2x because handling even palindromes inline doubles the code complexity.
Manacher works only for the longest palindrome. Manacher computes radii at every center, which trivially gives the longest. It also gives the count of palindromic substrings (sum of P[i] terms), the location of all palindromes of a given length, and so on.
Same as KMP. Different problem. KMP matches a pattern in a text. Manacher analyzes the internal structure of a single string. They share the "amortized window" proof technique but solve different tasks.
Slower than expand-around-center for short strings. True for n < ~100 due to constants. Above that, the n² scaling dominates and Manacher pulls clearly ahead.

The full algorithm

Pre-process s into t by inserting a separator (commonly '#') between every pair of characters and at the boundaries. Add sentinels '^' at the start and '$' at the end so character comparisons never need bounds checks. The transformed string has length 2n + 3.

Allocate radius array P of length 2n + 3. Initialize C = 0 and R = 0 (current center and right reach). For i from 1 to length(t) - 1: if i < R, set P[i] = min(R - i, P[2C - i]) — the mirror across center C. Otherwise P[i] = 0. Then attempt to extend: while t[i + P[i] + 1] equals t[i - P[i] - 1], increment P[i]. If i + P[i] > R, update C = i and R = i + P[i]. After the loop, the original-string palindromic radius at center i is P[i] / 2 (integer division). Longest palindrome is the original-string substring of that radius around the corresponding original index.

Worked example with proof of linear time

Let s = "abaab". Transformed t = "^#a#b#a#a#b#$" (length 13). Run through indices and you get P = [_, 0, 1, 0, 1, 0, 1, 0, 4, 0, 1, 0, 1, _]. The maximum P[i] = 4 at i = 8 — that is the center of the substring "baab" in the original string (radius 2). Total character comparisons are bounded by R's monotonic advance: R starts at 0 and ends at most at 2n + 3, and every "extending" comparison increments R by at least one. Inner-loop work over the whole pass is therefore at most 2n + 3 comparisons. Mirror copies are O(1) per iteration, total 2n + 3. Sum: O(n).

For DNA palindrome detection, define t' as the original sequence interleaved with its complement — A pairs with T, C pairs with G — and apply Manacher to detect positions where the sequence reads the same as its reverse complement. This identifies restriction-enzyme recognition sites in a single linear scan over even multi-gigabase genomes.

Frequently asked questions

Why does the mirror trick give O(n)?

Maintain center C and right reach R such that the palindrome centered at C extends out to R. For a new center i <= R, set its mirror j = 2C - i. The radius P[i] is at least min(R - i, P[j]) — because the palindrome at C is symmetric, so what we know on the left half tells us about the right half. Only when P[j] reaches the boundary do we need to extend with character comparisons, and each such comparison strictly advances R. R grows monotonically from 0 to at most n, so the total number of extension steps is O(n). The mirror copies are O(1) each, giving O(n) total.

Why insert separator characters?

To unify even and odd palindromes. Original 'abba' has an even palindrome with no center character. Transform to '^#a#b#b#a#$' (sentinels and separators). Now every palindrome — even or odd — has a single character center: 'abba' becomes the palindrome around center index 5 ('#'). After computing the radius array on the transformed string, the palindromic radius P[i] in the transformed coordinate divided by 2 gives the original-string radius, regardless of parity. Sentinels '^' and '$' (any character not in the alphabet) prevent boundary checks because they will never match any inner character.

How is Manacher's different from suffix-tree approaches?

Suffix trees can answer palindrome queries by combining the tree of s with the tree of reverse(s) and looking for common substrings — Gusfield gives an O(n) construction this way. Manacher's algorithm is direct: it never builds a tree, uses one pass and O(n) extra memory. Constants are an order of magnitude smaller; on a 100 MB input the suffix-tree approach allocates several gigabytes and Manacher's runs in 100 MB. For most palindrome problems, Manacher's is the right primitive.

Can it find all distinct palindromes?

Manacher's outputs palindromic radii at every position — by itself it does not enumerate distinct palindromes. To count or list distinct palindromic substrings you can pair Manacher's with a hash set, which is O(n²) total in the worst case (a string like 'aaaa…a' has n distinct palindromic substrings all of distinct lengths). For O(n) distinct-palindrome enumeration use the Eertree (palindromic tree) data structure, which builds a structured representation of all distinct palindromes. Manacher's is the right tool for radii; Eertree is the right tool for the substring set.

Where is it applied (bioinformatics, NLP)?

DNA palindrome detection. A DNA palindrome (e.g., GAATTC, recognized by EcoRI restriction enzyme) reads the same on the complementary strand — so a sequence s contains a DNA palindrome where s[i..i + 2k] equals the reverse complement of itself. Reduce to standard palindrome by replacing each base with its complement, reversing, and applying Manacher's. RNA hairpin loop folding uses similar techniques. NLP: palindromic word detection in linguistic analysis. Competitive programming: longest palindromic substring, palindromic decomposition. Compression: identifying self-symmetric runs.

Is the constant factor competitive with O(n²) for short strings?