Question 1

How does Boyer-Moore differ from KMP?

Accepted Answer

KMP scans the text strictly left to right and never moves backward in the text, which guarantees O(n + m) in the worst case. Boyer-Moore aligns the pattern with the text, compares it right to left and, on a mismatch, slides the pattern forward by the larger of two heuristic shifts. The right-to-left scan is what lets it skip several text characters per mismatch; KMP has to read every text character up to the match, so it cannot skip any. On natural-language text with a 256-symbol alphabet, Boyer-Moore typically inspects far fewer than n characters, approaching n/m in the best case, while KMP inspects all n. The trade-off is that Boyer-Moore's basic form is O(nm) in the worst case until you bolt on Galil's rule.
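For contrast with the right-to-left scan described above, here is a minimal KMP sketch. The function name kmp_search and the choice to return only the first match are illustrative assumptions, not part of the original answer. Note that the text index only ever moves forward, and every text character up to the match is read.

```python
def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text left to right; k is the number of pattern characters
    # currently matched. The text index i never decreases.
    k = 0
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:
            return i - m + 1
    return -1
```

For example, kmp_search("abracadabra", "cad") returns 4.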

Question 2

What is the bad-character rule?

Accepted Answer

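When the right-to-left scan finds a mismatch between pattern position j and text character c, the pattern is shifted so that the rightmost occurrence of c in the pattern lines up with that text position. If c does not appear in the pattern at all, the pattern is shifted completely past the mismatched position, a shift of j + 1 (the full m characters when the mismatch is at the last pattern character). The table of last occurrences is precomputed in O(m + sigma) time, where sigma is the alphabet size, and the shift is j - last_occurrence(c), with last_occurrence(c) = -1 if c is absent. If the rightmost occurrence lies to the right of the mismatch, this value is negative, so the rule falls back to a shift of 1 (or defers to the good-suffix rule). Horspool's variant fixes j = m - 1 and stores bad[c] = m - 1 - last_occurrence(c) directly, computed over all but the last pattern character, or m if c is absent. This single rule alone gives the Boyer-Moore-Horspool variant its sublinear behavior on most real text.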
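The rule can be sketched in a few lines. This is a minimal illustration, assuming the shift is computed from the mismatch index j as j - last_occurrence(c), which reduces to m - 1 - last_occurrence(c) when the mismatch is at the last pattern character. The names build_last_occurrence and bad_character_shift are hypothetical, and a dict stands in for a sigma-sized array.

```python
def build_last_occurrence(pattern):
    """Map each character to the index of its rightmost occurrence in pattern."""
    # Later indices overwrite earlier ones, so the rightmost occurrence wins.
    return {c: i for i, c in enumerate(pattern)}


def bad_character_shift(last, j, c):
    """Shift to apply when pattern[j] mismatches text character c.

    last is the table from build_last_occurrence. A character that is
    absent from the pattern is treated as occurring at index -1, which
    moves the pattern completely past the mismatched text position.
    """
    # If the rightmost occurrence lies to the right of j the raw value is
    # negative, so fall back to the minimal safe shift of 1.
    return max(1, j - last.get(c, -1))
```

With pattern "example" (m = 7), a mismatch at the last position j = 6 against an absent character such as "s" gives a shift of 7, against "p" (last seen at index 4) a shift of 2, and a mismatch at j = 3 against "e" (last seen at index 6) falls back to 1.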

Question 3

What is the good-suffix rule?

Accepted Answer

When a suffix of the pattern of length k has matched and then a mismatch occurs, the good-suffix rule shifts the pattern so that another occurrence of that matched suffix (or a suffix of it that is also a pattern prefix) aligns with the text. Both cases are precomputed in O(m). Case 1: another copy of the matched suffix exists earlier in the pattern (in the strong form of the rule, preceded by a different character than the one that just mismatched); shift so that copy aligns. Case 2: no such copy exists, but a suffix of the matched suffix is also a prefix of the pattern; shift so that prefix aligns. If neither case applies, the pattern shifts by its full length m. The good-suffix table plays a role similar to KMP's failure function, since both encode how the pattern overlaps with itself, and it contributes the second of Boyer-Moore's two shift candidates; the algorithm always takes the maximum of the bad-character and good-suffix shifts.
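One common way to build the table in O(m) is the two-pass border construction sketched below. It is offered as an illustration of the two cases, assuming the strong form of the rule (the earlier copy must be preceded by a different character); the name good_suffix_table and the indexing convention are choices made for this sketch. Entry j + 1 is the shift for a mismatch at pattern index j, and entry 0 is the shift after a full match.

```python
def good_suffix_table(pattern):
    """Return shift[0..m]; shift[j + 1] applies after a mismatch at index j,
    shift[0] applies after a full match."""
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i] = start of the widest border of pattern[i:]

    # Pass 1 (case 1): the matched suffix reoccurs earlier in the pattern,
    # preceded by a different character.
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j

    # Pass 2 (case 2): only a suffix of the matched suffix is a pattern prefix.
    # Entries still unset get the shift that aligns the widest such prefix.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift
```

For pattern "abab" this yields [2, 2, 2, 4, 1]: after matching the suffix "b" and mismatching at index 2, the only other "b" is preceded by the same character "a", and no pattern prefix fits, so the shift is the full length 4.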

Question 4

When does worst-case O(nm) actually happen?

Accepted Answer

When the pattern is highly periodic and occurs many times in the text. Take pattern aaaa in text aaaa…aaaa: every alignment matches all m characters, and the shift after each match is only the pattern's period, 1, so the work per alignment is m and there are n - m + 1 alignments, giving O(nm). Galil's modification fixes this by recording, after a shift, which prefix of the pattern overlaps text that is already known to match, so the right-to-left scan at the next alignment can stop as soon as it reaches that prefix. With Galil's rule Boyer-Moore is O(n) in the worst case while preserving the O(n/m) best case. GNU grep uses a Boyer-Moore variant for fixed-string search.
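To see the quadratic behavior concretely, the sketch below counts character comparisons for a Boyer-Moore search with both shift rules, with and without a simplified Galil rule. It is a teaching aid under stated assumptions: the function names are hypothetical, the pattern is assumed non-empty, and the Galil variant here only remembers the known-matching prefix after a full match, which is enough to show the effect on a periodic input.

```python
def good_suffix_table(pattern):
    """shift[j + 1] = good-suffix shift after a mismatch at index j;
    shift[0] = shift after a full match (the pattern's period)."""
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def boyer_moore_count(text, pattern, galil=False):
    """Return (list of match positions, number of character comparisons)."""
    n, m = len(text), len(pattern)
    last = {c: i for i, c in enumerate(pattern)}
    gs = good_suffix_table(pattern)
    matches, comparisons = [], 0
    s = 0       # current alignment: pattern[0] sits under text[s]
    known = 0   # pattern[:known] is already known to match at alignment s
    while s <= n - m:
        j = m - 1
        while j >= known:
            comparisons += 1
            if pattern[j] != text[s + j]:
                break
            j -= 1
        if j < known:
            # Full match. After shifting by the period, the overlapping
            # prefix is known to match and can be skipped (Galil).
            matches.append(s)
            shift = gs[0]
            known = m - shift if galil else 0
        else:
            shift = max(gs[j + 1], j - last.get(text[s + j], -1))
            known = 0
        s += shift
    return matches, comparisons
```

On a text of twenty a's with pattern aaaa, boyer_moore_count reports 68 comparisons (17 alignments times 4) without Galil and 20 with it, while finding the same 17 matches.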

Question 5

Why is Boyer-Moore-Horspool a simplification?

Accepted Answer

Horspool's 1980 simplification keeps only the bad-character heuristic, but uses the rightmost text character of the current alignment (not the mismatch character) to compute the shift. The implementation collapses to a single 256-entry table and a tight inner loop, about 20 lines in C. Empirically on English text Horspool runs within 5 to 15% of the speed of full Boyer-Moore for patterns up to 32 characters, while being substantially easier to write correctly. It is the version many textbooks teach, and production search code often builds on it (CPython's str.find, for example, uses a Horspool-style skip in its fastsearch routine for short needles).
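A minimal Horspool sketch over bytes, assuming a 256-entry shift table as described above. The name horspool_search is illustrative, and the window comparison uses a slice equality check for brevity where a C version would use an explicit right-to-left loop; the comparison order does not affect correctness because the shift depends only on the last text character of the window.

```python
def horspool_search(text: bytes, pattern: bytes) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # shift[b] = distance from the rightmost occurrence of byte b in
    # pattern[:-1] to the end of the pattern; m if b does not occur there.
    shift = [m] * 256
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    s = 0
    while s <= n - m:
        if text[s:s + m] == pattern:
            return s
        # Always shift by the table entry for the window's last byte,
        # regardless of where the mismatch occurred.
        s += shift[text[s + m - 1]]
    return -1
```

For example, horspool_search(b"abracadabra", b"cad") returns 4 after examining only three windows.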

Question 6

Why is right-to-left scanning crucial?

Accepted Answer

Right-to-left lets the very first comparison disqualify a long stretch of text. If you align the pattern so its last character lands on text position i and the text has a character c there that does not occur anywhere in the pattern, you can move the entire pattern past i, m positions in a single step, without ever inspecting positions i - 1 through i - m + 1. Left-to-right scanning has no analogous shortcut, because a mismatch at the leftmost character gives no information about what lies further right. The right-to-left choice is what takes string search from n character inspections down to about n/m in the best case, and to sublinear behavior on typical text with a large alphabet.
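The skipping effect is easy to observe by recording which text positions a right-to-left scan actually touches. The sketch below uses only the bad-character shift and a shift of 1 after a full match; both simplifications, and the name inspected_positions, are assumptions made for illustration.

```python
def inspected_positions(text, pattern):
    """Return the text indices examined by a right-to-left scan that uses
    only the bad-character shift (pattern assumed non-empty)."""
    m = len(pattern)
    last = {c: i for i, c in enumerate(pattern)}
    seen = []
    s = 0
    while s <= len(text) - m:
        j = m - 1
        while j >= 0:
            seen.append(s + j)
            if text[s + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            s += 1  # full match: simplest safe shift for this sketch
        else:
            # Absent characters count as index -1, so the whole pattern
            # moves past the mismatched text position.
            s += max(1, j - last.get(text[s + j], -1))
    return seen
```

With a 60-character text of x's and the 6-character pattern "needle", the scan touches only positions 5, 11, 17, and so on up to 59: ten inspections, which is n/m.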

Boyer-Moore String Search
