You are searching for an 8-character pattern in a text consisting entirely of the characters 'A' and 'B'. Which of the following best explains why Boyer-Moore performs poorly on this input?
ARight-to-left comparison is less efficient than left-to-right for binary alphabets
BThe bad-character rule rarely allows large skips because the mismatched character almost always appears in the pattern, shrinking skip distances toward 1
CThe good-suffix rule does not apply when the alphabet has only two symbols
DBoyer-Moore requires O(m²) preprocessing time for small alphabets
Boyer-Moore's advantage comes from the bad-character rule: when a mismatch occurs, if the mismatched text character does not appear in the pattern, the algorithm can skip the full pattern length. With a binary alphabet, almost every character in the text appears somewhere in the pattern, so skips shrink to 1 or 2 positions and worst-case O(n·m) behavior emerges. Large alphabets are what make Boyer-Moore shine — in natural language or DNA with 26 or 4 symbols, mismatches often allow full-length skips.
Question 2 Multiple Choice
Which of Boyer-Moore's two heuristics requires preprocessing the pattern into a table of shift values based on what was already matched before the mismatch?
AThe bad-character rule, because it indexes every character in the pattern
BThe good-suffix rule, because it uses the already-matched suffix to determine the safe shift
DNeither rule requires preprocessing — shifts are computed at match time
The good-suffix rule handles the situation where some characters at the right end of the pattern already matched before a mismatch. It looks for another occurrence of that matched suffix earlier in the pattern, or a prefix of the pattern that matches a suffix of the good suffix, and shifts to align it. This requires O(m) preprocessing of the pattern into a shift table. The bad-character rule, in contrast, preprocesses a table of the rightmost position of each character in the pattern — a simpler lookup. In practice, the algorithm takes the maximum shift from both rules at each step.
Question 3 True / False
Boyer-Moore compares the pattern against the text from left to right, just like the naive algorithm.
TTrue
FFalse
Answer: False
This is exactly backwards. Boyer-Moore's key innovation is comparing from RIGHT to LEFT within the pattern at each alignment position. This reversal is what enables the bad-character rule to produce large skips: when a mismatch occurs at the rightmost comparison, the mismatched text character immediately tells you how far ahead you can shift the entire pattern. Left-to-right comparison would only reveal mismatch information about the leftmost character, which is less useful for skipping.
Question 4 True / False
Boyer-Moore generally achieves better best-case performance on inputs with larger alphabets than on inputs with smaller alphabets.
TTrue
FFalse
Answer: True
This is true and is the central reason Boyer-Moore excels in practice for natural-language text. With a large alphabet (e.g., 26 English letters or 128 ASCII characters), mismatched text characters are unlikely to appear in the pattern, so the bad-character rule frequently allows skipping the full pattern length. For a 10-character English-language pattern, most mismatches allow a 10-position skip. With a binary alphabet, the skips collapse. This is why Boyer-Moore is excellent for DNA pattern matching (4-character alphabet of 'ACGT') only with longer patterns — the longer the pattern, the more characters become rare in it even with a small alphabet.
Question 5 Short Answer
Why does Boyer-Moore's right-to-left comparison enable larger skips than left-to-right comparison, giving it sublinear best-case performance?
Think about your answer, then reveal below.
Model answer: When you compare right-to-left and a mismatch occurs at the rightmost position, the mismatched text character has not yet been 'used' by any successful comparison. This tells you that no alignment that places the pattern over that text character (other than the alignments where it matches a pattern position containing that character) can succeed. If the mismatched character doesn't appear in the pattern at all, the entire pattern can be shifted past it — a skip equal to the full pattern length. The first comparison that fails can already rule out many candidate alignments at once. Left-to-right comparison would fail at the first character and shift by one, gaining nothing from the mismatch beyond the single position.
The insight is that information from a failure can be used to skip more than one position. In the naive algorithm, a failure shifts by exactly 1. In Boyer-Moore, the bad-character rule's skip is determined by how far ahead the mismatched character next appears in the pattern (or the full pattern length if it doesn't appear). Because the rightmost comparison is the one that mismatches, the algorithm can often jump forward by m positions — examining fewer characters than the text contains, hence O(n/m) best-case time.