Questions: String Matching: Naive and Optimized Approaches
5 questions to test your understanding
Score: 0 / 5
Question 1 Multiple Choice
Why does the naive string matching algorithm perform poorly when searching for 'AAAB' in 'AAAAAAAAB'?
AThe pattern is too long relative to the text for naive matching to handle
BNaive matching cannot handle repeated characters
CAt each mismatch, the algorithm resets to position 0 of the pattern and slides one character forward — discarding all information from the previous partial match
DThe naive algorithm has O(m log n) complexity, which is worse than O(nm) for this input
The naive algorithm's problem is memory loss: after matching 'AAA' and failing on 'B', it slides one position and re-compares those same three A's from scratch. For a text like 'AAAAAAAAB' and pattern 'AAAB', every starting position triggers 3–4 comparisons that mostly repeat work already done. KMP eliminates this by recording what was already matched in the failure function and resuming from there rather than restarting.
Question 2 Multiple Choice
KMP's failure function (prefix function) stores, for each position j in the pattern, the length of the longest proper prefix of pattern[0..j] that is also a suffix. How does this help during matching?
AIt tells the algorithm how many text characters to skip forward after a complete match
BIt predicts how many mismatches will occur for a given text
CAfter a mismatch at pattern position j, it tells the algorithm where to resume in the pattern — preserving already-matched characters without re-examining text characters
DIt precomputes the hash of each pattern prefix to enable O(1) comparison
When a mismatch occurs at position j, the failure function value failure[j-1] says: 'the first failure[j-1] characters of the pattern already match the text at this position.' The algorithm resumes from there instead of restarting at 0. This ensures the text pointer never moves backward — each text character is examined at most twice — giving O(n+m) total time. The failure function encodes the pattern's self-similarity to avoid redundant comparisons.
Question 3 True / False
In the worst case, KMP may still re-examine the same text character multiple times.
TTrue
FFalse
Answer: False
False — a key guarantee of KMP is that the text pointer never moves backward. Each text character is examined at most a constant number of times, giving O(n + m) total comparisons including the O(m) preprocessing step. The failure function ensures that after any mismatch, the pattern pointer shifts backward within the pattern (not the text), so previously examined text positions are never revisited. This is the fundamental improvement over naive matching, which can re-examine the same text characters many times.
Question 4 True / False
Boyer-Moore is generally faster than KMP in practice because it can skip large portions of the text without examining every character.
TTrue
FFalse
Answer: True
True — Boyer-Moore reads the pattern right-to-left and uses two heuristics (bad character rule and good suffix rule) to jump the pattern forward by potentially many positions at once. In the best case it achieves O(n/m) character comparisons — it can skip entire chunks of text. KMP guarantees O(n+m) in the worst case but examines every text character at least once. For real-world inputs like searching through long documents with moderate-length patterns, Boyer-Moore's average-case performance typically surpasses KMP, which is why it underlies tools like grep.
Question 5 Short Answer
Explain the key insight that makes KMP more efficient than naive string matching. What does it avoid doing that the naive algorithm does?
Think about your answer, then reveal below.
Model answer: KMP avoids re-examining text characters that were already part of a partial match. The naive algorithm, after failing at text position i + j (having matched j characters), slides one position and restarts from pattern position 0 — re-comparing characters it already knows match. KMP's failure function encodes the pattern's self-overlap: it tells the algorithm the farthest it can shift the pattern while still preserving whatever prefix already aligns with the text. The text pointer only ever moves forward, so each character is examined at most twice, giving O(n+m) time versus O(nm) worst case for the naive approach.
The insight is that a partial match is information, not wasted work. The failure function extracts the maximum reuse from that information. Boyer-Moore takes the complementary approach — working right-to-left and using mismatches to skip forward — which is often even faster in practice.