← Graph View All Domains

A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Boyer-Moore String Matching Algorithm

College Depth 77 in the knowledge graph ☐ I know this ☆ Set as goal

5topics build on this

327prerequisites beneath it

See this on the map →

String Matching: Naive and Optimized Approaches→→Maximum Flow: Network Flow Problems and Algorithms

Core Idea

Boyer-Moore matches from right to left and uses two heuristics: the bad-character rule (skip based on the mismatched character) and the good-suffix rule (skip based on the matched suffix). Preprocessing is O(m + σ) where σ is alphabet size; matching is O(n/m) best-case and O(n·m) worst-case.

How It's Best Learned

Trace Boyer-Moore by hand on a simple example, watching how right-to-left matching and the bad-character rule skip positions. Implement and compare performance to KMP on both best-case and worst-case inputs.

Common Misconceptions

Thinking Boyer-Moore is always faster than KMP; worst-case complexity can be poor on adversarial inputs.
Forgetting that the good-suffix rule requires additional preprocessing; often only the bad-character rule is implemented.
Not recognizing that Boyer-Moore's advantage grows with larger alphabets and longer patterns.

Explainer

From naive string matching, you know the brute-force approach: align the pattern at each position in the text, compare character by character from left to right, and shift by one position on failure. Boyer-Moore's first key innovation is reversing the comparison direction — it compares the pattern against the text from right to left. This seemingly small change unlocks a powerful ability: when a mismatch occurs, the algorithm can often skip ahead by much more than one position, because the mismatched character reveals information about multiple alignment positions at once.

The bad-character rule is the more intuitive of Boyer-Moore's two heuristics. When a mismatch occurs at some position and the text character is, say, 'X', the algorithm asks: does 'X' appear anywhere in the pattern? If not, the entire pattern can be shifted past that character — no alignment that includes 'X' can possibly match. If 'X' does appear in the pattern, the pattern is shifted to align its rightmost occurrence of 'X' with the text position. Consider searching for "EXAMPLE" in a long English text. If the rightmost character comparison reveals the text has a 'Z', and 'Z' does not appear in "EXAMPLE," the algorithm jumps forward by the full pattern length of 7 positions. This is why Boyer-Moore achieves sublinear best-case performance of O(n/m) — it can skip large portions of the text entirely, examining fewer characters than the text contains.

The good-suffix rule handles a subtler situation. When some characters at the right end of the pattern have already matched before a mismatch occurs, the matched portion (the "good suffix") constrains how far the pattern can safely shift. The algorithm looks for another occurrence of this suffix earlier in the pattern, or for a prefix of the pattern that matches a suffix of the good suffix, and shifts the minimum amount needed to align one of these with the text. This rule requires preprocessing the pattern into a table of shift values, which takes O(m) time. In practice, the algorithm takes the maximum of the shifts suggested by the bad-character rule and the good-suffix rule, ensuring the largest safe skip at each step.

Boyer-Moore excels in practice for natural-language text and DNA sequences — situations with reasonably large alphabets and patterns of moderate length. The larger the alphabet, the more likely a mismatched character does not appear in the pattern, leading to full-length skips. For a 26-character English alphabet and a 10-character pattern, most mismatches allow skipping 10 positions. However, on adversarial inputs with very small alphabets (like binary strings with repeated characters), the skip distances shrink and worst-case O(n·m) behavior can emerge. This is why the Galil variant and other refinements add worst-case linear guarantees. Understanding when Boyer-Moore shines and when it degrades helps you choose the right string matching algorithm for your specific domain.

Practice Questions 5 questions

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Conditional Statements → While Loops → For Loops → Arrays and Lists → String Matching: Naive and Optimized Approaches → Boyer-Moore String Matching Algorithm

Longest path: 78 steps · 327 total prerequisite topics

Prerequisites (1)

String Matching: Naive and Optimized Approacheshard

Leads To (1)

Maximum Flow: Network Flow Problems and Algorithmssoft