
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Regular Expressions and Languages

College · Depth 83 in the knowledge graph

199 topics build on this

468 prerequisites beneath it



Core Idea

Regular expressions are a compact algebraic notation for describing regular languages using concatenation, union, and Kleene star. Kleene's theorem establishes that the languages described by regular expressions are exactly those recognized by finite automata: every regex can be converted to an NFA, and every DFA can be converted to a regex. The pumping lemma for regular languages states that every sufficiently long string in a regular language contains a substring that can be repeated any number of times without leaving the language; showing that a language such as {aⁿbⁿ : n ≥ 0} lacks this property proves that it is not regular.

How It's Best Learned

Practice converting between the three representations: regex to NFA (Thompson's construction), NFA to DFA (subset construction), and DFA to regex (state elimination). Then use the pumping lemma to prove specific languages are not regular — this sharpens understanding of what finite memory cannot achieve.
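As a concrete aid for the NFA-to-DFA step, here is a minimal Python sketch of the subset construction. It rests on assumptions of our own: the NFA is a dict mapping `(state, symbol)` pairs to sets of states, the empty string `""` labels ε-moves, and the alphabet is passed in explicitly. The function names are hypothetical, not taken from any library.

```python
from collections import deque

EPS = ""  # assumption: the empty string labels ε-transitions


def epsilon_closure(nfa, states):
    """Return every NFA state reachable from `states` using only ε-moves."""
    stack = list(states)
    closure = set(states)
    while stack:
        q = stack.pop()
        for t in nfa.get((q, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, accepting, alphabet):
    """Build a DFA whose states are frozensets of NFA states.

    `nfa` maps (state, symbol) -> set of successor states.
    Only subsets reachable from the start subset are created.
    """
    start_set = epsilon_closure(nfa, {start})
    dfa = {}                      # (subset, symbol) -> subset
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            moved = {t for q in current for t in nfa.get((q, sym), ())}
            target = epsilon_closure(nfa, moved)
            dfa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa, start_set, dfa_accepting


def dfa_accepts(dfa, start, accepting, word):
    """Run the DFA deterministically: one transition per input symbol."""
    state = start
    for ch in word:
        state = dfa[(state, ch)]
    return state in accepting


# Example NFA over {a, b}: "the second-to-last symbol is a".
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
dfa, start, accepting = subset_construction(nfa, 0, {2}, "ab")
print(dfa_accepts(dfa, start, accepting, "aab"))  # prints True
```

The example NFA has three states and the reachable part of its DFA has four. In the worst case the subset construction can produce exponentially many subsets, which is why an NFA can be far more compact than an equivalent DFA.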

Common Misconceptions

The "regular expressions" in programming languages (Perl, Python, etc.) add features such as backreferences and lookaheads that go beyond the formal definition; backreferences in particular let them match some non-regular languages.
The pumping lemma is a necessary condition for regularity, not sufficient — satisfying it does not prove a language is regular.
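To illustrate the first point: Python's `re` module supports backreferences such as `\1`, which must repeat exactly the text captured by group 1. The pattern below is our own example. Used with `fullmatch`, it accepts exactly the strings aⁿbaⁿ with n ≥ 0. That language is not regular, by a pumping argument like the one for {aⁿbⁿ}.

```python
import re

# (a*) captures a run of a's; \1 must then repeat exactly that run.
# With fullmatch, this accepts exactly { aⁿ b aⁿ : n ≥ 0 }.
PATTERN = re.compile(r"(a*)b\1")


def in_language(s):
    """True if the whole string matches the backreference pattern."""
    return PATTERN.fullmatch(s) is not None


for s in ["b", "aba", "aaabaaa", "aabaaa", "abaa"]:
    print(s, in_language(s))
```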

Explainer

You have studied nondeterministic finite automata (NFAs), so you know what a regular language is: a language accepted by some finite automaton, deterministic or nondeterministic. Regular expressions give you a completely different notation for exactly the same class of languages — an algebraic description rather than a machine description. Kleene's theorem, the central result here, says these two descriptions are interchangeable.

Regular expressions build languages from three operations. Concatenation (AB) means "a string from A followed by a string from B." Union (A|B) means "a string from A or from B." Kleene star (A*) means "zero or more strings from A concatenated together." Starting from the base cases (single characters, the empty set ∅, and the empty string ε), these three operations generate exactly the regular languages. For example, the expression `a(b|c)*` describes the strings that start with 'a' followed by any sequence of 'b's and 'c's, a language you could also describe with a small NFA.
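To make the three operations concrete, here is a minimal Python sketch that represents an expression as a small syntax tree and computes, directly from the definitions, every string of length at most k in its language. The class names (`Empty`, `Eps`, `Sym`, `Cat`, `Alt`, `Star`) and the function `language` are hypothetical. The length bound k is an assumption that keeps the star case finite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:      # ∅: the empty language
    pass


@dataclass(frozen=True)
class Eps:        # ε: only the empty string
    pass


@dataclass(frozen=True)
class Sym:        # a single character
    c: str


@dataclass(frozen=True)
class Cat:        # concatenation AB
    left: object
    right: object


@dataclass(frozen=True)
class Alt:        # union A|B
    left: object
    right: object


@dataclass(frozen=True)
class Star:       # Kleene star A*
    inner: object


def language(r, k):
    """All strings of length <= k in L(r), computed from the definitions."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.c} if len(r.c) <= k else set()
    if isinstance(r, Alt):
        return language(r.left, k) | language(r.right, k)
    if isinstance(r, Cat):
        return {x + y
                for x in language(r.left, k)
                for y in language(r.right, k)
                if len(x + y) <= k}
    if isinstance(r, Star):
        # Zero pieces gives ε; otherwise glue non-empty pieces up to length k.
        pieces = language(r.inner, k) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in pieces
                        if len(x + y) <= k} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression node: {r!r}")


# a(b|c)*
expr = Cat(Sym("a"), Star(Alt(Sym("b"), Sym("c"))))
print(sorted(language(expr, 3)))
# prints ['a', 'ab', 'abb', 'abc', 'ac', 'acb', 'acc']
```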

Kleene's theorem formalizes the equivalence. Every regular expression can be converted to an NFA (Thompson's construction builds the NFA inductively over the expression's structure, introducing ε-transitions to combine pieces). Every NFA can be converted to a DFA (the subset construction). And every DFA can be converted back to a regular expression (state elimination removes states one by one, accumulating transitions into regular expression labels). This cycle of conversions, regex → NFA → DFA → regex, means you can choose whichever representation is most convenient: regex for compact human-readable descriptions, NFA for theoretical reasoning about closure properties, DFA for efficient simulation.
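Thompson's construction can be sketched in Python as follows. The tuple encoding of expressions (`("sym", "a")`, `("cat", r1, r2)`, `("alt", r1, r2)`, `("star", r)`, `("eps",)`, `("empty",)`) and the helper names are hypothetical choices for this sketch. The `cat` case joins fragments with an ε-edge, which is one common variant. Acceptance is checked by simulating the NFA directly, tracking the ε-closure of the current state set, rather than converting it to a DFA first.

```python
import itertools

EPS = None  # label used for ε-transitions in this sketch


class NFA:
    """Transition table mapping (state, label) -> set of successor states."""

    def __init__(self):
        self._ids = itertools.count()
        self.edges = {}

    def new_state(self):
        return next(self._ids)

    def add(self, src, label, dst):
        self.edges.setdefault((src, label), set()).add(dst)


def build(nfa, r):
    """Thompson's construction: add r's fragment to nfa, return (start, accept)."""
    kind = r[0]
    if kind in ("sym", "eps", "empty"):
        s, f = nfa.new_state(), nfa.new_state()
        if kind == "sym":
            nfa.add(s, r[1], f)        # one edge labelled with the character
        elif kind == "eps":
            nfa.add(s, EPS, f)         # one ε-edge
        return s, f                    # "empty": no edge, so nothing is accepted
    if kind == "cat":
        s1, f1 = build(nfa, r[1])
        s2, f2 = build(nfa, r[2])
        nfa.add(f1, EPS, s2)           # glue A's accept state to B's start
        return s1, f2
    if kind == "alt":
        s, f = nfa.new_state(), nfa.new_state()
        for sub in (r[1], r[2]):
            si, fi = build(nfa, sub)
            nfa.add(s, EPS, si)        # branch into either alternative
            nfa.add(fi, EPS, f)        # and rejoin afterwards
        return s, f
    if kind == "star":
        s, f = nfa.new_state(), nfa.new_state()
        si, fi = build(nfa, r[1])
        nfa.add(s, EPS, si)            # enter the loop
        nfa.add(s, EPS, f)             # or skip it (zero repetitions)
        nfa.add(fi, EPS, si)           # go around again
        nfa.add(fi, EPS, f)            # or leave
        return s, f
    raise ValueError(f"unknown node type: {kind!r}")


def eps_closure(nfa, states):
    """States reachable from `states` via ε-edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for t in nfa.edges.get((q, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(r, word):
    """Build the Thompson NFA for r and simulate it on word."""
    nfa = NFA()
    start, accept = build(nfa, r)
    current = eps_closure(nfa, {start})
    for ch in word:
        moved = {t for q in current for t in nfa.edges.get((q, ch), ())}
        current = eps_closure(nfa, moved)
    return accept in current


# a(b|c)*
expr = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "b"), ("sym", "c"))))
print([w for w in ["a", "abcb", "", "ba", "abca"] if accepts(expr, w)])
# prints ['a', 'abcb']
```

Each expression node adds at most two new states, so the NFA stays linear in the size of the expression. Simulating it this way avoids building a DFA that could be much larger.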

The pumping lemma gives you a way to prove that certain languages are *not* regular. The key insight is that any finite automaton has a fixed number of states, say p. Reading a string of length at least p passes through at least p + 1 states (counting the start state), so the automaton must repeat some state, creating a "loop" in the computation. Cutting the string at that loop splits it as xyz, where y (with |y| ≥ 1 and |xy| ≤ p) is the part read while going around the loop. Because the loop starts and ends in the same state, y can be pumped (repeated any number of times, including zero), and every string xyⁱz ends in the same state as xyz, so it stays in the language. If a language lacks this pumpable-substring property for sufficiently long strings, no finite automaton can recognize it. The classic example is {aⁿbⁿ : n ≥ 0}: checking that the number of b's equals the number of a's requires memory that grows with n, which no finite automaton provides. The pumping lemma is a *necessary* condition for regularity, not a sufficient one: it can prove that a language is non-regular, but satisfying it does not make a language regular.
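The sketch below checks the pumping condition by brute force for {aⁿbⁿ : n ≥ 0}. For a candidate pumping length p, it tries every split of s = aᵖbᵖ into xyz with |xy| ≤ p and |y| ≥ 1. It keeps only the splits for which xyⁱz stays in the language for i from 0 to 3. Checking a few values of p and i is an illustration, not a proof; the proof makes the same argument for every p and every i. The helper names are hypothetical. A regular language (strings consisting of an even number of a's) is included for contrast.

```python
def in_anbn(s):
    """Membership test for { aⁿbⁿ : n ≥ 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def in_even_as(s):
    """Membership test for a regular language: an even number of a's, nothing else."""
    return set(s) <= {"a"} and len(s) % 2 == 0


def pumpable_splits(s, p, in_lang, max_i=3):
    """Return the splits s = x + y + z with |xy| <= p and |y| >= 1
    for which x + y*i + z is in the language for every i in 0..max_i."""
    splits = []
    for xy_len in range(1, p + 1):
        for x_len in range(xy_len):          # guarantees |y| >= 1
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                splits.append((x, y, z))
    return splits


# For aᵖbᵖ every admissible y consists only of a's, so pumping down (i = 0)
# leaves fewer a's than b's and no split survives.
for p in range(1, 8):
    s = "a" * p + "b" * p
    print(p, pumpable_splits(s, p, in_anbn))

# Contrast: a 2-state DFA recognizes in_even_as, and "aaaa" has a pumpable split.
print(pumpable_splits("aaaa", 2, in_even_as))
```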


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Literal Equations → Slope-Intercept Form → Point-Slope Form → Writing Linear Equations → Parallel and Perpendicular Line Slopes → Graphing Linear Equations → Piecewise Functions → Step Functions → Composition of Functions → Inverse Functions → Radical Functions and Graphs → Rational Exponents → Exponential Functions and Graphs → Logarithms Introduction → Big-O Notation and Asymptotic Analysis → Breadth-First Search (BFS) → Shortest Paths in Unweighted Graphs → Dijkstra's Shortest Path Algorithm → Algorithm Analysis and Big-O Notation → Turing Machines → Deterministic Finite Automata → Nondeterministic Finite Automata → Regular Expressions and Languages

Longest path: 84 steps · 468 total prerequisite topics

Prerequisites (1)

Nondeterministic Finite Automata (hard)

Leads To (2)

Context-Free Grammars (soft)
Post Correspondence Problem (soft)