Pumping Lemma for Context-Free Languages

Graduate Depth 59 in the knowledge graph I know this Set as goal
Unlocks 216 downstream topics
pumping-lemma context-free non-CFL proof

Core Idea

The CFL pumping lemma states that for every CFL L there is a pumping length p such that any string s ∈ L with |s| ≥ p can be split into s = uvxyz where |vy| ≥ 1, |vxy| ≤ p, and for all i ≥ 0 the string uvⁱxyⁱz ∈ L. The proof uses the fact that in a CNF parse tree for a long string, some variable must repeat on a root-to-leaf path, giving two pumpable substrings. It is used to show languages like {aⁿbⁿcⁿ} and {aⁿ² } are not context-free.

How It's Best Learned

Use the same adversarial game structure as the regular pumping lemma but now the adversary splits into 5 parts. For {aⁿbⁿcⁿ}, note that v and y together cannot cover all three symbol types, so pumping either inflates one or two but not all three counts equally.

Common Misconceptions

Explainer

You already know the pumping lemma for regular languages: if a language is regular, then sufficiently long strings can be split into three parts and the middle part "pumped" any number of times while staying in the language. The pumping lemma for context-free languages extends this idea, but now the structure mirrors the richer generative power of context-free grammars. Instead of splitting a string into three parts (xyz), you split it into five parts (uvxyz), and instead of pumping one substring, you pump two substrings — v and y — simultaneously. The formal statement says: for any CFL L, there exists a pumping length p such that every string s in L with |s| ≥ p can be written as s = uvxyz where |vy| ≥ 1, |vxy| ≤ p, and for all i ≥ 0, the string uvⁱxyⁱz is also in L.

The intuition comes directly from parse trees in Chomsky Normal Form. If you have a CNF grammar with k variables and a string long enough that its parse tree is tall enough, then by the pigeonhole principle some variable R must appear at least twice on a root-to-leaf path. The higher occurrence of R generates a subtree that contains the lower occurrence, which in turn generates some substring x. The portion of the string generated by the higher R but outside the lower R's subtree gives you v on the left and y on the right. Because R derives a string containing R again, you can repeat this derivation any number of times — pumping v and y in lockstep — or skip it entirely (i = 0), and the result is still derivable from the grammar.

The lemma is used exactly like its regular counterpart: as a tool for proving languages are not context-free, via proof by contradiction. You assume L is a CFL, invoke the lemma, and then find a specific string and show that no matter how the adversary splits it into uvxyz (respecting the length constraints), pumping produces a string outside L. The classic example is {aⁿbⁿcⁿ | n ≥ 0}. Choose s = aᵖbᵖcᵖ. The constraint |vxy| ≤ p means v and y together can span at most two of the three symbol blocks. So pumping increases the count of at most two symbol types while leaving the third unchanged, breaking the equal-count requirement.

Two details trip up many students. First, remember that v and y are pumped together — you always produce uvⁱxyⁱz, not uvⁱxy^jz with independent exponents. This simultaneous pumping reflects the tree structure: both pieces come from the same repeated variable. Second, the adversary controls the split, not you. Your job is to choose a clever string and then show that every possible compliant split fails. This means you often need a case analysis: "if v and y are both in the a-block, pumping adds more a's but not b's or c's; if v spans a's and y spans b's, pumping adds a's and b's but not c's" — and so on for every case. If every case leads to a string outside L, the proof is complete.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsOperators and ExpressionsArithmetic Operators and Operator PrecedenceComparison Operators and Boolean TestsLogical Operators and Boolean AlgebraBoolean Algebra and Fundamental LawsCombinational Circuit DesignFlip-Flops and LatchesFinite State Machines (FSMs)Deterministic Finite Automata (DFA)Nondeterministic Finite Automata (NFA)Two-Way Finite AutomataNFA to DFA Conversion (Subset Construction)DFA Properties and Minimization AlgorithmsRegular Languages: Definition and CharacterizationContext-Free Grammars (CFGs)Pushdown Automata (PDA)Equivalence of CFGs and Pushdown AutomataClosure Properties of Context-Free LanguagesLimitations of Context-Free LanguagesPumping Lemma for Context-Free Languages

Longest path: 60 steps · 267 total prerequisite topics

Prerequisites (7)

Leads To (1)