Context-Free Grammars (CFGs)

College Depth 54 in the knowledge graph I know this Set as goal
Unlocks 323 downstream topics
CFG context-free grammars productions derivation

Core Idea

A context-free grammar (CFG) is a 4-tuple (V, Σ, R, S) of variables, terminals, production rules, and a start variable. Each rule replaces a single variable with any string of variables and terminals. The language of a CFG is the set of terminal strings derivable from the start variable. CFGs are strictly more powerful than regular expressions — they can describe languages like {aⁿbⁿ} that no DFA can recognize. CFGs are the foundation of programming language syntax, used in parser generators and compilers.

How It's Best Learned

Begin by writing grammars for arithmetic expressions (which naturally requires recursion) and palindromes. Trace leftmost and rightmost derivations step-by-step. Then attempt to characterize which languages require CFGs versus which are regular.

Common Misconceptions

Explainer

You have already seen that regular languages — described by regular expressions and recognized by finite automata — have a fundamental limitation: they cannot count. A DFA has no memory beyond its current state, which means it cannot verify that a string has exactly as many a's as b's. Context-free grammars (CFGs) overcome this by introducing recursive structure, and recursion is the key insight.

A CFG is a 4-tuple (V, Σ, R, S): a set of variables (non-terminals), a terminal alphabet, a set of production rules, and a start variable. Each rule has the form A → α, where A is a single variable and α is any string of variables and terminals. A derivation begins at S, repeatedly replaces a variable using some rule, and terminates when no variables remain. The language of the grammar is the set of all terminal strings reachable this way. Notice that derivations can be arbitrarily long — this is where the expressive power comes from.

The name "context-free" refers to the rule form: the left side is always a single variable, with no surrounding context constraining when the rule can fire. This is in contrast to context-sensitive grammars, where a rule A → B might only apply when A appears between specific symbols. Context-free rules are simple but powerful enough to capture nested structure, which is exactly what programming languages require — parentheses must balance, if-else blocks must be properly nested, function calls must close with matching arguments.

A crucial concept is ambiguity. A string w is ambiguous in grammar G if G admits two or more distinct parse trees for w. This is not just a theoretical curiosity: an ambiguous grammar for a programming language means a statement like 1 + 2 * 3 could be parsed as (1 + 2) * 3 or 1 + (2 * 3), giving different results. Grammar designers go to considerable effort to eliminate ambiguity, often by introducing operator precedence rules directly into the grammar structure.

CFGs sit in the middle of the Chomsky hierarchy: strictly more powerful than regular languages (they can express { aⁿbⁿ }), but strictly less powerful than context-sensitive and Turing-complete grammars (they cannot express { aⁿbⁿcⁿ }). The decision problem for CFLs — membership testing — is solvable in polynomial time (CYK algorithm runs in O(n³)), which makes CFGs practical for real compilers. Every time a compiler parses your source code, it is essentially running a CFL recognition algorithm on a carefully designed CFG.

Practice Questions 3 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsOperators and ExpressionsArithmetic Operators and Operator PrecedenceComparison Operators and Boolean TestsLogical Operators and Boolean AlgebraBoolean Algebra and Fundamental LawsCombinational Circuit DesignFlip-Flops and LatchesFinite State Machines (FSMs)Deterministic Finite Automata (DFA)Nondeterministic Finite Automata (NFA)Two-Way Finite AutomataNFA to DFA Conversion (Subset Construction)DFA Properties and Minimization AlgorithmsRegular Languages: Definition and CharacterizationContext-Free Grammars (CFGs)

Longest path: 55 steps · 246 total prerequisite topics

Prerequisites (4)

Leads To (9)