A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

The Parsing Problem

Graduate · Depth 89 in the knowledge graph

9 topics build on this

405 prerequisites beneath it


Core Idea

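Syntax analysis (parsing) determines whether a token stream is valid according to a grammar and, if it is, builds a parse tree or abstract syntax tree (AST). Stated precisely: given a context-free grammar (CFG) and a sequence of input tokens, decide whether the grammar derives that sequence and, if so, construct a derivation (parse) tree. Not every grammar can be parsed in linear time, and an ambiguous grammar assigns more than one parse tree to some input. Practical parsers therefore restrict themselves to grammar classes such as LL and LR, or rely on disambiguating rules.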

Explainer

The lexical analyzer you already built breaks source code into tokens — identifiers, keywords, operators, literals. But a flat list of tokens says nothing about structure. The expression `3 + 4 * 5` is five tokens, but its meaning depends entirely on how those tokens group: does the multiplication bind tighter than the addition? Parsing is the phase that recovers this hierarchical structure from the linear token stream, guided by the rules of a context-free grammar.
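A small Python sketch of this point, under simplifying assumptions: a regex-based tokenizer (real lexers also report characters they cannot recognize) turns `3 + 4 * 5` into a flat list, and the two possible groupings of that same list, written as nested tuples `(operator, left, right)`, evaluate to different results. The helper names `tokenize` and `evaluate` are illustrative only.

```python
import re

def tokenize(src: str) -> list[str]:
    # Integers and single-character operators; whitespace is skipped.
    # Sketch only: unrecognized characters are silently ignored here.
    return re.findall(r"\d+|[-+*/()]", src)

tokens = tokenize("3 + 4 * 5")
print(tokens)  # ['3', '+', '4', '*', '5']

# The same five tokens admit two groupings. Nothing in the flat list
# says which one the programmer meant.
add_first = ("*", ("+", 3, 4), 5)   # (3 + 4) * 5
mul_first = ("+", 3, ("*", 4, 5))   # 3 + (4 * 5)

def evaluate(tree):
    # Only + and * are handled in this sketch.
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b

print(evaluate(add_first), evaluate(mul_first))  # 35 23
```

The parser's job is to pick the second tree, and the grammar is what tells it to.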

Recall that a context-free grammar defines a language through production rules: a nonterminal on the left can be replaced by a sequence of terminals and nonterminals on the right. Parsing is the inverse problem: given the terminals (tokens), find a sequence of production applications (a derivation) that produces them. The result is a parse tree that makes the grammatical structure explicit; compilers usually keep a condensed version, the abstract syntax tree (AST), which drops purely syntactic detail such as parentheses and chains of single-child nodes. For `3 + 4 * 5`, the parse tree shows multiplication nested deeper than addition, capturing the precedence rule encoded in the grammar.
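The text does not fix a particular grammar, so this sketch assumes the textbook expression grammar E → E + T | T, T → T * F | F, F → num, which encodes precedence by placing `*` one level deeper than `+`. It builds the tree by recursive descent, a top-down technique covered properly under LL parsing. A recursive-descent function for the left-recursive rule E → E + T would call itself forever without consuming input, so each left-recursive pair is rewritten as a loop (E → T { + T }), which still groups left-associatively. The function name `parse_expr` and the tuple tree encoding are hypothetical.

```python
def parse_expr(tokens):
    """Parse tokens like ['3', '+', '4', '*', '5'] into nested (op, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_E():  # E -> T { '+' T }
        node = parse_T()
        while peek() == "+":
            advance()
            node = ("+", node, parse_T())
        return node

    def parse_T():  # T -> F { '*' F }
        node = parse_F()
        while peek() == "*":
            advance()
            node = ("*", node, parse_F())
        return node

    def parse_F():  # F -> num
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(advance())

    tree = parse_E()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_expr(["3", "+", "4", "*", "5"]))  # ('+', 3, ('*', 4, 5))
```

Because `parse_T` is called from inside `parse_E`, the `*` node always ends up below the `+` node. As a further check, `parse_expr(['1', '+', '2', '+', '3'])` returns `('+', ('+', 1, 2), 3)`, so the loops preserve left associativity.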

The first difficulty is ambiguity. A grammar is ambiguous if some input has more than one valid parse tree, meaning the grammar assigns two different structures (and potentially two different meanings) to the same program. The classic example is the dangling-else problem: `if a then if b then s1 else s2` can associate the `else` with either `if`. Ambiguity must be resolved, either by rewriting the grammar or by adding disambiguating rules (such as "else binds to the nearest if").
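The ambiguity can be made countable. This sketch assumes the usual textbook grammar for the dangling else, simplified so that conditions and simple statements are single tokens, and enumerates every parse tree for an input by brute-force backtracking. The function names and the tuple encoding of trees are illustrative, not taken from any particular compiler.

```python
KEYWORDS = {"if", "then", "else"}

def parse_S(toks, i):
    """Return every (tree, next_index) pair for an S starting at toks[i].

    Assumed ambiguous grammar:
        S -> if C then S
        S -> if C then S else S
        S -> s        (any non-keyword token is a simple statement)
    C is a single non-keyword token.
    """
    results = []
    if i < len(toks) and toks[i] not in KEYWORDS:
        results.append((toks[i], i + 1))
    if (i + 2 < len(toks) and toks[i] == "if"
            and toks[i + 1] not in KEYWORDS and toks[i + 2] == "then"):
        cond = toks[i + 1]
        for body, j in parse_S(toks, i + 3):
            results.append((("if", cond, body), j))           # no else
            if j < len(toks) and toks[j] == "else":
                for alt, k in parse_S(toks, j + 1):
                    results.append((("if-else", cond, body, alt), k))
    return results

def all_parse_trees(source):
    toks = source.split()
    # Keep only the parses that consume the entire input.
    return [tree for tree, end in parse_S(toks, 0) if end == len(toks)]

for tree in all_parse_trees("if a then if b then s1 else s2"):
    print(tree)
# ('if-else', 'a', ('if', 'b', 's1'), 's2')
# ('if', 'a', ('if-else', 'b', 's1', 's2'))
```

The second tree attaches the `else` to the nearer `if`, which is the one the "else binds to the nearest if" rule selects. Because the enumerator explores every alternative, its running time can grow exponentially with input length, which is one reason practical parsers avoid this approach.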

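The second difficulty is cost. A naive backtracking parser can take exponential time even on an unambiguous grammar, and general algorithms that handle every context-free grammar, such as CYK and Earley, still need cubic time in the worst case. Practical compilers therefore restrict themselves to grammar subclasses that guarantee linear-time parsing. LL grammars are parsed top-down: the parser reads input left to right, builds a leftmost derivation, and chooses each production by looking ahead a fixed number of tokens (the k in LL(k)). LR grammars are parsed bottom-up: the parser reads input left to right, shifts tokens onto a stack, and reduces them when a complete right-hand side appears on top of the stack, tracing a rightmost derivation in reverse. These two families cover the syntax of most programming languages in practice, and the choice between them shapes both how the grammar must be written and how the compiler's front end is built. Understanding the parsing problem, what makes it hard and what makes it tractable, is the foundation for studying both approaches.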
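To make the bottom-up description concrete, here is a hand-written shift-reduce loop for an assumed expression grammar, E → E + T | T, T → T * F | F, F → num, with tokens given as the strings `'num'`, `'+'` and `'*'`. It is a sketch, not a table-driven LR parser: the decisions an LR parser would read from generated tables are hard-coded here, using one token of lookahead to decide whether a T is complete. The names `shift_reduce` and `SHIFTABLE` are hypothetical.

```python
# Stack-top / lookahead pairs where shifting is legal for this grammar.
SHIFTABLE = {(None, "num"), ("E", "+"), ("T", "*"), ("+", "num"), ("*", "num")}

def shift_reduce(tokens):
    """Parse a list of 'num', '+', '*' tokens; return the actions taken."""
    toks = tokens + ["$"]          # '$' marks the end of the input
    stack, pos, actions = [], 0, []

    def reduce(n, lhs, rule):
        del stack[-n:]             # pop the right-hand side...
        stack.append(lhs)          # ...and push the left-hand side
        actions.append(f"reduce {rule}")

    while True:
        la = toks[pos]             # one token of lookahead
        top = stack[-1] if stack else None
        if top == "num":
            reduce(1, "F", "F -> num")
        elif top == "F":
            if stack[-3:] == ["T", "*", "F"]:
                reduce(3, "T", "T -> T * F")
            else:
                reduce(1, "T", "T -> F")
        elif top == "T" and la != "*":   # a following '*' means T is not finished
            if stack[-3:] == ["E", "+", "T"]:
                reduce(3, "E", "E -> E + T")
            else:
                reduce(1, "E", "E -> T")
        elif top == "E" and la == "$" and len(stack) == 1:
            actions.append("accept")
            return actions
        elif (top, la) in SHIFTABLE:
            stack.append(la)
            pos += 1
            actions.append(f"shift {la}")
        else:
            raise SyntaxError(f"unexpected {la!r} with stack {stack}")

for step in shift_reduce(["num", "+", "num", "*", "num"]):
    print(step)
```

For `num + num * num` the trace performs `reduce T -> T * F` before `reduce E -> E + T`: the multiplication's right-hand side is completed and reduced first, which yields the same nesting as the parse tree for `3 + 4 * 5`.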


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Boolean Algebra and Fundamental Laws → Logic Gates Fundamentals → Implementing Boolean Functions with Gates → Karnaugh Map Simplification → Combinational Circuit Design → Flip-Flops and Latches → Finite State Machines (FSMs) → Deterministic Finite Automata (DFA) → Nondeterministic Finite Automata (NFA) → Two-Way Finite Automata → NFA to DFA Conversion (Subset Construction) → DFA Properties and Minimization Algorithms → Regular Languages: Definition and Characterization → Context-Free Grammars (CFGs) → Context-Free Grammar Properties and Ambiguity → Parse Trees, Derivations, and Ambiguity in CFGs → Context-Free Grammars in Compiler Design → The Parsing Problem

Longest path: 90 steps · 405 total prerequisite topics

Prerequisites (3)

Context-Free Grammars in Compiler Design (hard) · Tokenization and Lexemes (hard) · Formal Languages and Strings (soft)

Leads To (3)

LL Parsing and Predictive Parsing (hard) · LR Parsing Fundamentals (hard) · Operator Precedence Parsing (hard)