Recursive descent parsing converts grammar rules directly into mutually-recursive functions. This approach is easy to implement and debug, though it works best with left-factored grammars. Understanding RDP reveals the deep connection between grammars and code.
Write a recursive descent parser by hand for a small language. Implement error recovery and careful lookahead handling.
LL(1) is the only restriction for RDP (you can use limited lookahead or backtracking). RDP is not used in real compilers (many modern compilers use hand-written RDP).
You already understand how grammars define the structure of a language and how recursion lets a function call itself to handle nested structures. Recursive descent parsing connects these two ideas directly: each grammar rule becomes a function, and the recursive structure of the grammar becomes the recursive call structure of the parser.
Consider a simple expression grammar: an expression is a term, optionally followed by `+` or `-` and another term; a term is a factor, optionally followed by `*` or `/` and another factor; a factor is a number or a parenthesized expression. In a recursive descent parser, you write three functions — `parseExpression()`, `parseTerm()`, and `parseFactor()`. Each function looks at the current token (the lookahead), decides which production to apply, consumes the tokens that match, and calls other parsing functions for non-terminals in the production. When `parseFactor()` sees an open parenthesis, it calls `parseExpression()` recursively — this is where the "recursive descent" name comes from. The parser literally descends through the grammar's hierarchy via recursive calls.
The elegance of this approach is that the parser's control flow mirrors the grammar's structure. Debugging is natural: if parsing fails inside `parseTerm()`, you know the error is in a term. Adding a new language construct means adding a new function and updating the relevant caller. This directness is why major production compilers — GCC (for C++), Clang, the Go compiler, and the Rust compiler — all use hand-written recursive descent parsers rather than generated ones.
The main constraint is left recursion. A grammar rule like `E → E + T` would cause `parseExpression()` to call itself immediately without consuming any input, creating infinite recursion. You must left-factor the grammar, rewriting left-recursive rules into right-recursive or iterative form. The rule becomes `E → T (('+' | '-') T)*`, which translates naturally into a while-loop inside `parseExpression()`: parse one term, then loop while the next token is `+` or `-`, consuming the operator and parsing another term. This transformation is mechanical but essential — it is the price of the recursive descent approach.
Handling lookahead correctly is the other key skill. In a strict LL(1) parser, you examine exactly one token to decide which production to apply. But real languages sometimes require more context. When two alternatives start with the same token, you can left-factor the grammar to postpone the decision, use limited lookahead (peek at two or three tokens), or even allow backtracking (try one alternative, and if it fails, reset and try another). Production-quality recursive descent parsers freely mix these techniques, trading strict LL(1) purity for practical expressiveness. The result is a parser that is easy to write, easy to maintain, and produces excellent error messages — because at every point, the code knows exactly what it was trying to parse and can report precisely what went wrong.