Context-free grammars formally describe the syntax of programming languages. Each grammar rule specifies how nonterminals can be rewritten into terminals and nonterminals. A parse tree derives a sentence by applying rules recursively; the tree structure encodes the program's grammatical composition. CFGs are expressive enough for most language constructs but leave semantics to later compilation phases.
You have already studied context-free grammars as a formal language concept and know how parse trees represent derivations. In compiler design, CFGs take on a very specific practical role: they are the specification language for programming language syntax. When a language designer writes that an if-statement looks like `if (expr) stmt else stmt`, they are writing a production rule of a context-free grammar. The entire syntactic structure of a programming language — expressions, statements, declarations, programs — is defined by a collection of such rules.
A typical compiler grammar might include rules like: `Expr → Expr + Term | Term`, `Term → Term * Factor | Factor`, `Factor → ( Expr ) | id | num`. These rules do two things simultaneously. First, they define which strings of tokens are syntactically valid programs — any token sequence that can be derived from the start symbol is a legal program. Second, and more importantly for compilation, the structure of the derivation encodes how the program should be understood. The rule `Expr → Expr + Term` implicitly says that addition is left-associative, because the recursive `Expr` appears on the left. The fact that `Term` handles multiplication while `Expr` handles addition encodes that multiplication binds more tightly — operator precedence falls out naturally from the grammar's structure.
This is why CFGs are preferred over simpler formalisms like regular expressions for syntax specification. Regular expressions can describe token structure (what an identifier or number looks like), but they cannot express recursive nesting — matching parentheses, nested if-else blocks, arbitrarily deep expression trees. The recursive nature of CFG productions maps directly onto the recursive structure of programs. A function body contains statements, which contain expressions, which may contain function calls, which contain argument expressions, nesting arbitrarily deep. Only a context-free grammar can capture this.
The grammar serves as the blueprint for the parser, the compiler phase that takes a flat sequence of tokens from the lexer and produces a parse tree (or more commonly, an abstract syntax tree). Every parsing algorithm — recursive descent, LL, LR, LALR — is a strategy for efficiently finding the derivation that a CFG assigns to a token sequence. The grammar must often be rewritten to suit the parser: eliminating left recursion for top-down parsers, factoring common prefixes to avoid ambiguity. But the grammar remains the authoritative definition of what is syntactically legal. Semantic analysis — type checking, scope resolution, meaning — comes later, operating on the tree structure that the grammar defined.