A context-free grammar (CFG) generates strings through production rules that replace a single nonterminal with a string of terminals and nonterminals, regardless of surrounding context. The language of all derivable strings is a context-free language (CFL), and the class of CFLs is exactly the class recognized by nondeterministic pushdown automata. Every CFG can be converted to Chomsky normal form, where each production is either A -> BC or A -> a, enabling the CYK parsing algorithm. The pumping lemma for context-free languages proves that certain languages (e.g., {a^n b^n c^n}) are not context-free.
Write grammars for familiar languages — arithmetic expressions, matched parentheses, palindromes — and draw derivation (parse) trees. Then convert a grammar to Chomsky normal form step by step, and apply the CFL pumping lemma to prove {a^n b^n c^n} is not context-free. This builds intuition for where the stack-based model breaks down.
You already understand pushdown automata — machines that recognize strings by reading input and consulting a stack. A context-free grammar (CFG) is the generative counterpart: instead of recognizing, it produces. A grammar has a finite set of nonterminals (placeholder symbols, written in uppercase), a set of terminals (the actual alphabet of the language), a set of production rules of the form A → α, and a designated start symbol. A derivation begins with the start symbol and repeatedly replaces one nonterminal with the right-hand side of some production rule, until only terminals remain. The "context-free" in the name means the replacement is unconditional — you may replace A regardless of what surrounds it.
The connection to your prerequisite is precise: the class of languages generated by CFGs is exactly the class recognized by nondeterministic pushdown automata. This equivalence is proved by construction in both directions — you can convert any CFG into an equivalent PDA, and any PDA into an equivalent CFG. As a concrete example, consider the language of balanced parentheses. The grammar S → SS | (S) | ε generates it cleanly, and the corresponding PDA simply pushes on '(' and pops on ')'. For arithmetic expressions, a grammar can encode operator precedence and associativity through nesting of nonterminals, which is exactly how real parsers work.
Chomsky Normal Form (CNF) is a restricted grammar form where every production is either A → BC (two nonterminals) or A → a (one terminal). Every CFG can be converted to CNF through a sequence of transformations: eliminate ε-productions, eliminate unit productions (A → B), and break long right-hand sides into binary ones by introducing new nonterminals. CNF is not a restriction on the languages — it is purely a structural normalization that enables the CYK algorithm, a dynamic programming parser that decides in O(n³) time whether a string of length n is in the language. You build a table over all substrings and fill it bottom-up using the binary structure of CNF.
The pumping lemma for context-free languages is the tool for proving a language is not context-free. It states that for any CFL, long enough strings can be written as uvwxy such that pumping v and x together (uv^i wx^i y) stays in the language. The proof uses the fact that long derivations must repeat a nonterminal (a pigeonhole argument on parse trees), and pumping corresponds to repeating a subtree. The classic application: {a^n b^n c^n} cannot be pumped because pumping any two substrings cannot simultaneously fix all three counts. This language requires a machine that can count three things independently — something a single stack cannot do. This is where the CFL class ends and the context-sensitive languages begin.