A pushdown automaton (PDA) extends an NFA with an unbounded stack. At each step, a PDA reads an input symbol (or ε), pops a symbol from the stack, transitions to a new state, and pushes a (possibly different) string onto the stack. PDAs recognize exactly the context-free languages. Nondeterministic PDAs are strictly more powerful than deterministic PDAs — unlike the DFA/NFA equivalence, adding nondeterminism gives PDAs additional expressive power. The stack is what allows PDAs to track nested structure that finite automata cannot.
Design a PDA for {aⁿbⁿ : n ≥ 0} by hand: push an 'a' for each a read, then pop for each b. Verify acceptance by both empty stack and final state. Then try {wwᴿ : w ∈ {a,b}*} to see why nondeterminism is needed.
You already know that a finite automaton has a fixed number of states and no memory beyond which state it currently occupies. This means a DFA cannot count — it cannot verify that an input has equal numbers of a's and b's, for instance, because tracking a count requires unbounded memory. A pushdown automaton solves this by adding exactly one piece of auxiliary storage: a stack. The stack is last-in-first-out and unbounded in size, giving the machine a simple but powerful form of memory.
Consider the classic language {aⁿbⁿ : n ≥ 0} — strings of n a's followed by n b's. No finite automaton can recognize this because it would need to "remember" how many a's it saw. A PDA handles it naturally: as it reads each 'a', it pushes a marker onto the stack. When it starts reading b's, it pops one marker per 'b'. If the stack is exactly empty when the input ends, the counts matched and the string is accepted. The stack acts as a counter, and the LIFO discipline ensures the matching is done in the right order — the most recently pushed item is the first one checked.
The formal definition of a PDA transition is richer than a DFA's. Each step depends on three things: the current state, the current input symbol (or ε, meaning no input is consumed), and the symbol on top of the stack. The machine then moves to a new state, pops the top stack symbol, and pushes a string of zero or more symbols. This push-and-pop mechanism lets the PDA track nested structure — matching parentheses, balanced tags, recursive grammar rules — which is exactly what context-free languages require.
A critical distinction from finite automata is that nondeterministic PDAs are strictly more powerful than deterministic ones. With DFAs and NFAs, nondeterminism is a convenience — every NFA has an equivalent DFA. Not so with PDAs. The language of even-length palindromes {wwᴿ : w ∈ {a,b}*} requires the machine to "guess" where the middle of the string is, because there is no marker separating the first half from the reversed second half. A nondeterministic PDA can branch at every position, trying "the middle is here," and accept if any branch succeeds. A deterministic PDA, locked into a single computation path, cannot solve this problem. This gap means the deterministic context-free languages (DCFL) form a strict subset of the context-free languages, a fact with direct consequences for parser design — deterministic PDAs underlie efficient LR and LL parsers, while the full power of nondeterministic PDAs corresponds to the broader class of grammars that may require backtracking or more expensive parsing algorithms.