Dataflow analysis computes information about how data flows through a program. It solves systems of constraints on basic blocks, iterating until a fixpoint is reached. Forward analyses (reaching definitions) track properties forward through the CFG; backward analyses (live variables) track them backward. Dataflow results enable optimizations like constant propagation and dead-code elimination.
Dataflow analysis is a family of techniques for computing facts about a program's runtime behavior using only the static structure of its code. Instead of executing the program, you reason about what *could* happen on any possible execution path — and you do this by working with the control-flow graph (CFG) you already know, propagating information from block to block until the solution stabilizes.
The framework works as follows. Each basic block has a transfer function that describes how that block transforms the dataflow information. For reaching definitions, for example, a block that assigns `x = 3` *generates* that definition and *kills* any earlier definition of `x`. The global solution must satisfy the dataflow equations: the information at the entry of each block equals the meet (union or intersection, depending on the analysis) of the information at the exit of all its predecessors. You initialize all blocks conservatively, then iterate — recomputing each block's entry and exit sets using the current values of its neighbors — until nothing changes. That stable state is the fixpoint.
The direction of propagation divides analyses into two classes. Forward analyses flow information in the same direction as execution: the facts at block B's entry depend on B's predecessors. Reaching definitions is the canonical example — you ask which assignments made earlier might still be "live" as you enter B. Backward analyses flow in reverse: the facts at B's entry depend on what happens in B's successors. Live variable analysis is the canonical example — a variable is live entering B if it might be used before being overwritten on some path *continuing from* B. Recognizing which direction an analysis flows is the key to setting up the equations correctly.
Termination is guaranteed because dataflow values inhabit a finite lattice, and the transfer functions are monotone — each iteration can only add information (for union-based analyses) or remove it (for intersection-based analyses), never reverse a prior change. Since the sets of definitions or variables are finite, this monotone sequence must eventually plateau. In practice, convergence is fast — often in just a few passes, with loops requiring at most as many iterations as the nesting depth.
Dataflow results directly power compiler optimizations. Reaching definitions enable constant propagation: if only one definition of `x` reaches a use and that definition assigns a constant, the use can be replaced with the constant. Live variable analysis enables dead-code elimination: if a variable is assigned but not live afterward (never used before being overwritten), the assignment can be removed. These are among the most impactful optimizations in production compilers, and both rest on the same algorithmic foundation of iterative fixpoint computation over the CFG.