A control flow graph (CFG) represents a program's control structure as a directed graph where nodes are basic blocks (straight-line code with one entry/exit) and edges represent jumps. CFGs are the foundation for program analysis: dominance, loops, and dataflow properties are computed on CFGs. Building and analyzing CFGs is essential for optimization and verification.
When a compiler translates source code into intermediate representation (IR), it produces a flat list of three-address instructions. But a flat list hides a crucial dimension: not every instruction always executes. Branches, loops, and function returns mean that execution can take many paths through the code. A control flow graph makes this structure explicit by turning the flat instruction list into a directed graph that mirrors all the ways the program can actually run.
The first step in building a CFG is identifying basic blocks. A basic block is a maximal run of instructions with a single entry point (no jumps land in the middle) and a single exit point (only the last instruction may be a branch). Within a basic block, control flow is perfectly sequential: if the first instruction executes, all of them do. This is a powerful guarantee for optimization — you can propagate constants, eliminate dead code, and allocate registers within a block using only local information, without worrying about branching.
The second step is adding edges. After each basic block, execution either falls through to the next block, jumps unconditionally to some target, or branches conditionally to one of two targets. Each possibility becomes a directed edge in the CFG. A conditional if-else creates two outgoing edges from the block containing the branch: one to the "then" block and one to the "else" block. Loops create back edges — edges that point backward to an earlier block — which are the graph-theoretic signature of a loop. Finding all back edges lets the compiler identify natural loops and apply loop-specific optimizations like loop-invariant code motion.
With the CFG in hand, the compiler can compute global properties across all blocks. Dominator analysis asks: for each basic block B, which blocks must every execution path pass through before reaching B? The dominator tree organizes this information and enables structured optimizations like partial redundancy elimination. Liveness analysis uses the CFG edges in reverse to determine which variables are still needed at each program point, enabling efficient register allocation. All of these analyses — which you will study in depth as dataflow analysis — are defined as fixed-point computations on the CFG structure.
The CFG is not just an internal compiler data structure; it is also the foundation for static analysis tools, test coverage measurement (branch coverage counts CFG edges), and program verification. When a security scanner checks for use-after-free errors or null-pointer dereferences, it is walking paths through the program's CFG. Understanding the CFG therefore unlocks not just compiler optimizations but the broader field of program analysis.