Branch Prediction and Speculative Execution

College Depth 66 in the knowledge graph I know this Set as goal
Unlocks 2 downstream topics
branch prediction speculation performance

Core Idea

Branch prediction guesses the outcome of conditional branches and speculatively fetches the predicted path, minimizing pipeline stalls from control hazards. Prediction tables track branch history; incorrect predictions require rollback and re-execution.

Explainer

From your study of control hazards, you know the core problem: when a pipelined processor encounters a conditional branch, it does not know whether to fetch the next sequential instruction or the branch target until the branch condition is evaluated, which happens several stages into the pipeline. Waiting for the result means stalling — inserting bubbles that waste cycles. In a 5-stage pipeline, this costs 1-2 cycles per branch. In a deep 15-stage pipeline, it could cost 10 or more. Since branches occur roughly every 5-7 instructions in typical code, the performance penalty of always stalling would be catastrophic. Branch prediction solves this by guessing the branch outcome and fetching instructions along the predicted path speculatively.

The simplest prediction strategy is static prediction: always predict that branches are not taken (continue to the next sequential instruction), or always predict backward branches as taken (since they are usually loop-back edges) and forward branches as not taken. This is cheap to implement and captures common loop behavior, achieving roughly 60-70% accuracy. Dynamic prediction does much better by learning from the branch's runtime history. A 1-bit predictor remembers whether the branch was taken last time and predicts it will do the same thing. This works well for branches that are consistently taken or not taken, but it mispredicts twice for every loop — once when entering (if the branch was not taken last time the loop ended) and once when exiting. A 2-bit saturating counter fixes this by requiring two consecutive mispredictions before flipping the prediction, achieving 85-90% accuracy on typical workloads.

Modern processors use two-level adaptive prediction, which tracks not just a single branch's history but the pattern of recent branch outcomes. A branch history register (BHR) records the last *n* outcomes (taken/not taken) as a bit string, and this pattern indexes into a pattern history table (PHT) of 2-bit counters. This allows the predictor to learn correlations — for example, that after the pattern taken-taken-not-taken, this branch is usually taken. Tournament predictors go further by maintaining multiple prediction mechanisms and a meta-predictor that selects whichever mechanism has been more accurate for each branch recently.

When a prediction turns out to be wrong, the processor must flush all speculatively executed instructions from the pipeline, discard any register or memory changes they made, and restart fetching from the correct path. This misprediction penalty equals the number of pipeline stages between fetch and branch resolution — wasted work that grows with pipeline depth. This is why prediction accuracy matters enormously: even going from 95% to 97% accuracy can yield measurable performance gains, because the remaining mispredictions each cost 10-20 cycles in a modern out-of-order processor. The branch predictor is, paradoxically, one of the most performance-critical components in a processor despite performing no actual computation.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsOperators and ExpressionsArithmetic Operators and Operator PrecedenceComparison Operators and Boolean TestsLogical Operators and Boolean AlgebraBoolean Algebra and Fundamental LawsCombinational Circuit DesignFlip-Flops and LatchesBinary Counters: Design and AnalysisBinary ArithmeticFixed-Point Number RepresentationTwo's Complement RepresentationOverflow and Underflow DetectionBinary Adders: Half-Adders and Full-AddersFull Adder and Carry PropagationCarry Lookahead Adder DesignHalf Adder Circuit DesignMultiplication Circuit DesignSequential Circuit DesignRegisters and Register FilesInstruction Set Architecture (ISA)Assembly Language BasicsCPU DatapathCPU Control UnitCPU PipeliningPipeline HazardsHazards in Pipelined ProcessorsBranch Prediction and Speculative Execution

Longest path: 67 steps · 247 total prerequisite topics

Prerequisites (1)

Leads To (2)