A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Branch Prediction and Speculative Execution

College · Depth 104 in the knowledge graph

2 topics build on this

370 prerequisites beneath it

Core Idea

Branch prediction guesses the outcome of conditional branches and speculatively fetches the predicted path, minimizing pipeline stalls from control hazards. Prediction tables track branch history; incorrect predictions require rollback and re-execution.

Explainer

From your study of control hazards, you know the core problem: when a pipelined processor encounters a conditional branch, it does not know whether to fetch the next sequential instruction or the branch target until the branch condition is evaluated, which happens several stages into the pipeline. Waiting for the result means stalling — inserting bubbles that waste cycles. In a 5-stage pipeline, this costs 1-2 cycles per branch. In a deep 15-stage pipeline, it could cost 10 or more. Since branches occur roughly every 5-7 instructions in typical code, the performance penalty of always stalling would be catastrophic. Branch prediction solves this by guessing the branch outcome and fetching instructions along the predicted path speculatively.
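To make the cost of always stalling concrete, here is a rough back-of-the-envelope sketch. The function name `cpi_with_stalls`, the ideal base CPI of 1.0, and the choice of one branch every 6 instructions are illustrative assumptions, not measurements from any particular processor.

```python
def cpi_with_stalls(base_cpi, branch_fraction, stall_cycles):
    """Effective cycles per instruction when every branch stalls the pipeline."""
    return base_cpi + branch_fraction * stall_cycles

# Assumed: ideal CPI of 1.0 and one branch every 6 instructions.
shallow = cpi_with_stalls(1.0, 1 / 6, 2)   # 5-stage pipeline, 2-cycle stall
deep = cpi_with_stalls(1.0, 1 / 6, 10)     # 15-stage pipeline, 10-cycle stall

print(f"5-stage CPI: {shallow:.2f}, 15-stage CPI: {deep:.2f}")
```

Under these assumptions the shallow pipeline runs about a third slower than its ideal rate, while the deep pipeline spends more cycles stalled than doing useful work.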

The simplest prediction strategy is static prediction: always predict that branches are not taken (continue to the next sequential instruction), or predict backward branches as taken (since they are usually loop-back edges) and forward branches as not taken. This is cheap to implement and captures common loop behavior, achieving roughly 60-70% accuracy. Dynamic prediction does much better by learning from each branch's runtime history. A 1-bit predictor remembers whether the branch was taken last time and predicts the same outcome again. This works well for branches that are consistently taken or not taken, but it mispredicts twice per execution of a loop: once on the final iteration, when the loop-back branch falls through, and once on the first iteration of the next execution, because the stored bit still says "not taken". A 2-bit saturating counter fixes this: from a saturated state it takes two consecutive mispredictions to flip the prediction, so a single loop exit no longer disturbs the "taken" prediction and the loop costs one misprediction per execution instead of two. On typical workloads this reaches 85-90% accuracy.
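The difference between the 1-bit and 2-bit schemes can be seen in a small simulation. This is a minimal sketch: the function names, the initial predictor states, and the loop shape (9 taken iterations followed by one not-taken exit, run 100 times) are assumptions chosen for illustration.

```python
def simulate_1bit(outcomes, state=False):
    """Count mispredictions of a 1-bit predictor (predict the last outcome)."""
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken  # remember only the most recent outcome
    return misses

def simulate_2bit(outcomes, counter=0):
    """Count mispredictions of a 2-bit saturating counter.

    Counter values 0-1 predict not taken, 2-3 predict taken.
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating at both ends.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

# Loop-back branch: taken 9 times, then not taken on exit; the loop runs 100 times.
outcomes = ([True] * 9 + [False]) * 100
misses_1bit = simulate_1bit(outcomes)
misses_2bit = simulate_2bit(outcomes)

print(f"1-bit accuracy: {1 - misses_1bit / len(outcomes):.1%}")
print(f"2-bit accuracy: {1 - misses_2bit / len(outcomes):.1%}")
```

In this trace the 1-bit predictor misses twice per loop execution, while the 2-bit counter settles to one miss per execution after a short warm-up.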

Modern processors use two-level adaptive prediction, which tracks not just a single branch's history but the pattern of recent branch outcomes. A branch history register (BHR) records the last *n* outcomes (taken/not taken) as a bit string, and this pattern indexes into a pattern history table (PHT) of 2-bit counters. This allows the predictor to learn correlations — for example, that after the pattern taken-taken-not-taken, this branch is usually taken. Tournament predictors go further by maintaining multiple prediction mechanisms and a meta-predictor that selects whichever mechanism has been more accurate for each branch recently.
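The BHR-indexes-PHT structure described above can be sketched as follows. This is a simplified model under stated assumptions: the class name `TwoLevelPredictor`, the 3-bit history length, the "weakly not taken" initial counters, and the use of a single history register for one branch are all illustrative. Real designs also fold the branch address into the index and are considerably more elaborate.

```python
class TwoLevelPredictor:
    """Branch history register (BHR) indexing a table of 2-bit counters (PHT)."""

    def __init__(self, history_bits=3):
        self.history_bits = history_bits
        self.bhr = 0                           # last n outcomes as a bit string
        self.pht = [1] * (1 << history_bits)   # one 2-bit counter per pattern

    def predict(self):
        # Counter values 2-3 predict taken, 0-1 predict not taken.
        return self.pht[self.bhr] >= 2

    def update(self, taken):
        # Train the counter selected by the current history pattern.
        c = self.pht[self.bhr]
        self.pht[self.bhr] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the new outcome into the history, keeping only n bits.
        mask = (1 << self.history_bits) - 1
        self.bhr = ((self.bhr << 1) | int(taken)) & mask

def count_mispredictions(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses

# A branch that repeats taken, taken, not taken.
pattern = [True, True, False] * 100
misses = count_mispredictions(TwoLevelPredictor(history_bits=3), pattern)

print(f"{misses} mispredictions out of {len(pattern)} branches")
```

Because each 3-outcome history in this trace is always followed by the same result, the predictor stops mispredicting once the relevant counters have warmed up. A lone 2-bit counter on the same trace would keep missing the not-taken outcome in every repetition.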

When a prediction turns out to be wrong, the processor must flush all speculatively executed instructions from the pipeline, discard any register or memory changes they made, and restart fetching from the correct path. The misprediction penalty is roughly the number of pipeline stages between fetch and branch resolution, so the wasted work grows with pipeline depth. This is why prediction accuracy matters enormously: going from 95% to 97% accuracy removes 40% of the mispredictions (from 5 to 3 per 100 branches), and each remaining misprediction costs 10-20 cycles in a modern out-of-order processor. The branch predictor is, paradoxically, one of the most performance-critical components in a processor despite performing no actual computation.
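The effect of the 95% to 97% improvement can be estimated with a simple model in which only mispredicted branches pay the flush penalty. The function name `cpi_with_prediction` and the parameter values (ideal CPI of 1.0, one branch every 6 instructions, a 15-cycle penalty) are assumptions for illustration only.

```python
def cpi_with_prediction(base_cpi, branch_fraction, accuracy, penalty_cycles):
    """Effective CPI when only mispredicted branches pay the flush penalty."""
    return base_cpi + branch_fraction * (1.0 - accuracy) * penalty_cycles

# Assumed: ideal CPI 1.0, one branch every 6 instructions, 15-cycle penalty.
cpi_95 = cpi_with_prediction(1.0, 1 / 6, 0.95, 15)
cpi_97 = cpi_with_prediction(1.0, 1 / 6, 0.97, 15)
speedup = cpi_95 / cpi_97

print(f"CPI at 95%: {cpi_95:.3f}, at 97%: {cpi_97:.3f}, speedup: {speedup:.3f}x")
```

With these numbers the two-point accuracy gain is worth a few percent of overall performance. On a wider core with a lower base CPI, the same misprediction cycles would make up a larger share of the total, so the relative gain would be larger.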

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → 
Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Boolean Algebra and Fundamental Laws → Logic Gates Fundamentals → Implementing Boolean Functions with Gates → Karnaugh Map Simplification → Combinational Circuit Design → Flip-Flops and Latches → Binary Counters: Design and Analysis → Binary Arithmetic → Fixed-Point Number Representation → Two's Complement Representation → Overflow and Underflow Detection → Binary Adders: Half-Adders and Full-Adders → Full Adder and Carry Propagation → Carry Lookahead Adder Design → Half Adder Circuit Design → Multiplication Circuit Design → Sequential Circuit Design → Registers and Register Files → Instruction Set Architecture (ISA) → Assembly Language Basics → CPU Datapath → Instruction Fetch-Decode-Execute Cycle → CPU Control Unit → Microinstruction Format and Control Signals → Hardwired vs. Microprogrammed Control → Processor Control Unit Design → Finite State Machines in Processor Control → Single-Cycle Processor Architecture → Multi-Cycle Processor Design and Execution States → CPU Pipelining → Pipeline Hazards → Hazards in Pipelined Processors → Branch Prediction and Speculative Execution

Longest path: 105 steps · 370 total prerequisite topics

Prerequisites (2)

Hazards in Pipelined Processors (hard) · Processor Status Flags and Condition Codes (soft)

Leads To (2)

Out-of-Order Execution and Register Renaming (soft) · Superscalar and VLIW Processors (soft)