Cache Coherence Protocols and Memory Consistency

Graduate Depth 68 in the knowledge graph I know this Set as goal
Unlocks 1 downstream topic
caching consistency coherence

Core Idea

Cache coherence protocols maintain consistency between multiple caches in a system. MESI (Modified, Exclusive, Shared, Invalid) is a common protocol that tracks cache line states and coordinates through snooping or directory-based schemes. Correct coherence is essential to prevent processes from seeing inconsistent data when multiple CPUs or nodes have copies of the same memory location.

Explainer

From your work with consistency models and the synchronization problem, you know that when multiple processors or nodes share data, concurrent access without coordination leads to inconsistent views. Cache coherence is the specific instance of this problem that arises when multiple processors each maintain their own local cache of shared memory. If processor A writes a new value to address X in its cache, processor B's cache still holds the stale old value — and without a coherence protocol, B has no way of knowing its copy is outdated.

The MESI protocol solves this by assigning each cache line one of four states. Modified means this cache holds the only valid copy and it has been changed — main memory is stale. Exclusive means this cache holds the only copy and it matches main memory — no other cache has it. Shared means multiple caches hold this line and all copies match main memory. Invalid means this cache line is not usable — it has been invalidated because another processor modified the data. Every read and write triggers state transitions: when processor A writes to a Shared line, the protocol sends an invalidation message to all other caches holding that line, transitioning their copies to Invalid and A's copy to Modified.

There are two main coordination mechanisms. In snooping protocols, every cache watches (snoops on) a shared bus and reacts when it sees transactions involving addresses it holds. This works well for small numbers of processors sharing a bus, but does not scale — every cache must see every transaction. In directory-based protocols, a central directory tracks which caches hold copies of each memory block. When a write occurs, the directory sends targeted invalidation messages only to caches that actually hold the line, avoiding broadcast overhead. This scales to larger systems but adds latency for the directory lookup.

Understanding cache coherence bridges the gap between the abstract consistency models you have studied and the physical reality of how hardware enforces them. The consistency model tells you what ordering guarantees the system provides to programmers; the coherence protocol is the mechanism that delivers those guarantees at the hardware level. When coherence works correctly, programmers can reason about shared memory without thinking about caches at all. When it breaks down — or when the performance cost of maintaining coherence becomes the bottleneck — it explains phenomena like false sharing (two unrelated variables on the same cache line causing constant invalidations) and motivates the design of systems that minimize shared mutable state entirely.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsOperators and ExpressionsArithmetic Operators and Operator PrecedenceComparison Operators and Boolean TestsLogical Operators and Boolean AlgebraBoolean Algebra and Fundamental LawsCombinational Circuit DesignFlip-Flops and LatchesBinary Counters: Design and AnalysisBinary ArithmeticFixed-Point Number RepresentationTwo's Complement RepresentationOverflow and Underflow DetectionBinary Adders: Half-Adders and Full-AddersFull Adder and Carry PropagationCarry Lookahead Adder DesignHalf Adder Circuit DesignMultiplication Circuit DesignSequential Circuit DesignRegisters and Register FilesInstruction Set Architecture (ISA)Kernel Architecture and OS StructureSystem Calls and User/Kernel ModeProcesses and the Process Control BlockProcess Creation: fork() and exec()Process Termination and Resource CleanupProcess States and State TransitionsThreads and ConcurrencyThe Critical Section Problem and Race ConditionsCache Coherence Protocols and Memory Consistency

Longest path: 69 steps · 250 total prerequisite topics

Prerequisites (2)

Leads To (1)