A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

The Consensus Problem

Graduate · Depth 63 in the knowledge graph

22 topics build on this

290 prerequisites beneath it

Core Idea

Consensus requires all non-faulty processes to decide on a single value, even when some processes fail or propose conflicting values. Consensus must satisfy: agreement (all non-faulty processes decide identically), validity (a decided value was proposed), and termination (all non-faulty processes eventually decide). This foundational problem subsumes many practical coordination challenges.

Explainer

Imagine a group of generals who must unanimously agree on whether to attack or retreat, communicating only by messenger, while some of the generals may be traitors sending false orders. This scenario, the Byzantine Generals Problem, captures the essence of the consensus problem in distributed systems. The challenge is not just agreement but guaranteed agreement in the presence of failures, without any central coordinator.

The three properties of consensus — agreement, validity, and termination — each rules out a different trivial cheat. Without agreement, processes could decide different values. Without validity, an algorithm could satisfy agreement by having everyone decide some hardcoded constant regardless of input. Without termination, an algorithm could satisfy the other two by simply never deciding. All three are required simultaneously, and that turns out to be surprisingly hard.
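To make the three properties concrete, here is a minimal Python sketch that checks the outcome of one finished run against each of them. The function name `check_consensus`, the process names, and the convention that a missing or `None` decision means "never decided" are all assumptions made for illustration. Termination is a claim about what happens eventually, so a finite trace can only approximate it: the sketch treats the run as complete.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(
    proposals: Dict[str, Hashable],
    decisions: Dict[str, Optional[Hashable]],
    faulty: Set[str],
) -> Dict[str, bool]:
    """Check one finished run against agreement, validity, and termination.

    Hypothetical helper: a missing or None entry in `decisions` means the
    process never decided, so None cannot itself be a proposed value.
    """
    correct = [p for p in proposals if p not in faulty]
    decided = [decisions[p] for p in correct if decisions.get(p) is not None]
    return {
        # All non-faulty processes that decided chose the same value.
        "agreement": len(set(decided)) <= 1,
        # Every decided value was proposed by some process.
        "validity": all(v in proposals.values() for v in decided),
        # Every non-faulty process reached a decision.
        "termination": len(decided) == len(correct),
    }


proposals = {"a": "attack", "b": "retreat", "c": "attack"}

# Cheat 1: everyone decides a hardcoded constant -> only validity fails.
hardcoded = check_consensus(proposals, {p: "surrender" for p in proposals}, set())

# Cheat 2: nobody ever decides -> only termination fails.
silent = check_consensus(proposals, {}, set())

# Cheat 3: everyone decides its own proposal -> only agreement fails.
selfish = check_consensus(proposals, dict(proposals), set())

print(hardcoded, silent, selfish, sep="\n")
```

Each of the three cheats passes exactly two checks and fails the third, which is the sense in which each property rules out a different trivial algorithm.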

The landmark FLP impossibility theorem (Fischer, Lynch, Paterson, 1985) shows that in a purely asynchronous system, one where message delays have no upper bound, no deterministic algorithm can guarantee consensus even if at most one process may crash. The intuitive reason: a slow process looks exactly like a crashed one, so there is always some execution in which the algorithm cannot tell whether a process has failed or is merely delayed. Deciding without that process risks violating agreement if it later turns out to be alive, while waiting for it risks never terminating if it has crashed. Real-world consensus protocols escape this by adding timing assumptions (synchronous or partially synchronous models) or randomization.
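The indistinguishability at the heart of that intuition can be illustrated with a small Python sketch. It is not a proof of FLP, only a picture of the observer's problem. The helper `observed_by_deadline` and the arrival times are invented for the example: `None` stands for a message that was never sent because the sender crashed.

```python
from typing import List, Optional


def observed_by_deadline(
    arrival_times: List[Optional[float]], deadline: float
) -> List[int]:
    """Return the indexes of messages a waiting process has received by `deadline`.

    Hypothetical model: None means the message was never sent (sender crashed).
    """
    return [
        i for i, t in enumerate(arrival_times)
        if t is not None and t <= deadline
    ]


crashed_sender = [0.1, 0.2, None]  # third message never sent
slow_sender = [0.1, 0.2, 9.0]      # third message merely delayed

# Up to any deadline shorter than the delay, both runs look identical.
for deadline in (1.0, 5.0):
    assert observed_by_deadline(crashed_sender, deadline) == \
        observed_by_deadline(slow_sender, deadline)

# Only after the delayed message arrives do the two runs differ.
print(observed_by_deadline(crashed_sender, 10.0))  # [0, 1]
print(observed_by_deadline(slow_sender, 10.0))     # [0, 1, 2]
```

In an asynchronous system the delay has no upper bound, so whatever deadline a process picks, there is a run in which the message arrives just after it. A timing assumption bounds that delay, which is what makes a timeout meaningful.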

The severity of failures also matters. Crash faults (the crash-stop model) are the gentlest: failed processes simply go silent. Byzantine faults are the harshest: a faulty process can send contradictory messages to different peers, actively undermining agreement. Algorithms like Paxos and Raft tolerate f crash faults among n processes as long as f < n/2 (that is, n ≥ 2f + 1); Byzantine fault-tolerant algorithms require f < n/3 (n ≥ 3f + 1). That extra replication cost is one reason they are mostly reserved for adversarial settings like blockchains.
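The two bounds are easy to turn into arithmetic. The Python sketch below computes the largest tolerable f for a given cluster size n, and the smallest n for a given f. The function names are hypothetical and used only for illustration.

```python
def max_crash_faults(n: int) -> int:
    """Largest f satisfying f < n/2, equivalently n >= 2f + 1."""
    return (n - 1) // 2


def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying f < n/3, equivalently n >= 3f + 1."""
    return (n - 1) // 3


def min_cluster_size(f: int, byzantine: bool = False) -> int:
    """Smallest n that tolerates f faults under the chosen failure model."""
    return 3 * f + 1 if byzantine else 2 * f + 1


for n in (3, 4, 5, 7):
    print(n, max_crash_faults(n), max_byzantine_faults(n))
# 3 1 0
# 4 1 1
# 5 2 1
# 7 3 2
```

Under these bounds, a three-node cluster survives one crash but cannot tolerate even a single Byzantine node, and tolerating two Byzantine nodes takes seven processes where five suffice for two crashes.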

Understanding the consensus problem is the key to understanding why distributed coordination is hard in practice. Leader election, distributed transactions, and replicated state machines all reduce to consensus. Every time you interact with a system that promises consistency across multiple servers — a distributed database, a payment processor, a coordination service like ZooKeeper — consensus is the invisible foundation making that promise possible.

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Making 10 as an Addition Strategy → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts Through 10 → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Opposites and Additive Inverses → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → Variables in Logic → Conditional Statements (If-Then Formal) → Converse, Inverse, and Contrapositive → Biconditional Statements → Biconditional Statements and Equivalence → Conditional and Biconditional Statements → Formal Logic and Propositional Calculus → The Consensus Problem

Longest path: 64 steps · 290 total prerequisite topics

Prerequisites (3)

Synchronous vs. Asynchronous Distributed Systems (hard)
Failure Models in Distributed Systems (hard)
Formal Logic and Propositional Calculus (soft)

Leads To (10)

Byzantine Fault Tolerance and Practical BFT (hard)
Distributed Lock Management (hard)
FLP Impossibility Theorem (hard)
Multi-Master Replication (hard)
Paxos Consensus Algorithm (hard)
Raft Consensus Algorithm (hard)
State Machine Replication (hard)
Total Order Broadcast and Strong Consistency (hard)
Two-Phase Commit Protocol (hard)
View Change and Leader Failover Protocols (hard)