
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

State Machine Replication

Research Depth 101 in the knowledge graph

4 topics build on this

424 prerequisites beneath it



Core Idea

State machine replication (SMR) replicates a deterministic service by using consensus to agree on a single sequence of commands. All replicas execute the same commands in the same order and therefore produce the same outputs. With n replicas, the system tolerates f crash failures as long as f < n/2, because a majority of replicas remains available to run consensus. SMR achieves linearizability by having consensus order all operations.

Explainer

You already understand the consensus problem: getting a group of nodes to agree on a single value despite failures. State machine replication (SMR) takes that idea and applies it repeatedly, turning consensus from a one-shot agreement into a continuous mechanism for keeping multiple copies of a service perfectly synchronized.

The core principle is deceptively simple. A deterministic state machine is any system where, given the same starting state and the same sequence of inputs, you always get the same ending state and outputs. A key-value store is a good example: if you start with an empty store and apply "SET x=1" then "SET y=2" then "DELETE x," every copy that processes those commands in that exact order will end up in the same state — just {y: 2}. SMR exploits this by running the same state machine on multiple nodes and using consensus to ensure every node processes the same commands in the same order. If the state machine is deterministic and the input sequence is identical, the replicas are guaranteed to stay in lockstep.
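The key-value example above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the class name `KVStateMachine`, the `run` helper, and the tuple encoding of commands are all hypothetical choices made for this sketch. It shows that three replicas applying the same commands in the same order end in the same state, and that a different order produces a different state.

```python
from typing import Dict, List, Optional, Tuple

# A command is (op, key, value); value is None for DELETE.
# This encoding is an assumption made for the sketch.
Command = Tuple[str, str, Optional[int]]


class KVStateMachine:
    """Hypothetical deterministic key-value state machine."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def apply(self, cmd: Command) -> Optional[int]:
        """Apply one command; the result depends only on state and cmd."""
        op, key, value = cmd
        if op == "SET":
            self.state[key] = value
            return value
        if op == "DELETE":
            return self.state.pop(key, None)
        raise ValueError(f"unknown op: {op}")


def run(commands: List[Command]) -> Dict[str, int]:
    """Start from an empty store and apply the commands in order."""
    machine = KVStateMachine()
    for cmd in commands:
        machine.apply(cmd)
    return machine.state


in_order: List[Command] = [("SET", "x", 1), ("SET", "y", 2), ("DELETE", "x", None)]

# Three replicas, same commands, same order: identical final states.
replica_states = [run(in_order) for _ in range(3)]
print(replica_states)  # [{'y': 2}, {'y': 2}, {'y': 2}]

# Same commands in a different order: the state diverges.
reordered = [in_order[2], in_order[0], in_order[1]]
diverged = run(reordered)
print(diverged)  # {'x': 1, 'y': 2}
```

The divergent result in the last step is the reason SMR needs an agreed order, not just an agreed set of commands.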

In practice, SMR works by assigning each client request a log position (slot number) through consensus. A client sends a request, say "SET x=5", to the system. The replicas run a consensus protocol (like Paxos or Raft) to agree that this request occupies slot 47 in the shared log. Once consensus is reached and all earlier slots have been executed, every replica executes the command at slot 47 and moves on to slot 48. The log is the single source of truth for ordering. Even if messages arrive at different replicas in different orders, consensus ensures they all agree on what goes in each slot. This is why the consensus prerequisite is essential: without it, you cannot build the ordered log that SMR depends on.
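The slot mechanism can be sketched as follows. The consensus protocol itself is not implemented here; the sketch assumes some protocol (Paxos, Raft, or similar) has already decided which command occupies each slot, and only models what a replica does when it learns of a decision. The names `Replica`, `on_decided`, and `next_slot` are hypothetical. The point illustrated is that decisions may arrive in any order, but a replica executes a slot only once every earlier slot has been executed.

```python
from typing import Dict, List, Optional, Tuple

Command = Tuple[str, str, Optional[int]]  # (op, key, value), assumed encoding


class Replica:
    """Hypothetical replica that executes decided slots strictly in order."""

    def __init__(self) -> None:
        self.decided: Dict[int, Command] = {}  # slot -> decided command
        self.next_slot = 1                     # lowest slot not yet executed
        self.store: Dict[str, int] = {}
        self.executed: List[int] = []          # slots executed so far, in order

    def on_decided(self, slot: int, cmd: Command) -> None:
        """Record a decision, then execute every contiguous ready slot."""
        self.decided[slot] = cmd
        while self.next_slot in self.decided:
            op, key, value = self.decided[self.next_slot]
            if op == "SET":
                self.store[key] = value
            elif op == "DELETE":
                self.store.pop(key, None)
            self.executed.append(self.next_slot)
            self.next_slot += 1


# Decisions assumed to come from the consensus layer.
decisions: Dict[int, Command] = {
    1: ("SET", "x", 5),
    2: ("SET", "x", 7),
    3: ("SET", "y", 1),
}

a, b = Replica(), Replica()

# Replica a learns the decisions in slot order.
for slot in (1, 2, 3):
    a.on_decided(slot, decisions[slot])

# Replica b learns slot 3 first; it must wait for slots 1 and 2.
b.on_decided(3, decisions[3])
executed_after_gap = list(b.executed)  # nothing executed yet
b.on_decided(1, decisions[1])
b.on_decided(2, decisions[2])

print(a.store, b.store)  # {'x': 7, 'y': 1} {'x': 7, 'y': 1}
```

Buffering out-of-order decisions in a dictionary and draining the contiguous prefix is one simple way to do this; real systems typically also persist the log and the execution position.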

The fault tolerance guarantee follows directly. If you have 2f + 1 replicas, up to f can crash and the system continues operating. The surviving f + 1 nodes still form a majority, so consensus can still make progress. When a crashed replica recovers, it simply replays the log from where it left off, reapplying each command in order until it catches up with the others. Because the state machine is deterministic, replay produces the exact same state as if the replica had never crashed. This combination — consensus for ordering, determinism for consistency, and log replay for recovery — is what makes SMR the foundational technique behind virtually every strongly consistent replicated system, from replicated databases to distributed lock services like Chubby and ZooKeeper.
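The arithmetic and the recovery step above can be checked with a short sketch. It assumes crash failures only, and it assumes a recovering replica still has the state it had applied before crashing together with a record of how many log entries that state reflects; both are simplifications for illustration. The function names `majority`, `max_crash_faults`, and `replay` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Command = Tuple[str, str, Optional[int]]  # (op, key, value), assumed encoding


def majority(n: int) -> int:
    """Smallest number of replicas that forms a strict majority of n."""
    return n // 2 + 1


def max_crash_faults(n: int) -> int:
    """Largest f with f < n/2, i.e. crashes that still leave a majority."""
    return (n - 1) // 2


def apply(state: Dict[str, int], cmd: Command) -> None:
    """Deterministic key-value transition."""
    op, key, value = cmd
    if op == "SET":
        state[key] = value
    elif op == "DELETE":
        state.pop(key, None)


def replay(state: Dict[str, int], log: List[Command], applied: int) -> int:
    """Apply log entries from index `applied` onward; return new position."""
    for index in range(applied, len(log)):
        apply(state, log[index])
    return len(log)


# With 2f + 1 = 5 replicas, f = 2 may crash and 3 survivors are a majority.
n = 5
f = max_crash_faults(n)
print(f, n - f, majority(n))  # 2 3 3

log: List[Command] = [
    ("SET", "x", 1),
    ("SET", "y", 2),
    ("DELETE", "x", None),
    ("SET", "z", 9),
    ("SET", "y", 4),
]

# A healthy replica applies the whole log.
healthy: Dict[str, int] = {}
replay(healthy, log, 0)

# A second replica crashes after applying two entries, then recovers
# and replays the remaining entries from where it left off.
recovered: Dict[str, int] = {}
position = replay(recovered, log[:2], 0)  # state before the crash
position = replay(recovered, log, position)

print(recovered == healthy)  # True
```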


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → 
Boolean Algebra → Boolean Type and Truth Values → Comparison Operators and Boolean Tests → Logical Operators and Boolean Algebra → Boolean Algebra and Fundamental Laws → Logic Gates Fundamentals → Implementing Boolean Functions with Gates → Karnaugh Map Simplification → Combinational Circuit Design → Flip-Flops and Latches → Binary Counters: Design and Analysis → Binary Arithmetic → Fixed-Point Number Representation → Two's Complement Representation → Overflow and Underflow Detection → Binary Adders: Half-Adders and Full-Adders → Full Adder and Carry Propagation → Carry Lookahead Adder Design → Half Adder Circuit Design → Multiplication Circuit Design → Sequential Circuit Design → Registers and Register Files → Instruction Set Architecture (ISA) → Kernel Architecture and OS Structure → System Calls and User/Kernel Mode → Processes and the Process Control Block → Logical Clocks and Event Ordering → Vector Clocks and Capturing Causality → Happened-Before Relation and Causal Ordering → Consistency Models in Distributed Systems → Read-After-Write Consistency → Sequential Consistency → Linearizability → State Machine Replication

Longest path: 102 steps · 424 total prerequisite topics

Prerequisites (5)

The Consensus Problem (hard), Paxos Consensus Algorithm (hard), Linearizability (soft), Hinted Handoff Recovery (soft), Raft Consensus Algorithm (soft)

Leads To (3)

Multi-Master Replication (hard), Quorum-Based Replication (soft), View Change and Leader Failover Protocols (soft)