Raft is a consensus algorithm used in production distributed systems. It requires leader election timeouts to function. Why does Raft need timeouts, given that FLP says consensus is impossible with crash failures?
ARaft avoids FLP entirely because it uses a leader-based architecture rather than a leaderless one
BRaft operates under a partial synchrony assumption: timeouts allow the system to assume a slow process has crashed, which is not valid in a purely asynchronous model but works when timing bounds eventually hold
CFLP only applies to systems with more than one crash failure; Raft is designed for single-failure scenarios
DRaft uses randomization to break ties, which exempts it from the FLP impossibility
FLP applies specifically to the purely asynchronous model where messages can be delayed indefinitely. Raft escapes FLP by assuming partial synchrony: eventually, timeouts work reliably (the system is 'eventually synchronous'). When a follower's election timeout expires without hearing from a leader, it treats the leader as failed and starts an election. This timeout-based assumption is invalid under FLP's model (a 'failed' process might just be slow), so Raft sacrifices liveness during periods of high asynchrony — it may fail to elect a leader if the network is too unstable — but guarantees safety always. The existence of timeouts signals the assumption of partial synchrony.
Question 2 Multiple Choice
In a purely asynchronous distributed system, why is it impossible for a process to determine that another process has crashed?
ACrashed processes always send a final 'I am crashing' message before halting, which could be lost
BThe asynchronous model provides no upper bound on message delivery time, so a non-responsive process could be crashed or arbitrarily slow — and there is no way to distinguish these cases
CCryptographic authentication is required to confirm crash detection, which is computationally infeasible in real time
DProcesses in asynchronous systems share memory, so a crashed process would leave memory in a detectable corrupted state
The key property of an asynchronous model is the absence of timing bounds: there is no guaranteed maximum time for a message to be delivered or for a process to respond. If process A sends a message to process B and receives no reply, A cannot conclude B has crashed — B might be alive but experiencing a 10-second delay, a 10-minute delay, or an indefinitely long delay. This observational equivalence between 'crashed' and 'very slow' is what makes crash detection impossible and what FLP exploits to construct executions where the system can always be kept in a state of indecision.
Question 3 True / False
The FLP theorem proves that consensus is impractical in any distributed system with even one crash failure.
TTrue
FFalse
Answer: False
FLP's impossibility is conditional on very specific assumptions: (1) the system is fully asynchronous — no timing bounds on message delivery or processing; (2) the algorithm is deterministic; and (3) at least one process can crash. Real systems escape FLP by relaxing one assumption. Paxos and Raft assume partial synchrony (timeouts eventually work). Randomized algorithms relax determinism. Synchronous systems (with timing guarantees) can solve consensus despite failures. FLP is not a statement about all distributed systems — it is a precise characterization of what is impossible under the pure asynchronous model.
Question 4 True / False
Randomized consensus algorithms can solve the consensus problem in asynchronous systems with crash failures, even under FLP's model.
TTrue
FFalse
Answer: True
FLP applies specifically to deterministic algorithms. The proof works by showing that an adversary can always schedule message deliveries to keep the system in a bivalent (undecided) state. Randomization breaks this by making the adversary's task probabilistic: if processes flip coins to break symmetry, the adversary must get lucky to perpetually prevent a decision. Randomized algorithms like Ben-Or's protocol can guarantee that consensus is reached with probability 1 (almost surely), even though no finite time bound can be guaranteed. This does not violate FLP because randomization — not determinism — is being used.
Question 5 Short Answer
Explain why the inability to distinguish a crashed process from a slow one in an asynchronous model leads directly to the FLP impossibility result.
Think about your answer, then reveal below.
Model answer: FLP's proof constructs scenarios where the adversary controls the order of message delivery. The core argument is that any deterministic consensus algorithm must have a 'bivalent' initial configuration — a state from which either decision value (0 or 1) is still reachable. From any bivalent state, the adversary can always find a single message delivery or non-delivery that keeps the system bivalent, preventing a decision. The adversary exploits the inability to detect crashes: by indefinitely delaying a message from one process, it simulates a crash without actually crashing the process. The algorithm cannot tell the difference, so it cannot safely proceed. Because the adversary can always do this — valid behavior under the asynchronous model — the system can be kept undecided forever, violating liveness.
The philosophical insight is that asynchrony and fault tolerance are fundamentally in tension: asynchrony means you cannot trust silence (a non-responsive process might just be slow), and fault tolerance means you cannot wait forever (a crashed process will never respond). Any algorithm that waits risks waiting forever; any algorithm that doesn't wait risks deciding without consensus. FLP proves these tensions cannot all be resolved simultaneously in the pure asynchronous model.