In a fully asynchronous distributed system, Process A sends a message to Process B and receives no response after 10 seconds. What can Process A conclude?
A. Process B has crashed, since 10 seconds is too long for any functioning process
B. The network has partitioned between A and B
C. Nothing definitive: B may be slow, the message (or B's reply) may still be in transit, or B may have crashed
D. Process B received the message but chose not to respond
Answer: C
This is the fundamental problem of the asynchronous model: without timing bounds, a slow process and a crashed process are indistinguishable from the outside. No timeout threshold can definitively mean 'crashed' rather than 'slow', because asynchrony allows arbitrarily long delays. This indistinguishability is why some problems, including deterministic consensus when even one process may crash, are provably impossible in the pure asynchronous model. It is also why real distributed systems must either accept uncertainty or impose additional timing assumptions.
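A minimal Python sketch of the indistinguishability argument. It assumes a hypothetical `observe` helper that reports only what A can see after waiting `timeout` seconds; `reply_delay` is the quantity A cannot see, with `None` standing for a crashed B. Whatever timeout A picks, asynchrony permits a reply delay just beyond it, so a slow B and a crashed B yield the same observation.

```python
from typing import Optional


def observe(reply_delay: Optional[float], timeout: float) -> str:
    """Return what A observes after waiting `timeout` seconds.

    `reply_delay` is hidden from A: the time until B's reply arrives,
    or None if B has crashed and no reply will ever arrive.
    """
    if reply_delay is not None and reply_delay <= timeout:
        return "reply"
    return "no reply yet"


# At t = 10 s, a B that replies after 30 s looks exactly like a crashed B.
print(observe(30.0, timeout=10.0), "|", observe(None, timeout=10.0))
# prints: no reply yet | no reply yet

# No timeout helps: asynchrony allows a delay just past any chosen threshold.
for timeout in (1.0, 10.0, 1000.0):
    assert observe(timeout + 1.0, timeout) == observe(None, timeout)
```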
Question 2 Multiple Choice
Why do practical consensus algorithms like Raft guarantee safety (no incorrect decisions) at all times but only guarantee liveness (making progress) when the system is behaving well?
A. They are designed for the synchronous model, where safety and liveness are both always guaranteed
B. They are designed for partial synchrony: safety holds under any timing, but liveness requires eventual synchrony for timeouts to function correctly
C. They trade safety for liveness during high-load periods to improve throughput
D. Liveness is impossible in any distributed system, so algorithms don't guarantee it
Answer: B
Raft and Paxos are designed for the partially synchronous model, which captures real-world systems that are usually well-behaved but occasionally suffer bursts of delay. Safety (e.g., never electing two leaders in the same term, and never committing conflicting entries at the same log position) is maintained under any timing conditions through majority quorums and term numbers (ballot numbers in Paxos). Liveness (electing a leader and making progress) requires that the system eventually becomes synchronous enough for election timeouts to work reliably. During prolonged asynchrony (a severe network partition or cascading delays), the system may halt rather than risk an incorrect decision.
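The quorum part of the safety argument can be checked exhaustively with a small Python sketch. It assumes a simplified version of Raft's voting rule (each server grants at most one vote per term, and a candidate needs a strict majority) and ignores Raft's log up-to-date check; the `winners` helper and the server names are hypothetical. Because any two majorities of the same cluster overlap, no assignment of votes yields two leaders in one term.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence


def winners(votes: Sequence[Optional[str]], n: int) -> List[str]:
    """Candidates holding a strict majority of the n servers' votes in one term.

    votes[i] is the single candidate server i voted for, or None if it abstained.
    """
    tally: Dict[str, int] = {}
    for v in votes:
        if v is not None:
            tally[v] = tally.get(v, 0) + 1
    return [c for c, k in tally.items() if k > n // 2]


N = 5
CANDIDATES = ["S1", "S2", "S3"]

# Enumerate every way 5 servers can each cast at most one vote in a term.
for votes in product(CANDIDATES + [None], repeat=N):
    assert len(winners(votes, N)) <= 1

print("checked", (len(CANDIDATES) + 1) ** N, "vote assignments: at most one leader each")
# prints: checked 1024 vote assignments: at most one leader each
```

The guarantee is per term: a stale leader from term t can briefly coexist with a new leader from term t + 1, but term numbers make the stale leader step down as soon as it sees the higher term, and it cannot commit anything without a majority.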
Question 3 True / False
In the synchronous distributed model, failure detection is reliable because you can use timeouts to definitively conclude that a non-responding process has crashed.
T. True
F. False
Answer: True
This is the key advantage of the synchronous model. Because there are known upper bounds on message delivery time and process step time, the round trip of a request is bounded too; if no response arrives within that bound (assuming reliable links and crash faults), the process must have crashed. There is no other explanation. This makes failure detection exact and consensus algorithms comparatively simple. Contrast this with the asynchronous model, where any delay can be arbitrarily long, making timeout-based failure detection unreliable: a 'suspected' process might simply be slow.
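Under the synchronous model's assumptions, a perfect failure detector reduces to a single comparison. A minimal Python sketch, assuming crash faults only, reliable links, an accurate local clock at A, and that B needs at most one step to answer; `max_delay_ms`, `max_step_ms`, and the helper names are hypothetical. The worst-case round trip is `2 * max_delay_ms + max_step_ms`, so silence beyond that is proof of a crash.

```python
def round_trip_bound(max_delay_ms: int, max_step_ms: int) -> int:
    """Worst-case time from sending a request to receiving B's reply:
    request delay + one processing step at B + reply delay."""
    return 2 * max_delay_ms + max_step_ms


def classify(elapsed_ms: int, replied: bool, max_delay_ms: int, max_step_ms: int) -> str:
    """Perfect failure detection, valid only if the assumed bounds always hold."""
    if replied:
        return "alive"
    if elapsed_ms > round_trip_bound(max_delay_ms, max_step_ms):
        return "crashed"  # a slow-but-alive B could not still be silent here
    return "waiting"


# With 50 ms message delays and 10 ms steps, the bound is 110 ms.
print(round_trip_bound(50, 10))        # 110
print(classify(100, False, 50, 10))    # waiting
print(classify(120, False, 50, 10))    # crashed
```

The same code in an asynchronous system would be unsound: the bounds it relies on simply do not exist there.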
Question 4 True / False
The partially synchronous model is merely a theoretical curiosity with no relevance to real distributed systems.
T. True
F. False
Answer: False
Partial synchrony is the most practically relevant model. It captures the reality of the internet, cloud infrastructure, and data center networks: most of the time, messages arrive within milliseconds (behaving synchronously), but occasionally delays spike because of network congestion, garbage collection pauses, or overload. Widely deployed consensus protocols such as Paxos, Raft, and Zab (used in ZooKeeper) are all designed for this model. Understanding partial synchrony explains why these systems sometimes stall during network disruptions but recover once conditions improve.
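One common way to exploit partial synchrony is to grow timeouts adaptively: each time a suspicion turns out to be false (the reply arrives late), double the timeout. Once delays become bounded after stabilization, the timeout eventually exceeds that unknown bound and false suspicions stop. A minimal Python sketch with a hypothetical `AdaptiveDetector` class and made-up delay values; this is a generic technique, not the specific mechanism of Raft, Paxos, or Zab (Raft, for instance, uses randomized election timeouts).

```python
from typing import List


class AdaptiveDetector:
    """Suspects a peer whose reply exceeds the current timeout (all times in ms).

    Assumes the peer is alive and its late replies eventually arrive, so every
    suspicion can later be recognized as false and the timeout doubled.
    """

    def __init__(self, initial_timeout_ms: int) -> None:
        self.timeout_ms = initial_timeout_ms
        self.false_suspicions = 0

    def check(self, reply_delay_ms: int) -> bool:
        """Return True if the peer was (wrongly) suspected for this request."""
        if reply_delay_ms <= self.timeout_ms:
            return False
        self.false_suspicions += 1
        self.timeout_ms *= 2  # grow the timeout after each false suspicion
        return True


# Erratic delays before stabilization, then delays bounded by 40 ms (unknown to A).
delays: List[int] = [100, 300, 20, 35, 40, 38, 40, 40]
detector = AdaptiveDetector(initial_timeout_ms=5)
print([detector.check(d) for d in delays])
# prints: [True, True, False, True, False, False, False, False]
print(detector.timeout_ms, detector.false_suspicions)
# prints: 40 3
```

Doubling keeps the number of false suspicions after stabilization logarithmic in the ratio of the unknown bound to the initial timeout, at the cost of slower detection of genuine crashes once the timeout has grown.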
Question 5 Short Answer
Why does the choice of computational model (synchronous vs. asynchronous vs. partially synchronous) matter for distributed algorithm design?
Model answer: The model defines which assumptions you can rely on, specifically whether you can set reliable timeouts, whether you can distinguish slow processes from crashed ones, and whether clocks are synchronized. These assumptions determine which problems are solvable and which correctness guarantees are achievable. A problem that is easily solved in the synchronous model (like perfect failure detection) may be provably impossible in the asynchronous model. Every algorithm and impossibility theorem comes with a model attached; claiming a result without specifying the model is incomplete.
The classic example is the FLP impossibility result: no deterministic algorithm can guarantee consensus in a fully asynchronous system if even one process may crash. This sounds alarming, but it follows from how little the asynchronous model assumes: no timing bounds at all. In the synchronous model, crash-tolerant consensus is straightforward. In partial synchrony, it is achievable: safety holds at all times, and decisions are reached once the system stabilizes. Without understanding these models, you cannot interpret impossibility results correctly, and you cannot reason about whether a real system's behavior violates or is consistent with theoretical guarantees.
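To make the synchronous case concrete, here is a Python simulation of a FloodSet-style algorithm. It assumes lockstep rounds, reliable links, and at most f crash faults, where a process that crashes mid-round may reach only some recipients. Every live process broadcasts the set of input values it knows; after f + 1 rounds, each survivor decides the minimum. Since at most f crashes are spread over f + 1 rounds, at least one round is crash-free, and after that round all survivors hold the same set. The function name and the random crash schedule are illustrative.

```python
import random
from typing import Dict, List, Set


def floodset(inputs: List[int], f: int, seed: int = 0) -> Dict[int, int]:
    """Simulate synchronous FloodSet consensus with up to f crash faults.

    Returns {process id: decision} for every process that never crashed.
    """
    rng = random.Random(seed)
    n = len(inputs)
    known: List[Set[int]] = [{v} for v in inputs]
    alive: Set[int] = set(range(n))
    crashes_left = f
    for _ in range(f + 1):  # f + 1 lockstep rounds
        inbox: List[Set[int]] = [set() for _ in range(n)]
        for p in sorted(alive):  # sorted() copies, so discarding from alive is safe
            receivers: List[int] = list(range(n))
            if crashes_left > 0 and rng.random() < 0.5:
                # p crashes mid-broadcast: only a random subset hears from it.
                receivers = rng.sample(receivers, rng.randint(0, n))
                alive.discard(p)
                crashes_left -= 1
            for q in receivers:
                inbox[q] |= known[p]
        for q in alive:  # survivors merge everything received this round
            known[q] |= inbox[q]
    return {p: min(known[p]) for p in alive}


# Agreement: across many crash schedules, all survivors decide the same value.
for seed in range(100):
    decisions = floodset([5, 3, 8, 1, 9], f=2, seed=seed)
    assert len(set(decisions.values())) == 1
print(floodset([5, 3, 8, 1, 9], f=2, seed=42))
```

With only f rounds, an adversary can arrange a crash in every round so that survivors end up with different sets, which is why the f + 1 rounds matter.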