Questions: Distributed Snapshots and Consistent State Capture
5 questions to test your understanding
Question 1 Multiple Choice
In a distributed system, process P1 sends a $100 payment to P2 and then records its local state (balance: $900). P2 records its local state (balance: $500) before receiving the payment. The message is still in transit. What must a correct snapshot algorithm do?
A. Discard the snapshot: any snapshot where a sent message has not yet been received is inconsistent and must be retaken
B. Assign the in-transit $100 to P1's recorded state, showing P1 with $1,000
C. Record the in-transit $100 as part of the channel state, so the snapshot captures $900 + $500 + $100 in-flight = consistent total
D. Roll back P1's state to before the send, so both processes agree no transfer occurred
Answer: C
A consistent snapshot must account for messages that were sent before the sender recorded its state but had not yet been received when the receiver recorded its state. These in-transit messages are captured as the channel state. If we ignored the in-transit $100, the snapshot would show $900 + $500 = $1,400 in total wealth when there should be $1,500; money would appear to have vanished. Recording the channel state preserves the invariant and ensures the captured global state could have been a valid intermediate state of the system.
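The arithmetic can be checked with a small sketch. The dictionary layout and the names `local_states`, `channel_states`, and `snapshot_total` are illustrative assumptions, not part of any particular snapshot library; only the dollar amounts come from the question.

```python
# Hypothetical snapshot record for the scenario in Question 1.
local_states = {"P1": 900, "P2": 500}  # balances recorded by each process

# Payments recorded as in transit, keyed by (sender, receiver).
channel_states = {("P1", "P2"): [100], ("P2", "P1"): []}


def snapshot_total(local_states, channel_states):
    """Sum the recorded balances plus every payment still in transit."""
    in_flight = sum(sum(msgs) for msgs in channel_states.values())
    return sum(local_states.values()) + in_flight


# $1,000 at P1 plus $500 at P2 before the transfer.
EXPECTED_TOTAL = 1500

total_without_channels = sum(local_states.values())
total_with_channels = snapshot_total(local_states, channel_states)

print(total_without_channels)  # local states only: $100 is missing
print(total_with_channels)     # local states plus channel state
```

Counting only the local states loses the $100 in flight; adding the channel state restores the conserved total.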
Question 2 Multiple Choice
In the Chandy-Lamport algorithm, when a process receives a marker on a channel for the first time, what does it do?
A. It immediately stops processing all messages until all other processes have sent their markers
B. It records its own local state (if it has not done so already) and begins recording all messages arriving on its other incoming channels until markers arrive on those channels too
C. It forwards the marker only to processes it has sent messages to recently, to minimize overhead
D. It requests a global coordinator to freeze the system and collect all channel states simultaneously
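Answer: B
The elegance of Chandy-Lamport is that it is fully decentralized and non-blocking; it assumes reliable FIFO channels. When a process that has not yet recorded its state receives a marker on channel C: (1) it records its local state, (2) it records the state of C as empty, (3) it sends a marker on each of its outgoing channels before sending any further application messages, and (4) it begins recording incoming messages on all *other* channels (not C); these messages were in transit and belong to the channel state. When a marker arrives on each remaining channel, the process stops recording that channel's state. Channel C has an empty state because the process recorded its state exactly when the marker arrived, and FIFO delivery means every message sent on C before the marker has already been received, so no messages on that channel fall in the 'snapshot window'. If the process had already recorded its state (as the initiator, or after a marker on another channel), the marker on C simply ends recording for C.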
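The marker rules can be sketched as a single-threaded simulation. This is a minimal illustration under stated assumptions, not a production implementation: the `Process` class, the `connect` helper, and the use of `collections.deque` as a reliable FIFO channel are all hypothetical choices, and application messages are plain integer payment amounts. The scenario replays Question 1, with P2 initiating the snapshot.

```python
from collections import deque

MARKER = "MARKER"  # sentinel; application messages are integer amounts


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.out = {}             # peer name -> FIFO channel to that peer
        self.inc = {}             # peer name -> FIFO channel from that peer
        self.recorded_state = None
        self.channel_state = {}   # peer name -> messages recorded in transit
        self.recording = set()    # incoming channels still being recorded

    def send(self, peer, amount):
        self.balance -= amount
        self.out[peer].append(amount)

    def start_snapshot(self, first_marker_from=None):
        """Record local state, then send a marker on every outgoing channel."""
        self.recorded_state = self.balance
        for peer in self.inc:
            self.channel_state[peer] = []
        # The channel that delivered the first marker is recorded as empty,
        # so recording starts only on the other incoming channels.
        self.recording = set(self.inc) - {first_marker_from}
        for channel in self.out.values():
            channel.append(MARKER)

    def receive(self, peer):
        msg = self.inc[peer].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.start_snapshot(first_marker_from=peer)
            else:
                self.recording.discard(peer)  # marker ends recording here
        else:
            self.balance += msg
            if peer in self.recording:
                self.channel_state[peer].append(msg)


def connect(a, b):
    """Create one FIFO channel in each direction between a and b."""
    a.out[b.name] = b.inc[a.name] = deque()
    b.out[a.name] = a.inc[b.name] = deque()


p1, p2 = Process("P1", 1000), Process("P2", 500)
connect(p1, p2)

p1.send("P2", 100)    # payment is now in transit
p2.start_snapshot()   # P2 initiates: records 500, sends a marker to P1
p2.receive("P1")      # the $100 arrives while P2 records channel P1 -> P2
p1.receive("P2")      # first marker at P1: records 900, sends a marker back
p2.receive("P1")      # marker closes recording of channel P1 -> P2

print(p1.recorded_state, p2.recorded_state, p2.channel_state["P1"])
```

Neither process stops handling application messages at any point, and the recorded pieces ($900, $500, and $100 on the channel from P1 to P2) add up to the conserved $1,500.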
Question 3 True / False
A consistent distributed snapshot represents the exact global system state at a specific instant in wall-clock time.
True
False
Answer: False
This is the key conceptual subtlety of distributed snapshots. Because processes run independently with no global clock, there is no 'pause button' that freezes every process simultaneously. A consistent snapshot (consistent cut) is a logically valid state that respects causal ordering — if the snapshot includes the effect of an event, it includes all causally preceding events. But this logical cut may never have existed simultaneously in real time; it is a state the system *could have passed through*, which is sufficient for recovery, deadlock detection, and invariant checking, even though it does not correspond to any single wall-clock moment.
Question 4 True / False
Distributed snapshots are useful not only for fault recovery (checkpointing) but also for monitoring live system properties like invariant checking and deadlock detection.
True
False
Answer: True
Snapshots have multiple applications beyond crash recovery. For invariant checking, you can capture the global state and verify that conserved quantities (total money, total message count) are preserved, without stopping the system. For deadlock detection, you analyze the snapshot's wait-for graph for cycles; because deadlock is a stable property (once it holds, it keeps holding), a deadlock found in a consistent snapshot still exists in the running system. For debugging, you can inspect global state at a logical point without interrupting execution. None of these require the snapshot to correspond to a real-time instant; logical consistency is sufficient for all of them.
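As an illustration of the deadlock case, the sketch below runs a depth-first search for a cycle in a wait-for graph taken from a snapshot. The graph contents and the names `find_cycle` and `snapshot_wait_for` are hypothetical; the sketch also assumes each edge means "is blocked until that process releases a resource", so that a cycle implies deadlock.

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph as a list of names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: a cycle closes here.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in wait_for:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical wait-for edges extracted from a consistent snapshot:
# each process maps to the processes it is blocked on.
snapshot_wait_for = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"], "P4": ["P1"]}

deadlock = find_cycle(snapshot_wait_for)
no_deadlock = find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": []})

print(deadlock)     # ['P1', 'P2', 'P3', 'P1']
print(no_deadlock)  # None
```

The analysis runs entirely on the recorded snapshot, so the live system keeps executing while the graph is inspected.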
Question 5 Short Answer
What makes a distributed snapshot 'consistent,' and why is this consistency necessary for the snapshot to be useful for purposes like fault recovery or invariant checking?
Think about your answer before reading the model answer below.
Model answer: A consistent snapshot satisfies the condition that if it includes the effect of any event, it also includes all causally preceding events (it respects the happened-before partial order). Concretely, it means: if process P1's recorded state shows a message was sent, either P2's recorded state shows it was received, or the message is captured as in-transit in the channel state. Without this, replaying the snapshot could produce contradictions — messages received that were never sent, or resources that appear or disappear. Consistency ensures the captured state is one the system could have legitimately occupied, making it a valid starting point for recovery or analysis.
The happened-before relation (introduced by Lamport, and the basis for Lamport clocks) defines causal order. A consistent cut is one where you can draw a line through the event history of each process, with time running left to right, such that no message crosses the line from right to left; that is, no message is shown as 'received' in the snapshot unless its corresponding 'send' is also in the snapshot. A message that crosses from left to right (sent before the cut, received after it) is allowed and is recorded in the channel state. This is what the marker mechanism achieves: on a FIFO channel, the marker acts as the cut line, ensuring all pre-marker messages are captured and all post-marker messages are excluded.
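The cut condition can be written as a short check. This is a sketch under assumed conventions: a cut is a dictionary mapping each process to the number of events from the start of its history that the snapshot includes, and `Message`, `is_consistent`, and `in_transit` are illustrative names rather than an established API.

```python
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    send_idx: int   # position of the send event in the sender's history
    receiver: str
    recv_idx: int   # position of the receive event in the receiver's history


def is_consistent(cut, messages):
    """A cut is consistent if every included receive has its send included."""
    for m in messages:
        received = m.recv_idx < cut[m.receiver]
        sent = m.send_idx < cut[m.sender]
        if received and not sent:
            return False  # message crosses the cut from right to left
    return True


def in_transit(cut, messages):
    """Messages sent inside the cut but received outside it (channel state)."""
    return [
        m for m in messages
        if m.send_idx < cut[m.sender] and m.recv_idx >= cut[m.receiver]
    ]


# The $100 payment: first event at P1 is the send, first at P2 the receive.
payment = Message("P1", 0, "P2", 0)

good_cut = {"P1": 1, "P2": 0}  # send included, receive not yet
bad_cut = {"P1": 0, "P2": 1}   # receive included without its send

print(is_consistent(good_cut, [payment]), in_transit(good_cut, [payment]))
print(is_consistent(bad_cut, [payment]))
```

The first cut matches Question 1: it is consistent, and the payment appears as channel state. The second cut shows money arriving that was never sent, which is exactly the contradiction a consistent snapshot rules out.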