A node in your distributed system is responding to messages, but sends conflicting state information to different peers. Which failure model does this exemplify?
A. Crash failure: the node has stopped and is unresponsive
B. Omission failure: the node is silently dropping some messages
C. Byzantine failure: the node behaves arbitrarily, including sending inconsistent information to different peers
D. Timing failure: the node's responses are arriving after the deadline
Answer: C
Byzantine failure is the only model listed in which a faulty node stays active and sends incorrect or conflicting messages to different peers. Crash failures produce silence; omission failures drop messages but do not fabricate them; timing failures concern latency, not content. A node that actively lies or sends conflicting data is the defining case of Byzantine behavior.
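To make the "conflicting state to different peers" symptom concrete, here is a minimal sketch of how peers might spot it by pooling what each of them received. The function name `find_equivocators`, the tuple layout, and the sample node names are hypothetical, and the sketch assumes the pooled reports are themselves trustworthy and refer to the same round. A real protocol would need signed messages, since a Byzantine node could also lie about what others sent.

```python
from collections import defaultdict

def find_equivocators(reports):
    """Return senders that reported more than one distinct state.

    reports: iterable of (sender, receiver, state) tuples, pooled from
    all peers (assumed honest reporting, same round).
    """
    states_by_sender = defaultdict(set)
    for sender, _receiver, state in reports:
        states_by_sender[sender].add(state)
    # A sender with two or more distinct states has equivocated.
    return {s for s, states in states_by_sender.items() if len(states) > 1}

# Hypothetical sample: n4 tells n2 and n3 different things.
reports = [
    ("n1", "n2", "balance=10"),
    ("n1", "n3", "balance=10"),
    ("n4", "n2", "balance=10"),
    ("n4", "n3", "balance=99"),
]
equivocators = find_equivocators(reports)
print(equivocators)  # {'n4'}
```

A crashed or omission-faulty node would never appear in this result, because it can only withhold messages, not send two different ones.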
Question 2 Multiple Choice
A system must tolerate f faulty nodes. How many total nodes are required for Byzantine fault tolerance compared to crash fault tolerance?
A. The same: both require 2f + 1 nodes, since faulty nodes must be outvoted
B. More: Byzantine tolerance requires 3f + 1 nodes, while crash tolerance requires only 2f + 1
C. Fewer: crash failures are harder to detect and therefore require greater redundancy
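Answer: B
Crash fault tolerance requires 2f + 1 nodes: with up to f nodes crashed, the remaining f + 1 still form a majority, and any two majorities share at least one node, so decisions stay consistent. Byzantine fault tolerance requires 3f + 1 because a Byzantine node can send conflicting votes to different peers. Quorums must then contain 2f + 1 nodes so that any two quorums overlap in at least f + 1 nodes, which guarantees at least one honest node in common even when f nodes collude and misrepresent. The extra overhead is the direct cost of the more adversarial failure model.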
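The arithmetic behind the two bounds can be checked with a short sketch. The helper names (`min_nodes`, `quorum_size`, `min_quorum_overlap`) are hypothetical, and the quorum sizes used here (f + 1 for crash, 2f + 1 for Byzantine) are the usual choices at the minimum cluster size rather than something this quiz specifies.

```python
def min_nodes(f: int, byzantine: bool) -> int:
    """Minimum cluster size to tolerate f faulty nodes."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1 if byzantine else 2 * f + 1

def quorum_size(f: int, byzantine: bool) -> int:
    """Quorum size at the minimum cluster size (assumed convention)."""
    return 2 * f + 1 if byzantine else f + 1

def min_quorum_overlap(n: int, q: int) -> int:
    """Smallest possible intersection of two quorums of size q among n nodes."""
    return max(0, 2 * q - n)

for f in (1, 2, 3):
    for byzantine in (False, True):
        n = min_nodes(f, byzantine)
        q = quorum_size(f, byzantine)
        overlap = min_quorum_overlap(n, q)
        model = "byzantine" if byzantine else "crash"
        print(f"f={f} {model}: n={n}, quorum={q}, min overlap={overlap}")
```

In the crash case the overlap is always 1 node, which is enough because a crashed node never lies. In the Byzantine case the overlap is f + 1, so at least one node in the intersection is honest even if all f faulty nodes sit inside it.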
Question 3 True / False
In an asynchronous distributed system, timing failures are the most dangerous failure category because the network provides no delay bounds.
T. True
F. False
Answer: False
Timing failures are only defined for synchronous systems — those that specify upper bounds on message delivery and processing time. Asynchronous systems make no timing assumptions, so there are no bounds to violate and timing failures do not exist as a category. Byzantine failures remain the most adversarial type in any model. In asynchronous systems, the inability to distinguish a crashed node from a slow one is a fundamental detectability problem, but it belongs to crash-failure reasoning, not a separate failure class.
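The point that a timing failure needs a bound to violate can be illustrated with a small sketch. The function `classify_reply` and its return strings are hypothetical, and passing `None` as the bound is just this sketch's way of modeling an asynchronous system.

```python
from typing import Optional

def classify_reply(elapsed_s: float, delay_bound_s: Optional[float]) -> str:
    """Classify a reply's latency under a given system model.

    delay_bound_s is the model's upper bound on response time;
    None stands for an asynchronous model with no bound at all.
    """
    if delay_bound_s is None:
        # No bound exists, so no reply can be "late" by definition.
        return "no bound to violate"
    return "timing failure" if elapsed_s > delay_bound_s else "on time"

print(classify_reply(0.8, 0.5))   # synchronous model, bound exceeded
print(classify_reply(0.8, None))  # asynchronous model
```

In the asynchronous case a long wait only tells you the node is slow or crashed, and the two remain indistinguishable.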
Question 4 True / False
A system designed to tolerate Byzantine failures can also correctly handle crash and omission failures, since those are strictly weaker failure modes.
T. True
F. False
Answer: True
The failure hierarchy is nested: crash ⊂ omission ⊂ Byzantine. An algorithm that survives Byzantine behavior handles all weaker modes as special cases — a crashed node is simply a Byzantine node that goes permanently silent, and an omission failure is a Byzantine node that drops messages without fabricating them. The reverse is not true: crash-tolerant algorithms can fail completely if Byzantine faults occur, since they assume faulty nodes are silent, not deceptive.
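One way to picture the nesting is to treat each failure model as the set of misbehaviors it permits. This is an illustrative sketch only: the behavior labels and the `covers` helper are hypothetical simplifications, not a formal definition of the models.

```python
# Each model is the set of faulty behaviors it allows (illustrative labels).
ALLOWED = {
    "crash": frozenset({"halt"}),
    "omission": frozenset({"halt", "drop_message"}),
    "byzantine": frozenset(
        {"halt", "drop_message", "send_wrong_value", "equivocate"}
    ),
}

def covers(designed_for: str, actual: str) -> bool:
    """True if a design for one model also handles every behavior of another."""
    return ALLOWED[actual] <= ALLOWED[designed_for]

# crash is a proper subset of omission, which is a proper subset of byzantine.
print(ALLOWED["crash"] < ALLOWED["omission"] < ALLOWED["byzantine"])  # True
print(covers("byzantine", "crash"))   # True
print(covers("crash", "byzantine"))   # False
```

The asymmetry in the last two lines mirrors the explanation: a Byzantine-tolerant design absorbs the weaker modes, while a crash-tolerant design has no answer for fabricated or conflicting messages.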
Question 5 Short Answer
Why should a distributed system designer choose the weakest failure model that accurately reflects their threat environment, rather than always designing for Byzantine fault tolerance?
Model answer: Because more adversarial failure models require substantially more expensive algorithms. Byzantine fault tolerance demands 3f + 1 nodes (vs. 2f + 1 for crash), extra message rounds, and higher computational cost. If all nodes are in a trusted data center where hardware failures are the only realistic threat, crash-failure assumptions yield simpler, faster protocols with equivalent real-world correctness. Over-engineering for Byzantine threats that cannot occur wastes resources and adds complexity without benefit.
The principle is threat-model alignment: the failure model should match the actual adversarial environment. Byzantine tolerance is essential when participants may be actively malicious (blockchain, multi-party computation with untrusted parties). It is unnecessary and costly when you control all nodes and trust the hardware. Choosing the weakest sufficient model minimizes overhead while maintaining correctness guarantees for the realistic threat surface.