Questions: Failure Detection with Heartbeats

5 questions to test your understanding

Question 1 Multiple Choice

A team sets their heartbeat timeout to 100ms to detect node failures as quickly as possible. Their operations team starts receiving many 'node down' alerts that resolve within seconds. What is the most likely root cause?

A. The nodes are actually crashing and recovering, so 100ms is appropriate for detecting this
B. The timeout is too short: brief network congestion or processing spikes cause heartbeats to arrive late, triggering false positives
C. The heartbeat interval itself should be shortened to match the timeout
D. Gossip-based heartbeats would eliminate this problem entirely without any tradeoff
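
For reference, the kind of fixed-timeout detector this scenario describes might look roughly like the sketch below. The class name `TimeoutDetector`, its methods, and the peer name `node-b` are hypothetical and chosen only for illustration; timestamps are passed in explicitly so the behavior is easy to trace by hand.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutDetector:
    """Binary heartbeat detector (illustrative sketch, not a real library API).

    A peer is suspected once it has been silent for longer than timeout_s.
    """
    timeout_s: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, peer: str, now: float) -> None:
        # Remember when the most recent heartbeat from this peer arrived.
        self.last_seen[peer] = now

    def is_suspected(self, peer: str, now: float) -> bool:
        last = self.last_seen.get(peer)
        if last is None:
            # Assumption: a peer that has never been heard from is not suspected.
            return False
        return now - last > self.timeout_s


detector = TimeoutDetector(timeout_s=0.100)   # the 100ms timeout from the question
detector.record_heartbeat("node-b", now=0.000)
print(detector.is_suspected("node-b", now=0.050))  # False: 50ms of silence
print(detector.is_suspected("node-b", now=0.150))  # True: 150ms of silence
```

The detector only sees elapsed time since the last heartbeat arrived; it has no information about why a heartbeat is missing.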
Question 2 Multiple Choice

A node in a distributed system hasn't received a heartbeat from a peer for three times the normal interval. The network is experiencing unusual congestion. What can the observing node definitively conclude?

A. The peer has crashed and should be marked as failed immediately
B. The peer is experiencing a Byzantine failure and may be sending corrupt messages
C. Nothing definitive: in an asynchronous network, missing heartbeats cannot distinguish a crashed node from a very slow one
D. The peer has definitely not crashed but needs its heartbeat interval increased
Question 3 True / False

In a fully synchronous network where message delivery time is bounded by a known maximum delay δ, it is theoretically possible to build a perfect failure detector.

T. True
F. False
Question 4 True / False

Heartbeat-based failure detection can definitively identify whether a node has crashed or is merely slow, as long as the timeout is calibrated correctly.

T. True
F. False
Question 5 Short Answer

Why do systems like Cassandra use phi accrual failure detectors rather than simple binary timeout-based heartbeats?
