Questions: Checkpointing: Sharp and Fuzzy Checkpoints
5 questions to test your understanding
Question 1 Multiple Choice
After a fuzzy checkpoint, the database crashes. A junior engineer suggests replaying the log only from the checkpoint record's position, since 'the checkpoint marks a known good state.' What is wrong with this?
A. Nothing; this is exactly what recovery should do after any kind of checkpoint
B. Fuzzy checkpoints flush pages in the background while transactions continue modifying other pages, so the on-disk state at checkpoint time is not internally consistent; recovery must consult the checkpoint's dirty page table and active transaction list to find the true minimum recovery LSN
C. The checkpoint record position is not stored in the log, so recovery cannot locate it
D. Fuzzy checkpoints only apply to read-only transactions, so this procedure would miss updates from write transactions
Answer: B
This is the central distinction between sharp and fuzzy checkpoints. A sharp checkpoint pauses all activity and flushes every dirty page, so the checkpoint record is a clean boundary that is safe to replay from. A fuzzy checkpoint writes its record while dirty pages are still being flushed in the background, so some pages modified before the checkpoint are on disk while others are not. Recovery must use the dirty page table (which pages were dirty, each with its recovery LSN: the first log record that dirtied it since its last flush) and the active transaction list (which transactions were in flight), both embedded in the checkpoint record, to compute the minimum LSN from which redo must start. That LSN is often earlier than the checkpoint record itself.
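To make the failure concrete, here is a minimal Python sketch. The log, the LSN values, and the names `LogRecord`, `redo_start` and `pages_replayed` are hypothetical, and the sketch only compares which pages each starting point would reach; a real redo pass would also skip records for pages whose on-disk copy is already current.

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int
    txn: str
    page: str

# Hypothetical log. T1 dirtied P1 at LSN 10, well before the checkpoint at LSN 30.
log = [
    LogRecord(10, "T1", "P1"),
    LogRecord(20, "T2", "P2"),
    LogRecord(40, "T2", "P3"),
]
CHECKPOINT_LSN = 30

# Hypothetical dirty page table stored in the fuzzy checkpoint record: page -> recovery LSN.
# P1 was still dirty in memory at checkpoint time; P2 had already been flushed.
dirty_page_table = {"P1": 10}

def naive_redo_start(checkpoint_lsn):
    # The junior engineer's proposal: treat the checkpoint as a clean boundary.
    return checkpoint_lsn

def redo_start(checkpoint_lsn, dpt):
    # Redo must begin at the oldest recovery LSN of any page that may not be on disk.
    return min([checkpoint_lsn, *dpt.values()])

def pages_replayed(start):
    # Pages touched by log records at or after the chosen starting LSN.
    return sorted({r.page for r in log if r.lsn >= start})

print(pages_replayed(naive_redo_start(CHECKPOINT_LSN)))              # ['P3']
print(pages_replayed(redo_start(CHECKPOINT_LSN, dirty_page_table)))  # ['P1', 'P2', 'P3']
```

Starting from the checkpoint position never revisits LSN 10, so T1's change to P1, which was never written to disk, would be lost.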
Question 2 Multiple Choice
A developer argues: 'Checkpoints are only an optimization for recovery speed — they're not strictly necessary if we're willing to accept slow recovery after a crash.' What critical function does this miss?
A. Checkpoints are required for concurrency control; without them, transactions cannot acquire locks
B. Without checkpoints, the write-ahead log must retain every record ever written, because the system can never know which old records might be needed, causing the log to grow without bound
C. Without checkpoints, the buffer pool cannot evict dirty pages to disk
D. Checkpoints are required to enforce isolation levels above READ UNCOMMITTED
Answer: B
Log truncation is a second, equally critical function of checkpointing. The WAL provides durability by recording every change in the log before the modified page is written to disk, and on its own that produces a log that grows indefinitely. A checkpoint establishes a minimum recovery LSN: the earliest log record that could possibly be needed for crash recovery. All log records before that point can be safely archived or discarded. Without checkpoints, the system can never establish this safe truncation point, and the log grows forever. Production systems process millions of transactions per day, so an unbounded log would exhaust disk storage quickly.
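A toy simulation shows the difference in log growth. It is a sketch under a strong simplifying assumption: every checkpoint is treated as sharp, so the checkpoint's own position is the minimum recovery LSN. The function `peak_log_size` and its parameters are hypothetical.

```python
def peak_log_size(num_records, checkpoint_every=None):
    """Append num_records log records and return the most records ever retained.

    Simplifying assumption (hypothetical model): every checkpoint_every-th record
    is a sharp checkpoint (all dirty pages flushed, nothing in flight), so its LSN
    is the minimum recovery LSN and everything before it can be discarded.
    """
    log = []  # retained log records, represented only by their LSNs
    peak = 0
    for lsn in range(1, num_records + 1):
        log.append(lsn)
        peak = max(peak, len(log))
        if checkpoint_every and lsn % checkpoint_every == 0:
            min_recovery_lsn = lsn
            # Truncate: drop every record older than the minimum recovery LSN.
            log = [r for r in log if r >= min_recovery_lsn]
    return peak

print(peak_log_size(100_000))                          # 100000: grows with every record
print(peak_log_size(100_000, checkpoint_every=1_000))  # 1001: bounded by the checkpoint interval
```

Without checkpoints the retained log is as long as the whole history; with them, its size depends only on the checkpoint interval.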
Question 3 True / False
A fuzzy checkpoint guarantees that all dirty pages in the buffer pool have been written to disk by the time the checkpoint record appears in the log.
True
False
Answer: False
This describes a sharp checkpoint, not a fuzzy one. The point of fuzzy checkpointing is that writing the checkpoint record does not wait for dirty pages to be flushed: flushing continues in the background while transactions keep running. The checkpoint record captures which pages were dirty when the checkpoint was taken (with their recovery LSNs) and which transactions were active (with their log positions). It is a snapshot of what may still need to be flushed, not a guarantee that flushing is complete. Recovery uses this information to determine which changes still need to be reapplied from the log.
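The contrast can be sketched with a hypothetical toy `Engine` class in Python (not any real system's API). Flushing is simulated by clearing the in-memory dirty page table, and nothing actually blocks concurrent work during the sharp checkpoint.

```python
class Engine:
    """Hypothetical toy engine: tracks dirty pages and active transactions in memory."""

    def __init__(self):
        self.next_lsn = 1
        self.log = []          # entries are (lsn, kind, *payload)
        self.dirty_pages = {}  # page -> recovery LSN (first LSN that dirtied it since its last flush)
        self.active_txns = {}  # txn -> LSN of its most recent log record

    def _append(self, kind, *payload):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log.append((lsn, kind, *payload))
        return lsn

    def update(self, txn, page):
        lsn = self._append("update", txn, page)
        self.dirty_pages.setdefault(page, lsn)
        self.active_txns[txn] = lsn

    def sharp_checkpoint(self):
        # Sharp: flush every dirty page, then write the record.
        # (A real engine would block new updates during this step.)
        self.dirty_pages.clear()  # stands in for writing each dirty page to disk
        return self._append("checkpoint", {}, dict(self.active_txns))

    def fuzzy_checkpoint(self):
        # Fuzzy: copy both tables into the record and write it right away.
        # Dirty pages stay in the buffer pool and are flushed later in the background.
        return self._append("checkpoint", dict(self.dirty_pages), dict(self.active_txns))

fuzzy = Engine()
fuzzy.update("T1", "P1")
fuzzy.update("T2", "P2")
fuzzy.fuzzy_checkpoint()
print(fuzzy.dirty_pages)  # {'P1': 1, 'P2': 2}: still dirty after the checkpoint record
print(fuzzy.log[-1][2])   # {'P1': 1, 'P2': 2}: the dirty page table stored in the record

sharp = Engine()
sharp.update("T1", "P1")
sharp.sharp_checkpoint()
print(sharp.dirty_pages)  # {}: everything was flushed before the record was written
```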
Question 4 True / False
The minimum recovery LSN maintained by a database system determines the oldest log record that must be retained for crash recovery.
True
False
Answer: True
The minimum recovery LSN (sometimes called the low-water mark) is derived from the checkpoint's dirty page table (the oldest recovery LSN of any dirty page that might not be on disk) and the active transaction table (the position of the first log record written by the oldest active transaction); it is also never later than the checkpoint record itself, which recovery must be able to read. Log records before this LSN cannot be needed for redo or undo of any in-progress or not-yet-flushed operation. This is what enables safe log truncation: any log record with an LSN below the minimum recovery LSN can be archived or discarded.
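As a minimal sketch, the computation reduces to taking a minimum over three sources. The function names and LSN values are hypothetical, and the sketch assumes the system tracks the first LSN of each active transaction alongside the checkpoint's tables.

```python
def min_recovery_lsn(checkpoint_lsn, dirty_page_table, txn_first_lsn):
    """Return the oldest LSN that crash recovery could still need (hypothetical helper).

    checkpoint_lsn:   LSN of the latest checkpoint record (recovery must be able to read it)
    dirty_page_table: page -> recovery LSN (oldest change that may not be on disk)
    txn_first_lsn:    active transaction -> LSN of its first log record (needed to undo it fully)
    """
    return min([checkpoint_lsn, *dirty_page_table.values(), *txn_first_lsn.values()])

def truncate(log_lsns, low_water_mark):
    # Keep only the records recovery might need; the rest can be archived or discarded.
    return [lsn for lsn in log_lsns if lsn >= low_water_mark]

# Hypothetical state: checkpoint at LSN 500, two dirty pages, one long-running transaction.
low = min_recovery_lsn(500, {"P7": 420, "P9": 480}, {"T3": 350})
print(low)                                             # 350
print(truncate([300, 350, 420, 480, 500, 510], low))   # [350, 420, 480, 500, 510]
```

Here the long-running transaction T3, not any dirty page, holds the low-water mark back, which is why a single stalled transaction can prevent log truncation.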
Question 5 Short Answer
What information does a fuzzy checkpoint record contain, and why does crash recovery need each piece?
Model answer: A fuzzy checkpoint record contains the dirty page table (which pages were dirty at checkpoint time and the recovery LSN of each) and the active transaction table (which transactions were in progress and the position of each one's most recent log record). Recovery needs the dirty page table to find where redo must begin: the smallest recovery LSN in the table, since every page change logged before that point is already on disk. Recovery needs the active transaction table to identify in-progress transactions that must be undone because they did not commit before the crash; each entry's last log record is where undo starts following that transaction's log records backward. A recovery algorithm such as ARIES starts its analysis from these two tables, updates them by scanning the log forward from the checkpoint, and uses them to determine the exact LSN to start replaying from, rather than replaying the entire log.
The key point is that a fuzzy checkpoint cannot be a clean boundary, because pages are still being flushed while the record is written. The dirty page table and active transaction table capture the precise in-flight state at checkpoint time, giving recovery the information it needs to distinguish what is already safe on disk from what must be reapplied. Without these tables, recovery would have to replay the log from its very beginning to be safe.
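The following Python sketch, loosely modeled on the ARIES analysis and redo passes but heavily simplified, shows how the two tables are used. `CheckpointRecord`, `analysis` and `needs_redo` are hypothetical names; commit handling is collapsed into a single step, and undo itself is not shown.

```python
from dataclasses import dataclass

@dataclass
class CheckpointRecord:  # hypothetical record layout
    lsn: int
    dirty_page_table: dict  # page -> recovery LSN
    active_txns: dict       # txn -> LSN of its most recent log record

def analysis(ckpt, log_after):
    """Rebuild both tables as of the crash, starting from the checkpoint's copies.

    log_after: (lsn, kind, txn, page) records written after the checkpoint.
    Simplification: a commit record removes the transaction immediately.
    """
    dpt = dict(ckpt.dirty_page_table)
    att = dict(ckpt.active_txns)
    for lsn, kind, txn, page in log_after:
        if kind == "update":
            att[txn] = lsn
            dpt.setdefault(page, lsn)  # first change since the checkpoint becomes the recovery LSN
        elif kind == "commit":
            att.pop(txn, None)
    return dpt, att

def needs_redo(lsn, page, dpt, page_lsn_on_disk):
    # Redo an update only if its page may be stale: the page is in the dirty page table,
    # the record is not older than the page's recovery LSN, and the disk copy predates it.
    return page in dpt and lsn >= dpt[page] and lsn > page_lsn_on_disk.get(page, 0)

ckpt = CheckpointRecord(lsn=50, dirty_page_table={"P1": 30}, active_txns={"T1": 40, "T2": 45})
log_after = [(60, "update", "T2", "P2"), (70, "commit", "T2", None), (80, "update", "T3", "P1")]
dpt, att = analysis(ckpt, log_after)
print(min(dpt.values()))                      # 30: redo starts here, before the checkpoint record
print(sorted(att))                            # ['T1', 'T3']: uncommitted transactions to undo
print(needs_redo(30, "P1", dpt, {"P1": 20}))  # True: the disk copy of P1 is older
print(needs_redo(80, "P1", dpt, {"P1": 80}))  # False: this change is already on disk
```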