A process calls write() to save a file and the call returns successfully. Two seconds later, the system crashes unexpectedly. Will the data be on disk when the system restarts?
A. Yes: write() returning means the data was committed to disk
B. No: with write-back caching, write() only puts data in the kernel buffer cache; it is lost if the buffer was never flushed to disk
C. Yes: the kernel always flushes dirty buffers before returning from write()
D. No: write() never guarantees durability regardless of caching policy
Answer: B
This is the key durability trap of write-back caching. A successful return from write() means only that the kernel accepted the data into its buffer cache and marked the buffer dirty. The actual disk write happens later, asynchronously, performed by background flush threads. If the system crashes before the flush, the data in dirty buffers is gone. To guarantee durability, an application must call fsync() on the file after writing, which forces the kernel to flush the file's dirty buffers to disk before returning. Option D overstates the case: under a write-through policy, write() does not return until the data has reached the disk.
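The write-then-fsync pattern described above can be sketched in Python, whose os module wraps the same system calls. This is a minimal illustration, not a complete recipe: durable_write is a hypothetical helper name, and on some systems a newly created file also needs an fsync() on its containing directory, which is omitted here.

```python
import os


def durable_write(path, data):
    """Write data to path and block until the kernel reports it flushed.

    durable_write is a hypothetical helper used only for illustration.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        while offset < len(data):
            # os.write may accept fewer bytes than requested, so loop.
            offset += os.write(fd, data[offset:])
        # At this point the data may exist only in the kernel's cache.
        # fsync blocks until the file's dirty buffers have been flushed.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Calling durable_write("settings.bin", b"payload") returns only after fsync() has completed, so a crash immediately afterwards should not lose the payload.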
Question 2 Multiple Choice
What does it mean for a kernel buffer to be 'dirty'?
A. The buffer contains data that has been corrupted or is no longer valid
B. The buffer holds data that has been written by a process but not yet flushed to the underlying storage device
C. The buffer was recently evicted and its data is now only on disk, not in memory
D. The buffer is currently being read from disk into the cache
Answer: B
A dirty buffer is one whose in-memory contents have been modified (written to) but whose counterpart on disk has not yet been updated to match. Under write-back caching, the kernel marks a buffer dirty when it is modified, so that it can track which blocks still need to be flushed to disk. Clean buffers, by contrast, exactly mirror what is on disk and can be evicted without any disk write. This dirty/clean distinction is the core bookkeeping mechanism of write-back caching.
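The dirty/clean bookkeeping can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how any real kernel is implemented: Buffer and WriteBackCache are hypothetical names, and a plain dict stands in for the storage device.

```python
class Buffer:
    """One cached block plus its dirty flag."""

    def __init__(self, data):
        self.data = data
        self.dirty = False


class WriteBackCache:
    """Toy write-back cache; `disk` is a dict standing in for the device."""

    def __init__(self, disk):
        self.disk = disk
        self.buffers = {}

    def read(self, block):
        buf = self.buffers.get(block)
        if buf is None:
            # Miss: load from disk. The new buffer mirrors disk, so it is clean.
            buf = self.buffers[block] = Buffer(self.disk[block])
        return buf.data

    def write(self, block, data):
        buf = self.buffers.get(block)
        if buf is None:
            buf = self.buffers[block] = Buffer(data)
        else:
            buf.data = data
        # Memory now differs from disk until the next flush.
        buf.dirty = True

    def flush(self):
        """Write every dirty buffer to disk; return how many were written."""
        written = 0
        for block, buf in self.buffers.items():
            if buf.dirty:
                self.disk[block] = buf.data
                buf.dirty = False
                written += 1
        return written

    def evict(self, block):
        """Drop a buffer; return True if a disk write was needed first."""
        buf = self.buffers.pop(block)
        if buf.dirty:
            self.disk[block] = buf.data
            return True
        return False
```

In this model, evicting a clean buffer costs nothing, while evicting a dirty one forces a write first, which is the asymmetry the explanation describes.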
Question 3 True / False
If a file's blocks are in the kernel buffer cache, a process's read() call for that file will not generate any disk I/O.
T. True
F. False
Answer: True
This is exactly the purpose of the buffer cache. On a cache hit, the kernel copies the requested data from main memory (the cache) into the process's address space, and no disk access occurs. This is why programs that repeatedly read the same files (config files, shared libraries) do not produce repeated disk I/O. The first read (a cache miss) fetches from disk; subsequent reads hit the cache and are served at memory speed for as long as the blocks remain cached.
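The hit/miss behaviour can be seen by counting device reads in a toy model. The names CountingDisk and ReadCache are hypothetical, and the sketch ignores eviction entirely.

```python
class CountingDisk:
    """Stand-in for a storage device that counts how often it is read."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read_block(self, n):
        self.reads += 1
        return self.blocks[n]


class ReadCache:
    """Toy read cache with no eviction: only a miss touches the disk."""

    def __init__(self, disk):
        self.disk = disk
        self.cached = {}

    def read(self, n):
        if n not in self.cached:
            # Cache miss: fetch from the device and remember the block.
            self.cached[n] = self.disk.read_block(n)
        # Cache hit (or freshly filled entry): served from memory.
        return self.cached[n]
```

Reading the same block several times through ReadCache leaves the disk's reads counter at 1 in this model: only the first access is a miss.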
Question 4 True / False
Write-back caching provides stronger data durability guarantees than write-through caching because the kernel takes responsibility for managing the writes.
T. True
F. False
Answer: False
Write-back offers worse durability than write-through, not better. Write-through writes data to both cache and disk immediately — every write blocks until the disk confirms, so if the system crashes, no acknowledged write is lost. Write-back defers disk writes, so data in dirty buffers can be lost on a crash. Write-back is dramatically faster (the application isn't waiting for disk), but that performance comes at the cost of a durability window. Databases use fsync() precisely to close this window for critical commits.
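The trade-off between the two policies can be made concrete with a small simulation. This is a toy sketch, assuming a dict for the disk and a hypothetical CachedDisk class with a crash() method that discards everything held in memory.

```python
class CachedDisk:
    """Toy cache in front of a disk, using write-through or write-back."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.disk = {}        # survives a crash
        self.cache = {}       # lost on a crash
        self.dirty = set()    # blocks in cache that are newer than disk
        self.disk_writes = 0

    def _write_disk(self, block, data):
        self.disk[block] = data
        self.disk_writes += 1

    def write(self, block, data):
        self.cache[block] = data
        if self.write_through:
            # Write-through: the write is not acknowledged until it is on disk.
            self._write_disk(block, data)
        else:
            # Write-back: acknowledge now, write to disk later.
            self.dirty.add(block)

    def flush(self):
        for block in sorted(self.dirty):
            self._write_disk(block, self.cache[block])
        self.dirty.clear()

    def crash(self):
        """Simulate a crash: memory is wiped, return what the disk holds."""
        self.cache.clear()
        self.dirty.clear()
        return dict(self.disk)
```

In this model, a write-back instance that crashes before flush() returns an empty disk, while a write-through instance keeps every acknowledged write. Conversely, three writes to the same block cost three disk writes under write-through but only one under write-back after a flush, which hints at where the speed advantage comes from.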
Question 5 Short Answer
Why do database systems typically call fsync() after committing a transaction, even though it significantly slows write throughput?
Model answer: fsync() forces the kernel to flush all dirty buffers for the file to disk before returning, guaranteeing the data survives a subsequent crash. Without it, a committed transaction might exist only in the kernel buffer cache and be lost if the system crashes before the background flush thread runs.
Databases must guarantee durability, the 'D' in ACID: a transaction marked committed must survive crashes. With write-back caching, a successful write() means only that the data reached the kernel cache; it could still be lost. fsync() closes this gap by synchronously flushing the file's dirty buffers. The throughput cost is real, which is why many databases batch multiple transactions and call fsync() once per batch (group commit), rather than once per transaction.
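Group commit can be sketched as a log that queues records and pays for a single fsync() per batch. This is an illustrative sketch only: GroupCommitLog, its method names, and the newline-delimited record format are assumptions, not any particular database's design.

```python
import os


class GroupCommitLog:
    """Toy append-only log that makes a whole batch durable with one fsync."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.pending = []
        self.fsync_calls = 0

    def add(self, record):
        # Queue the record; nothing is durable (or even written) yet.
        self.pending.append(record)

    def commit_batch(self):
        """Write all pending records, fsync once, return the batch size."""
        if not self.pending:
            return 0
        payload = b"".join(record + b"\n" for record in self.pending)
        offset = 0
        while offset < len(payload):
            # os.write may perform a partial write, so loop until done.
            offset += os.write(self.fd, payload[offset:])
        os.fsync(self.fd)  # one flush covers every record in the batch
        self.fsync_calls += 1
        count = len(self.pending)
        self.pending.clear()
        return count

    def close(self):
        os.close(self.fd)
```

Only after commit_batch() returns would the transactions in that batch be reported as committed, so the cost of one fsync() is shared across all of them.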