External memory merge sort creates sorted runs of size M (filling RAM), then merges M/B runs at a time. Why is the number of merge passes Theta(log_{M/B}(N/M)), yielding total I/O cost Theta((N/B) * log_{M/B}(N/B))?
AEach merge pass reads all N/B blocks and writes N/B blocks, so costs 2N/B I/Os. With M/B-way merging, the number of runs decreases by factor M/B each pass. Starting from N/M runs, log_{M/B}(N/M) passes suffice. Since log_{M/B}(N/M) = log_{M/B}(N/B) - 1 + log_{M/B}(B/M) ≈ log_{M/B}(N/B), total I/O is Theta((N/B) * log_{M/B}(N/B))
BEach pass halves the number of runs, giving O(log_2 N) passes
CThe I/O cost is N * log N / B because each comparison costs one I/O
DExternal sort always requires N^2/B I/Os
The key insight is that M/B-way merging (keeping one block from each of M/B runs in memory simultaneously) reduces the run count by factor M/B per pass, not factor 2 as in standard merge sort. This logarithmic base difference is dramatic: with M = 4GB, B = 4KB, M/B = 10^6, sorting a terabyte requires only about 2 passes instead of ~30. The proof of optimality uses an adversary argument counting the number of distinct orderings that can be resolved per I/O operation.
Question 2 True / False
In the external memory model, scanning N consecutive elements costs N/B I/Os while accessing N random elements costs N I/Os. This N/B vs N gap is the fundamental reason external memory algorithms differ from RAM algorithms.
TTrue
FFalse
Answer: True
A sequential scan reads elements B at a time (one I/O per block), costing N/B. Random access reads one useful element per I/O, costing N. The ratio B (typically 512-4096) means sequential access is 3 orders of magnitude cheaper than random access. This gap drives all EM algorithm design: algorithms must maximize sequential access patterns and minimize random accesses. B-trees achieve O(log_B N) search cost (instead of O(log_2 N) for binary trees) precisely because each node occupies one block, and each I/O eliminates a factor-B of the search space instead of factor-2.
Question 3 Short Answer
Explain why the external memory sorting lower bound Omega((N/B) * log_{M/B}(N/B)) cannot be achieved by comparison-based RAM sorting algorithms without explicit I/O optimization.
Think about your answer, then reveal below.
Model answer: A standard RAM sorting algorithm like quicksort or mergesort achieves O(N log N) comparisons, but its I/O behavior depends on memory access patterns, not comparison count. Quicksort's random pivoting causes random memory accesses that waste most of each block transfer — empirically, it performs close to N random I/Os = Theta(N) I/Os rather than the optimal Theta((N/B) * log_{M/B}(N/B)). Standard mergesort uses 2-way merging, giving O(N/B * log_2(N/B)) I/Os — the log base is 2 instead of M/B, a massive difference. Only M/B-way mergesort, explicitly designed for the EM model, achieves the optimal I/O count by maximizing the use of all M/B blocks in memory during each merge pass.
This illustrates a general principle: optimal RAM algorithms are NOT automatically optimal for external memory. The I/O model is a different computational model with different bottlenecks, and algorithms must be designed specifically for it. The cache-oblivious model (covered separately) addresses this by designing algorithms that are simultaneously optimal across all memory hierarchy parameters.
Question 4 True / False
The external memory model requires algorithms to explicitly manage block transfers between memory levels. The cache-oblivious model removes this requirement.
TTrue
FFalse
Answer: True
In the external memory model, the algorithm knows M (memory size) and B (block size) and explicitly decides which blocks to transfer. In the cache-oblivious model, the algorithm does not know M or B and cannot explicitly manage transfers — instead, an optimal paging strategy (like LRU) manages the cache, and the algorithm's I/O complexity is analyzed for all M and B simultaneously. Cache-oblivious algorithms that match external memory optimal bounds exist for sorting, searching, and many other problems. The advantage is portability: one algorithm is optimal across all levels of the memory hierarchy without tuning parameters.