External memory (EM) algorithms optimize for the memory hierarchy by minimizing the number of block transfers between slow storage (disk) and fast memory (RAM). In the standard model, memory holds M elements, disk blocks hold B elements, and the cost metric is the number of I/O operations (block transfers). Scanning N elements costs Theta(N/B), sorting costs Theta((N/B) * log_{M/B}(N/B)), and searching in a B-tree costs Theta(log_B N). The sorting bound reveals a fundamental separation from the RAM model: the log base is M/B (the number of blocks fitting in memory), not 2, reflecting the ability to merge M/B sorted runs simultaneously. External memory sorting is the foundational primitive — many EM algorithms reduce to sorting, making the sorting I/O bound the de facto baseline.
The RAM model of computation assumes uniform-cost memory access — reading any memory location costs the same. Modern hardware violates this dramatically: L1 cache access takes ~1ns, RAM takes ~100ns, SSD takes ~100μs, and HDD takes ~10ms. The external memory model captures this hierarchy by distinguishing fast memory (size M) from slow storage (block size B), and counting block transfers as the cost measure. This simple two-level model, introduced by Aggarwal and Vitter (1988), yields a rich theory with practical implications for any computation on data too large for RAM.
The foundational result is the external memory sorting bound: Theta((N/B) * log_{M/B}(N/B)) I/Os. Understanding why this is optimal requires seeing what information each I/O provides. Reading a block of B elements reveals their relative order (B! possibilities) among at most M elements in memory — each I/O resolves at most log(B! * C(M,B)) ≈ B * log(M/B) bits of uncertainty about the final ordering. The total uncertainty is log(N!) ≈ N log N bits, so at least N log N / (B log(M/B)) = (N/B) * log_{M/B}(N/B) I/Os are needed. The M/B-way merge sort achieves this: create N/M sorted runs, merge M/B at a time, each pass costs 2N/B I/Os, and log_{M/B}(N/M) passes suffice.
B-trees are the EM analog of balanced binary search trees. Each node contains Theta(B) keys (filling one disk block), and each I/O during a search eliminates a Theta(B)-fraction of the remaining candidates. This gives O(log_B N) search cost instead of O(log_2 N) — with B = 1000, searching a billion keys takes about 3 I/Os instead of 30. B-trees also support efficient range queries (O(log_B N + K/B) for K results) and updates (O(log_B N) amortized). This combination of efficient point queries, range queries, and updates explains why B-trees are the universal data structure for databases and file systems.
Beyond sorting and searching, many EM algorithms follow a "reduce to sorting" paradigm. Graph algorithms (BFS, connected components, MST) achieve optimal I/O bounds by converting graph traversal into a sequence of sorting and scanning operations, avoiding the random access patterns that make naive graph algorithms I/O-inefficient. The external memory model also connects to the streaming model: a streaming algorithm with S bits of memory can be viewed as an external memory algorithm with M = S/B blocks, making one pass over the data. This connection explains why streaming lower bounds and EM lower bounds often use similar techniques.