Cache-Oblivious Algorithms

Research Depth 67 in the knowledge graph I know this Set as goal
cache-oblivious memory-hierarchy van-emde-boas-layout funnel-sort

Core Idea

Cache-oblivious algorithms achieve optimal I/O complexity without knowing the memory parameters M (cache size) and B (block size). They work by being simultaneously efficient for all values of M and B, relying on an ideal cache replacement policy (like LRU) to manage block transfers. The van Emde Boas layout stores a static search tree in memory so that searching costs O(log_B N) I/Os — matching B-trees — without knowing B. Funnelsort achieves the optimal sorting bound O((N/B) * log_{M/B}(N/B)) without knowing M or B. The key design principle is recursive decomposition at all scales: problems are divided into subproblems that fit in cache at some level of the recursion, regardless of the actual cache size. This yields algorithms that are portable across memory hierarchies and simultaneously optimal at every cache level.

Explainer

External memory algorithms optimize for one level of the memory hierarchy by explicitly managing block transfers between cache and disk. But modern systems have 4-5 levels (L1, L2, L3 caches, RAM, disk), each with different M and B parameters. Tuning an algorithm for one level may make it suboptimal for others. Cache-oblivious algorithms sidestep this entirely: they achieve optimal I/O complexity at every level simultaneously, without knowing any of the parameters.

The van Emde Boas tree layout is the clearest illustration of the design principle. A complete binary search tree stored in the standard array layout (BFS or DFS order) incurs O(log_2 N) cache misses per search — because consecutive tree levels are far apart in memory. The vEB layout recursively splits the tree at the midpoint of its height: the top half-tree is stored contiguously, followed by each bottom half-tree contiguously, applied recursively. At whatever recursion depth the subtree size matches the block size B, each subtree fits in one block. A root-to-leaf path crosses O(log N / log B) = O(log_B N) such block-sized subtrees, matching the B-tree's I/O complexity — without the algorithm or data structure knowing B.

Funnelsort extends this principle to sorting. The challenge is that optimal external memory sorting requires M/B-way merging, but a cache-oblivious algorithm cannot know M. Funnelsort introduces a recursive "funnel" data structure: a k-funnel merges k^3 elements from k sorted inputs using a binary tree of sub-funnels. The recursion ensures that at some level, sub-funnels fit entirely in cache and operate without misses, implicitly achieving the optimal merge degree. The result: O((N/B) * log_{M/B}(N/B)) I/Os for all M and B, matching the external memory sorting lower bound universally.

The theoretical foundation rests on the ideal cache model: the analysis assumes an optimal replacement policy (evict the block used farthest in the future), but Sleator-Tarjan's result shows that LRU with 2M memory simulates optimal-M within a constant factor. This makes cache-oblivious results practically relevant — real hardware uses LRU-like policies. The design methodology — recursive decomposition at all scales, so that subproblems fit in cache at some recursion depth regardless of cache size — has been applied to matrix multiplication, FFT, graph algorithms, and priority queues, establishing cache-oblivious design as a general and powerful paradigm.

Practice Questions 4 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesLiteral EquationsSlope-Intercept FormPoint-Slope FormWriting Linear EquationsParallel and Perpendicular Line SlopesGraphing Linear EquationsPiecewise FunctionsStep FunctionsComposition of FunctionsInverse FunctionsRadical Functions and GraphsRational ExponentsExponential Functions and GraphsLogarithms IntroductionBig-O Notation and Asymptotic AnalysisAlgorithm Analysis and Complexity ClassesDivide-and-Conquer and the Master TheoremDivide and ConquerMerge SortExternal Memory AlgorithmsCache-Oblivious Algorithms

Longest path: 68 steps · 367 total prerequisite topics

Prerequisites (3)

Leads To (0)

No topics depend on this one yet.