B-Trees and Multi-Way Search Trees

College Depth 61 in the knowledge graph I know this Set as goal
Unlocks 15 downstream topics
b-trees multi-way external-storage database-indexes disk-based

Core Idea

A B-tree of degree m is a multi-way search tree where internal nodes have 2 to m children and store multiple keys, minimizing height. Each disk read retrieves an entire node, making B-trees ideal for external storage (databases, file systems). Height is O(log_m n), which is dramatically smaller than binary trees for large m.

How It's Best Learned

Trace insertions and splits on a B-tree of order 3 by hand. Understand why the branching factor reduces height (crucial for disk-based systems). Implement insertion with node splitting and the median key 'bubbling up' to parent nodes.

Common Misconceptions

Explainer

From binary trees, you know that search performance depends on tree height — every level you descend costs one comparison. A binary search tree with n keys has height O(log₂ n), which is fine when every comparison is cheap. But when your data lives on disk rather than in memory, each node access means a disk read, and disk reads are roughly 100,000 times slower than memory accesses. A binary tree with a million keys has height ~20, meaning 20 disk reads per search. A B-tree solves this by making nodes wide instead of tall: each node stores dozens or hundreds of keys with correspondingly many children, dramatically reducing height.

A B-tree of order m (also called degree or branching factor m) allows each internal node to have up to m children and store up to m−1 keys. Keys within a node are sorted, and the children between them point to subtrees containing keys in the corresponding range — just like a binary search tree, but with multiple partitions per node instead of two. The minimum occupancy rule ensures nodes stay at least half full: internal nodes must have at least ⌈m/2⌉ children. This guarantee keeps the tree balanced and prevents degenerate chains. The height of a B-tree with n keys is O(log_m n), so with m = 1000, a tree holding a billion keys is only 3 levels deep — 3 disk reads versus 30 for a binary tree.

Insertion works by searching for the correct leaf, inserting the key, and splitting if the leaf overflows. When a node exceeds m−1 keys, it splits into two nodes, and the median key is promoted to the parent. If the parent also overflows, the split propagates upward — in the worst case, all the way to the root, which splits to create a new root and increases the tree height by one. This bottom-up splitting is why B-trees grow at the top, not the bottom, and why they stay perfectly balanced: every leaf is always at the same depth. Deletion is the mirror image, merging or redistributing keys when nodes become too empty.

The reason B-trees dominate in databases and file systems comes down to matching the data structure to the hardware. A disk read fetches an entire block (typically 4–16 KB) regardless of how much data you need from it. By sizing B-tree nodes to match the disk block size, each read brings in hundreds of keys at once — you binary-search within the node in memory (fast), then follow one pointer to the next level (one more disk read). This is why the branching factor m is chosen to be as large as a disk block can hold. The result is that B-tree operations cost O(log_m n) disk I/Os, and since m is large, the practical number of reads for even massive datasets is tiny. Every relational database index you have ever used is built on a variant of this idea.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesLiteral EquationsSlope-Intercept FormPoint-Slope FormWriting Linear EquationsParallel and Perpendicular Line SlopesGraphing Linear EquationsSystems of Equations — Graphing MethodSystems of Equations — Elimination MethodSystems of Three VariablesMatrices IntroductionGraph Representation: Matrices and ListsGraph Paths, Cycles, and ConnectivityTrees and Spanning TreesBinary TreesB-Trees and Multi-Way Search Trees

Longest path: 62 steps · 313 total prerequisite topics

Prerequisites (2)

Leads To (3)