A database has a composite B+ tree index on (last_name, first_name). A query filters only on first_name. Will the index speed up this query?
AYes — the index covers both columns, so any query touching either column can use it
BNo — the B+ tree is sorted by last_name first; without a last_name filter, the index cannot narrow the search and a full index scan is needed
CYes — the leaf linked list allows the database to scan first_name values efficiently in sorted order
DNo — composite indexes never improve query performance; only single-column indexes do
A composite index sorts by the leftmost column first, exactly like a phone book sorted by last name. If you don't provide a last_name filter, you can't jump to the right part of the tree — you'd have to scan every leaf. This is the 'leftmost prefix rule': the index accelerates queries that filter on (last_name), (last_name, first_name), but not on (first_name) alone. Understanding this is essential for designing indexes that actually help rather than just consuming disk space.
Question 2 Multiple Choice
Why does a B+ tree with branching factor 500 reach billion-row tables in only 4–5 levels, while a binary search tree would need roughly 30+ levels for the same data?
AB+ trees use a different sorting algorithm than binary trees, making each comparison more powerful
BEach B+ tree level eliminates 500 candidates (not 2), so 500⁴ ≈ 62.5 billion rows fit within 4 levels; a binary tree requires log₂(n) levels, which is ~30 for a billion rows
CB+ trees cache their upper levels in memory, making higher levels invisible to the performance calculation
DBinary trees require more disk space per node, forcing more levels
The branching factor is the key insight. At each level of a B+ tree, you eliminate 1/500th of remaining candidates rather than 1/2nd. Height grows as log₅₀₀(n) vs log₂(n). For n = 1 billion: log₂(1B) ≈ 30 levels; log₅₀₀(1B) ≈ 3.5 levels. Each level = one disk read, so this difference between 4 reads and 30 reads is enormous. The branching factor is why B-trees were designed: packing many keys per node matches the page-based I/O model of disks.
Question 3 True / False
In a B+ tree as used by relational databases, data records are stored in both internal (non-leaf) nodes and leaf nodes.
TTrue
FFalse
Answer: False
This is the defining characteristic of B+ trees (as opposed to plain B-trees). Internal nodes store only keys and child pointers — they function purely as a routing structure. All actual data records live exclusively in the leaf nodes. The leaves are then linked together as a sorted doubly-linked list, enabling efficient range scans. Keeping data only in leaves means internal nodes can hold more routing keys, maximizing branching factor.
Question 4 True / False
A B+ tree index efficiently supports range queries (e.g., WHERE age BETWEEN 25 AND 40) because its leaf nodes are linked in sorted order.
TTrue
FFalse
Answer: True
This is one of B+ trees' key advantages over hash indexes, which only support exact-match lookups. For a range query, the database traverses the tree to find the first matching leaf, then follows the leaf-level linked list forward until it passes the upper bound — no further tree traversal needed. This makes B+ trees the standard choice for any column that appears in range conditions, ORDER BY clauses, or inequality filters.
Question 5 Short Answer
Why are B-trees used for database indexes instead of binary search trees? What property of storage systems drives this design choice?
Think about your answer, then reveal below.
Model answer: Databases store data on disk, where the unit of access is a page (typically 4–16 KB). Each disk read is far slower than memory access. A binary tree node holds one key, wasting almost an entire page per read. B-trees pack hundreds of keys into each node (= each page), so one disk read eliminates hundreds of candidates rather than just one. This high branching factor reduces tree height to 3–5 levels for any practical table size, meaning at most 3–5 disk reads per lookup.
The design is driven by the I/O cost model: memory accesses are nanoseconds; disk reads are milliseconds — a million times slower. Any data structure that minimizes the number of disk accesses wins. B-trees are optimal for this model because they are designed around page size rather than individual keys. The insight is that the cost bottleneck is the number of disk reads, not CPU comparisons — so packing more work into each read is the right optimization.