Questions: Cache Line Organization and Byte Offset
5 questions to test your understanding
Question 1 Multiple Choice
Two threads on different CPU cores write frequently to different variables stored 4 bytes apart in memory, within the same 64-byte cache line. Performance is mysteriously poor despite no shared data. What is the most likely cause?
A. A data race: the threads are inadvertently accessing the same variable.
B. False sharing: each core's write invalidates the entire cache line on the other core, even though they write to different bytes.
C. Cache thrashing: the two variables map to the same cache set, causing repeated tag evictions.
D. Alignment error: the variables straddle word boundaries, causing split-register operations.
Answer: B
False sharing occurs when threads write to different bytes that share a cache line. Cache coherence protocols operate at the granularity of entire cache lines: when Core A writes to its variable, the whole 64-byte line is marked as modified by Core A, and Core B's copy is invalidated. When Core B writes to its variable (same line), it must first fetch the updated line from Core A, then invalidate Core A's copy. This bouncing of the line between cores creates substantial overhead even though no data is actually shared; it is purely an artifact of the variables being co-located within one line. The fix is to pad or align the variables so that each occupies its own cache line.
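The padding fix can be sketched in C11 as follows. This is a minimal layout illustration, not a benchmark: the 64-byte line size is an assumption (check your target platform), and the names counters_packed, counters_padded, and same_line are hypothetical.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas */
#include <stddef.h>    /* offsetof, size_t */

#define CACHE_LINE_SIZE 64  /* assumed line size; common but not universal */

/* Both counters land in the same cache line: prone to false sharing
 * if one thread writes `a` while another writes `b`. */
struct counters_packed {
    int a;  /* written by thread 1 */
    int b;  /* written by thread 2 */
};

/* Each counter is aligned to the start of its own cache line. */
struct counters_padded {
    alignas(CACHE_LINE_SIZE) int a;  /* written by thread 1 */
    alignas(CACHE_LINE_SIZE) int b;  /* written by thread 2 */
};

/* Compile-time check that `b` starts exactly one line after `a`. */
static_assert(offsetof(struct counters_padded, b) == CACHE_LINE_SIZE,
              "b should begin on the next cache line");

/* Returns 1 if two byte offsets inside a line-aligned object fall in the
 * same cache line, 0 otherwise. */
static int same_line(size_t off_x, size_t off_y)
{
    return off_x / CACHE_LINE_SIZE == off_y / CACHE_LINE_SIZE;
}
```

The trade-off is memory: under these assumptions the padded struct occupies two full lines instead of a few bytes, which is usually acceptable for a handful of hot, per-thread variables.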
Question 2 Multiple Choice
For a 32-bit address with a 16 KB direct-mapped cache using 64-byte lines, how many bits are used for the offset, index, and tag respectively?
A. Offset: 8, Index: 6, Tag: 18
B. Offset: 6, Index: 8, Tag: 18
C. Offset: 6, Index: 14, Tag: 12
D. Offset: 4, Index: 10, Tag: 18
Answer: B
64-byte lines require log₂(64) = 6 offset bits to address every byte within a line. 16 KB / 64 bytes = 256 lines, requiring log₂(256) = 8 index bits. The remaining 32 − 6 − 8 = 18 bits form the tag. This bit decomposition is the mechanism by which the hardware locates a cached address in O(1): the index selects the cache line directly (in a direct-mapped cache each set holds exactly one line, so no search is needed), and the tag distinguishes among the many memory addresses that map to that line. The offset then selects the specific byte from the matched line.
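The same decomposition can be written as shifts and masks. The sketch below assumes the parameters of this question (32-bit addresses, 16 KB direct-mapped cache, 64-byte lines); the type cache_addr and the function split_address are hypothetical names chosen for illustration.

```c
#include <assert.h>  /* static_assert */
#include <stdint.h>

enum {
    OFFSET_BITS = 6,                              /* log2(64-byte line)       */
    INDEX_BITS  = 8,                              /* log2(16 KB / 64 B = 256) */
    TAG_BITS    = 32 - OFFSET_BITS - INDEX_BITS   /* remaining high bits      */
};
static_assert(TAG_BITS == 18, "tag should be 18 bits for this configuration");

struct cache_addr {
    uint32_t tag;     /* high 18 bits: identifies which memory block is cached */
    uint32_t index;   /* middle 8 bits: selects one of the 256 lines           */
    uint32_t offset;  /* low 6 bits: selects a byte within the line            */
};

/* Split a 32-bit address into tag, index, and offset fields. */
struct cache_addr split_address(uint32_t addr)
{
    struct cache_addr parts;
    parts.offset = addr & ((1u << OFFSET_BITS) - 1u);
    parts.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    parts.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return parts;
}
```

For example, the address 0x12345678 splits into tag 0x48D1, index 0x59, and offset 0x38.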
Question 3 True / False
When a cache line is loaded on a miss, subsequent accesses to any other byte within that same line will be cache hits, requiring no additional memory fetches.
T. True
F. False
Answer: True
This is spatial locality exploitation in action. When a cache line is loaded, all bytes in that line are stored in the cache together. Any access to any byte within the line finds it already present — a hit — as long as the line has not been evicted. Sequential iteration through an array is cache-friendly for exactly this reason: after the first element of each 64-byte block is accessed (potentially a miss), all subsequent elements in the same block are hits. The cache line is the atomic unit of allocation, so all bytes within it rise and fall together.
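A rough way to see this effect is to count how often a sequential scan moves onto a new line. The sketch below is a deliberately simplified model, not a real cache simulator: it assumes 64-byte lines, assumes elements do not straddle line boundaries, and treats every first touch of a line as a miss. The function name sequential_scan_misses is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u  /* assumed cache line size in bytes */

/* Count misses for a sequential scan of `count` elements of `elem_size`
 * bytes starting at byte address `base`. In this simplified model a miss
 * occurs whenever an access lands on a different line than the previous
 * access; every other access is a hit. */
size_t sequential_scan_misses(uintptr_t base, size_t elem_size, size_t count)
{
    size_t misses = 0;
    uintptr_t last_line = 0;
    int have_line = 0;  /* becomes 1 after the first access */

    for (size_t i = 0; i < count; i++) {
        uintptr_t line = (base + i * elem_size) / LINE_SIZE;
        if (!have_line || line != last_line) {
            misses++;
            last_line = line;
            have_line = 1;
        }
    }
    return misses;
}
```

Under these assumptions, scanning 1024 four-byte integers from a line-aligned base touches 64 lines, so only 64 of the 1024 accesses miss.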
Question 4 True / False
Storing a variable in a smaller data type (e.g., char instead of double) guarantees it occupies fewer cache lines and will typically improve cache performance.
T. True
F. False
Answer: False
A smaller variable does not necessarily occupy fewer cache lines: placement and alignment determine this, not size alone. A naturally aligned double already fits within a single 64-byte line, so replacing it with a char cannot reduce the number of lines it touches. A single byte can never straddle a line boundary, but a misaligned multi-byte variable can span two lines regardless of how small it is relative to the line. Additionally, packing many small variables into shared cache lines can cause false sharing in multithreaded code, worsening performance. A smaller type reduces the space used within a line, but without proper alignment and layout awareness, it provides no guarantee of better cache behavior.
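How many lines an object touches follows directly from its starting address and its size. The sketch below assumes 64-byte lines; the function name lines_spanned is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u  /* assumed cache line size in bytes */

/* Number of cache lines touched by an object of `size` bytes (size >= 1)
 * whose first byte is at address `addr`. */
size_t lines_spanned(uintptr_t addr, size_t size)
{
    uintptr_t first = addr / LINE_SIZE;               /* line of first byte */
    uintptr_t last  = (addr + size - 1) / LINE_SIZE;  /* line of last byte  */
    return (size_t)(last - first + 1);
}
```

With this model, an 8-byte value at address 56 occupies one line, the same value at address 60 spans two, and a 1-byte value occupies exactly one line wherever it is placed.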
Question 5 Short Answer
Explain why caches fetch an entire cache line rather than just the single byte requested, and what assumption about memory access patterns this design exploits.
Model answer: Caches fetch entire lines (typically 64 bytes) because of spatial locality: programs that access address A are very likely to soon access nearby addresses A+1, A+2, etc. Bringing in the whole line on a miss means subsequent nearby accesses are already in the cache, converting future misses into hits at no additional cost. The underlying assumption is that programs access memory in clusters — walking arrays, reading struct fields, executing sequential instructions — rather than jumping randomly across the address space.
The alternative, fetching a single byte, would waste the bandwidth opportunity that cache lines exploit. DRAM transfers have high latency per request but high throughput once a transfer starts. Fetching 64 bytes costs only marginally more than fetching 1 byte (latency dominates, not transfer time), so the cache amortizes the fixed latency cost over 64 bytes rather than 1. When spatial locality holds, most of those 64 bytes will be needed soon, making the larger fetch worthwhile. When it fails (random access in sparse data structures), much of each fetched line goes unused, which is a key motivation for designing cache-friendly data layouts.
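The cost of poor spatial locality can be estimated as the fraction of each fetched line that a loop actually uses. The sketch below is a back-of-the-envelope model under stated assumptions: 64-byte lines, an element size no larger than the stride or the line, power-of-two sizes, and elements that never straddle a line. The function name line_utilization is hypothetical.

```c
#include <stddef.h>

#define LINE_SIZE 64u  /* assumed cache line size in bytes */

/* Fraction (0.0 to 1.0) of each fetched line that is used when a loop
 * reads `elem_size` bytes every `stride` bytes.
 * Assumes elem_size <= stride and elem_size <= LINE_SIZE. */
double line_utilization(size_t elem_size, size_t stride)
{
    /* With a stride below the line size, several elements share a line;
     * otherwise each fetched line serves only one element. */
    size_t span = stride < LINE_SIZE ? stride : LINE_SIZE;
    return (double)elem_size / (double)span;
}
```

Under this model a dense scan of 4-byte elements uses every fetched byte, while reading one 4-byte field per 64-byte (or larger) stride uses only 4 of every 64 bytes fetched.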