A 5-stage pipelined processor and a non-pipelined processor each execute the same single instruction. Assuming no hazards, which processor takes longer to complete that one instruction?
A. The non-pipelined processor: it must complete the full 5-stage path sequentially
B. The pipelined processor: each stage boundary adds pipeline register overhead, making the total latency slightly longer
C. They take exactly the same time: pipelining only affects throughput, leaving latency unchanged
D. The pipelined processor is always faster for a single instruction because its clock frequency is higher
Answer: B
Pipelining inserts a register at each stage boundary to hold intermediate values, and each register adds a small delay (setup time plus clock-to-output delay). A single instruction therefore takes slightly *longer* on a pipelined processor than on a non-pipelined one. Pipelining's benefit lies entirely in throughput: many instructions are in flight at once. For a single instruction in isolation, pipelining is a slight disadvantage. Option C is nearly correct, since pipelining does not improve individual instruction latency, but the strictly accurate answer is that pipelined latency is marginally worse, not equal.
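The latency comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 200 ps stage delay and 20 ps register overhead are assumed values, not measurements, and the stages are assumed to be perfectly balanced.

```python
# Assumed, illustrative numbers: five balanced stages of 200 ps each.
STAGE_DELAYS_PS = [200, 200, 200, 200, 200]  # IF, ID, EX, MEM, WB
REGISTER_OVERHEAD_PS = 20  # assumed delay added by each pipeline register


def non_pipelined_latency(stage_delays):
    """The instruction passes through all the logic with no pipeline registers."""
    return sum(stage_delays)


def pipelined_latency(stage_delays, register_overhead):
    """Each stage takes one clock cycle; the cycle is the slowest stage plus register overhead."""
    cycle_time = max(stage_delays) + register_overhead
    return len(stage_delays) * cycle_time


single = non_pipelined_latency(STAGE_DELAYS_PS)
piped = pipelined_latency(STAGE_DELAYS_PS, REGISTER_OVERHEAD_PS)
print(f"non-pipelined: {single} ps, pipelined: {piped} ps")
# prints "non-pipelined: 1000 ps, pipelined: 1100 ps"
```

Under these assumptions the pipelined processor needs 100 ps more for the lone instruction, which matches option B.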
Question 2 Multiple Choice
A processor designer considers deepening the pipeline from 5 stages to 15 stages to allow a higher clock speed. Which statement best captures the trade-off?
A. A deeper pipeline is always better: more stages means a faster clock and proportionally higher throughput
B. A deeper pipeline increases throughput by 3× because 15 stages is 3× deeper than 5 stages
C. A deeper pipeline enables a faster clock but increases hazard penalties, since each stall or flush wastes more cycles of pipeline work
D. A deeper pipeline decreases throughput because each instruction takes more clock cycles to complete
Answer: C
Deeper pipelines allow a faster clock (each stage does less work per cycle), but they amplify the cost of hazards. In a 5-stage pipeline, a data hazard stall might waste 2 cycles; in a 15-stage pipeline, the same dependency might waste 6–8 cycles. Branch mispredictions also flush more in-flight work. Net performance depends on whether the clock speedup outpaces the increased hazard penalty, which is not guaranteed: Intel's Pentium 4, whose pipeline grew from 20 stages to 31 in the Prescott revision, showed diminishing returns.
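A rough model makes the trade-off concrete. The sketch below assumes a simple cost model (cycle time equals logic delay per stage plus register overhead, and every hazard adds a fixed stall penalty). All numbers are hypothetical and chosen only to illustrate the shape of the trade-off.

```python
def time_per_instruction_ns(total_logic_ns, stages, register_overhead_ns,
                            hazard_rate, penalty_cycles):
    """Average time per instruction under an assumed, simplified cost model."""
    # Splitting the logic across more stages shortens the cycle,
    # but the per-stage register overhead does not shrink.
    cycle_ns = total_logic_ns / stages + register_overhead_ns
    # Ideal CPI is 1; each hazard adds penalty_cycles stall cycles.
    cpi = 1.0 + hazard_rate * penalty_cycles
    return cycle_ns * cpi


# Hypothetical workload: 5 ns of logic, 0.1 ns register overhead,
# and a hazard on 20% of instructions.
shallow = time_per_instruction_ns(5.0, 5, 0.1, hazard_rate=0.2, penalty_cycles=2)
deep = time_per_instruction_ns(5.0, 15, 0.1, hazard_rate=0.2, penalty_cycles=7)

print(f"5-stage:  {shallow:.2f} ns per instruction")
print(f"15-stage: {deep:.2f} ns per instruction")
print(f"speedup:  {shallow / deep:.2f}x")
```

With these assumed inputs the 15-stage design comes out ahead, but by roughly 1.5× rather than 3×. A higher hazard rate or a larger penalty narrows the gain further.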
Question 3 True / False
Pipelining reduces the time (latency) required to execute each individual instruction.
True
False
Answer: False
Pipelining does not reduce individual instruction latency; it slightly increases it, because pipeline register overhead is added at each stage boundary. Pipelining's benefit lies entirely in throughput: by overlapping the execution of many instructions, an ideal pipeline completes one instruction per clock cycle once it is full. The laundry analogy makes this clear: each individual load still takes 90 minutes (no load finishes any sooner), while the improvement is finishing one load every 30 minutes instead of every 90 (throughput tripled).
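The laundry numbers can be tabulated directly. This is a minimal sketch that assumes three 30-minute stages (wash, dry, fold), as in the analogy. The function names are made up for illustration.

```python
STAGE_MINUTES = 30  # wash, dry, and fold each take 30 minutes
STAGES = 3


def sequential_finish_times(loads):
    """Each load starts only after the previous load is completely done."""
    load_time = STAGES * STAGE_MINUTES
    return [load_time * (i + 1) for i in range(loads)]


def pipelined_finish_times(loads):
    """Load i starts washing as soon as load i-1 moves on to the dryer."""
    load_time = STAGES * STAGE_MINUTES
    return [i * STAGE_MINUTES + load_time for i in range(loads)]


print(sequential_finish_times(4))  # [90, 180, 270, 360]
print(pipelined_finish_times(4))   # [90, 120, 150, 180]
```

In both schedules the first load finishes at minute 90, and every pipelined load still spends 90 minutes from start to finish. Only the gap between completions shrinks, from 90 minutes to 30.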
Question 4 True / False
RISC architectures are better suited to pipelining than CISC architectures partly because their fixed-length instructions make the fetch stage predictable and their uniform formats simplify decoding.
True
False
Answer: True
Fixed-length instruction encoding means the fetch stage always knows exactly how many bytes to read for the next instruction, with no variable-width parsing required. Uniform instruction formats mean register specifiers are always in the same bit positions, making the decode stage simple and fast. These properties let each pipeline stage do a consistently sized amount of work, which is essential for keeping stages balanced. CISC architectures like x86, with instructions ranging from 1 to 15 bytes, require complex pre-decode logic just to find instruction boundaries.
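The difference in fetch complexity can be sketched in code. The variable-length format below is a toy encoding invented for illustration (the first byte of each instruction gives its length); it is not the x86 encoding, which is considerably more involved.

```python
def fixed_length_boundaries(code, width=4):
    """Boundaries follow from arithmetic alone; no byte of the code is inspected."""
    return list(range(0, len(code), width))


def variable_length_boundaries(code):
    """Toy encoding (an assumption, not x86): byte 0 of each instruction is its length."""
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        length = code[pc]  # must read the instruction before the next one can be located
        if length == 0:
            raise ValueError(f"invalid zero-length instruction at offset {pc}")
        pc += length
    return offsets


print(fixed_length_boundaries(bytes(12)))  # [0, 4, 8]

toy_program = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 1])
print(variable_length_boundaries(toy_program))  # [0, 1, 4, 6]
```

The fixed-length case needs no information from the instruction stream, so every boundary is known up front. The variable-length loop is inherently serial: each boundary depends on decoding the instruction before it.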
Question 5 Short Answer
Using the laundry analogy, explain why pipelining improves throughput but not latency.
Model answer: Each load still goes through wash (30 min), dry (30 min), and fold (30 min), for 90 minutes total per load. Pipelining doesn't make any single load finish faster; it starts the next load as soon as the previous one moves to the next stage. After the pipeline fills, you complete one load every 30 minutes (the bottleneck stage time) instead of every 90. Throughput triples, but each load tracked from start to finish still takes 90 minutes, so latency per load is unchanged.
In CPU terms: each instruction still passes through all 5 stages, so its total path time is no shorter (and is marginally longer once pipeline register overhead is counted). Pipelining does not speed up any individual instruction; it ensures that different stages work on different instructions simultaneously. This distinction matters because pipelining does not help with single-instruction latency, which is what counts in dependency chains where each instruction must wait for the previous result. That is why pipelining is a throughput optimization, not a latency optimization.
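The same point can be expressed as cycle counts. The sketch below assumes an ideal k-stage pipeline with no hazards and, as a simplification, the same cycle time on both machines, so the non-pipelined processor spends k cycles on each instruction.

```python
def pipelined_cycles(n, k=5):
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    return k + n - 1


def non_pipelined_cycles(n, k=5):
    """Each instruction occupies the whole datapath for k cycles."""
    return n * k


def speedup(n, k=5):
    return non_pipelined_cycles(n, k) / pipelined_cycles(n, k)


for n in (1, 10, 1000):
    print(f"{n:>5} instructions: speedup {speedup(n):.2f}x")
```

For a single instruction the speedup is exactly 1.0, and it approaches k (here 5) only as the instruction count grows. The gain comes from overlap, not from any one instruction finishing sooner.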