A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

CPU Performance Metrics and Amdahl's Law

College Depth 102 in the knowledge graph ☐ I know this ☆ Set as goal

3topics build on this

492prerequisites beneath it

CPU Pipelining Time and Space Complexity→→Performance Metrics, Power, and Thermal Management Superscalar and VLIW Processors

Core Idea

CPU performance is quantified by the equation: CPU time = Instruction count × CPI × Clock cycle time. Improving any one factor improves performance, but trade-offs exist — reducing CPI with more pipeline stages may require a faster clock with shorter cycle time. Amdahl's Law states that the speedup from improving a fraction f of execution is bounded by 1/(1−f), meaning serial portions of code form a hard ceiling on total speedup. Benchmarks like SPEC measure real-world application performance rather than synthetic peak numbers, providing honest cross-architecture comparisons.

How It's Best Learned

Compute execution time using the CPU time equation under different clock speeds and CPI assumptions. Apply Amdahl's Law to predict speedup from parallelizing 80% of a workload. Compare SPEC scores across processor generations to observe the effect of architectural improvements.

Common Misconceptions

A higher clock frequency does not always mean better performance; if CPI increases proportionally, execution time does not improve.
Amdahl's Law applies to any optimization, not just parallelism — the unoptimized fraction always limits the maximum achievable speedup.

Explainer

From your work with pipelining and algorithm complexity, you already have intuitions about what makes computation fast or slow. Performance and benchmarking formalizes these intuitions into a precise framework. The central equation is deceptively simple: CPU time = Instruction count × CPI × Clock cycle time. Every architectural decision maps onto one or more of these three factors. A better compiler reduces instruction count. A wider pipeline or out-of-order execution reduces CPI (cycles per instruction). A smaller transistor process enables a faster clock, reducing cycle time. The equation makes explicit that improving one factor while worsening another can result in no net gain — or even a regression.

CPI deserves special attention because it is not a fixed number — it is an average across all instructions in a program. A load instruction that hits in L1 cache might take 1 cycle, while the same load missing to main memory might stall for 200 cycles. A program dominated by cache-friendly arithmetic will have a CPI near 1 (or below 1 on a superscalar processor), while a pointer-chasing traversal of a linked list might have an effective CPI of 50 or more. This is why clock speed alone is misleading: a 4 GHz processor with CPI of 5 is slower than a 2 GHz processor with CPI of 1 on the same instruction count.

Amdahl's Law provides a ceiling on optimization. If you can speed up 90% of a program's execution by a factor of 10, the overall speedup is not 10× — it is only about 5.3×, because the remaining 10% is unchanged and now dominates execution time. The formula is Speedup = 1 / ((1 − f) + f/S), where f is the fraction improved and S is the speedup of that fraction. The practical lesson is harsh: optimizing what is already fast is nearly worthless. Profiling to find the actual bottleneck is always the first step, because Amdahl's Law guarantees that effort spent on non-bottleneck code yields diminishing returns.

Benchmarks exist because no single metric captures performance across all workloads. MIPS (millions of instructions per second) is seductive but flawed — it varies by program, ignores instruction complexity, and can even increase when a compiler generates more but simpler instructions. The SPEC benchmark suites (SPECint, SPECfp) address this by running standardized real-world programs and reporting geometric means of speedup ratios relative to a reference machine. When you see a processor advertised as "20% faster," the right question is always: faster on what benchmark, and does that benchmark resemble your workload?

Practice Questions 5 questions