CPU Performance Metrics and Amdahl's Law

College Depth 64 in the knowledge graph I know this Set as goal
Unlocks 3 downstream topics
performance CPI MIPS Amdahls-law benchmarking

Core Idea

CPU performance is quantified by the equation: CPU time = Instruction count × CPI × Clock cycle time. Improving any one factor improves performance, but trade-offs exist — reducing CPI with more pipeline stages may require a faster clock with shorter cycle time. Amdahl's Law states that the speedup from improving a fraction f of execution is bounded by 1/(1−f), meaning serial portions of code form a hard ceiling on total speedup. Benchmarks like SPEC measure real-world application performance rather than synthetic peak numbers, providing honest cross-architecture comparisons.

How It's Best Learned

Compute execution time using the CPU time equation under different clock speeds and CPI assumptions. Apply Amdahl's Law to predict speedup from parallelizing 80% of a workload. Compare SPEC scores across processor generations to observe the effect of architectural improvements.

Common Misconceptions

Explainer

From your work with pipelining and algorithm complexity, you already have intuitions about what makes computation fast or slow. Performance and benchmarking formalizes these intuitions into a precise framework. The central equation is deceptively simple: CPU time = Instruction count × CPI × Clock cycle time. Every architectural decision maps onto one or more of these three factors. A better compiler reduces instruction count. A wider pipeline or out-of-order execution reduces CPI (cycles per instruction). A smaller transistor process enables a faster clock, reducing cycle time. The equation makes explicit that improving one factor while worsening another can result in no net gain — or even a regression.

CPI deserves special attention because it is not a fixed number — it is an average across all instructions in a program. A load instruction that hits in L1 cache might take 1 cycle, while the same load missing to main memory might stall for 200 cycles. A program dominated by cache-friendly arithmetic will have a CPI near 1 (or below 1 on a superscalar processor), while a pointer-chasing traversal of a linked list might have an effective CPI of 50 or more. This is why clock speed alone is misleading: a 4 GHz processor with CPI of 5 is slower than a 2 GHz processor with CPI of 1 on the same instruction count.

Amdahl's Law provides a ceiling on optimization. If you can speed up 90% of a program's execution by a factor of 10, the overall speedup is not 10× — it is only about 5.3×, because the remaining 10% is unchanged and now dominates execution time. The formula is Speedup = 1 / ((1 − f) + f/S), where f is the fraction improved and S is the speedup of that fraction. The practical lesson is harsh: optimizing what is already fast is nearly worthless. Profiling to find the actual bottleneck is always the first step, because Amdahl's Law guarantees that effort spent on non-bottleneck code yields diminishing returns.

Benchmarks exist because no single metric captures performance across all workloads. MIPS (millions of instructions per second) is seductive but flawed — it varies by program, ignores instruction complexity, and can even increase when a compiler generates more but simpler instructions. The SPEC benchmark suites (SPECint, SPECfp) address this by running standardized real-world programs and reporting geometric means of speedup ratios relative to a reference machine. When you see a processor advertised as "20% faster," the right question is always: faster on what benchmark, and does that benchmark resemble your workload?

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsOperators and ExpressionsArithmetic Operators and Operator PrecedenceComparison Operators and Boolean TestsLogical Operators and Boolean AlgebraBoolean Algebra and Fundamental LawsCombinational Circuit DesignFlip-Flops and LatchesBinary Counters: Design and AnalysisBinary ArithmeticFixed-Point Number RepresentationTwo's Complement RepresentationOverflow and Underflow DetectionBinary Adders: Half-Adders and Full-AddersFull Adder and Carry PropagationCarry Lookahead Adder DesignHalf Adder Circuit DesignMultiplication Circuit DesignSequential Circuit DesignRegisters and Register FilesInstruction Set Architecture (ISA)Assembly Language BasicsCPU DatapathCPU Control UnitCPU PipeliningCPU Performance Metrics and Amdahl's Law

Longest path: 65 steps · 349 total prerequisite topics

Prerequisites (2)

Leads To (2)