Array Subscript Optimization

Graduate Depth 78 in the knowledge graph I know this Set as goal
Unlocks 1 downstream topic
optimization loops memory

Core Idea

Array subscript expressions often involve expensive multiplication and addition operations in loops. Strength reduction optimizes subscripts by detecting linear patterns (common in loops) and substituting cheaper operations. This optimization is particularly important for dense linear algebra code.

How It's Best Learned

Implement strength reduction for induction variables in loops. Manually optimize nested loop array accesses.

Explainer

Consider a simple loop that processes each element of an array: `for (i = 0; i < n; i++) a[i] = 0;`. The expression `a[i]` looks innocent, but the compiler must translate it into an address calculation: `base_address + i * element_size`. That multiplication executes on every iteration, even though the address advances by a fixed stride each time. From your study of loop detection and data dependence analysis, you know how to identify loop structure and track how variables change across iterations. Array subscript optimization exploits that regularity to eliminate redundant address arithmetic.

The central technique is strength reduction applied to induction variables. An induction variable is one that changes by a constant amount on each loop iteration — the classic loop counter `i` is the simplest example. The array address `base + i * element_size` is a derived induction variable: it is a linear function of `i`. Strength reduction replaces the multiplication with an addition by introducing a new pointer variable initialized to `base` before the loop, then incremented by `element_size` on each iteration. The expensive `multiply + add` per iteration becomes a single `add`. On most hardware, addition is significantly cheaper than multiplication, and the savings compound across millions of iterations in tight loops.

For nested loops, the optimization becomes more powerful and more intricate. Consider `a[i][j]` in a doubly-nested loop. The unoptimized address is `base + i * row_size + j * element_size` — two multiplications per inner iteration. The compiler can reduce the outer multiplication by maintaining a row pointer that advances by `row_size` in the outer loop, and reduce the inner multiplication by incrementing a column pointer by `element_size` in the inner loop. Data dependence analysis confirms these transformations are safe: the new pointer-based access pattern reaches exactly the same memory locations in the same order, so no dependencies are violated.

The compiler must also handle cases where subscript expressions are more complex — `a[2*i + 1]` or `a[i*n + j]` — by recognizing the linear pattern and reducing it to an initial value plus a constant stride. When the subscript is not a linear function of loop variables (e.g., `a[b[i]]` with indirect indexing), strength reduction does not apply. This optimization is particularly impactful in dense linear algebra — matrix multiplication, convolution, stencil computations — where the innermost loops are dominated by regular array access patterns and even small per-iteration savings translate to large absolute speedups.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesLiteral EquationsSlope-Intercept FormPoint-Slope FormWriting Linear EquationsParallel and Perpendicular Line SlopesGraphing Linear EquationsPiecewise FunctionsStep FunctionsComposition of FunctionsInverse FunctionsRadical Functions and GraphsRational ExponentsExponential Functions and GraphsLogarithms IntroductionTime and Space ComplexityAmortized AnalysisHash TablesSymbol Tables and Scope ResolutionSemantic Analysis PhaseIntermediate Code RepresentationControl Flow GraphsFixpoint Computation and IterationDataflow AnalysisReaching Definitions AnalysisCommon Subexpression Elimination (CSE)Dead Code EliminationCode Optimization FundamentalsVectorization and SIMD Code GenerationLoop Invariant Code Motion (LICM)Loop UnrollingLoop Detection and AnalysisArray Subscript Optimization

Longest path: 79 steps · 418 total prerequisite topics

Prerequisites (5)

Leads To (1)