A compiler refuses to vectorize the loop: `for (i=1; i<n; i++) A[i] = A[i-1] * 2;`. The most accurate explanation is:
A. The loop body is too complex for the SIMD vectorization pass to analyze
B. There is a loop-carried dependency: iteration i reads the value written by iteration i-1, so parallel execution would produce wrong results
C. The array is too small; vectorization is only beneficial for large arrays
D. The multiplication operation is not supported on this CPU's SIMD unit
Answer: B
This loop has a loop-carried dependency: A[i] depends on A[i-1], which the previous iteration wrote. If iterations 1 and 2 executed simultaneously, iteration 2 could read the original (pre-loop) value of A[1] instead of the value iteration 1 just wrote, producing wrong results. The compiler's dependence analysis detects this and rightly refuses to vectorize, because correctness always overrides performance. An independent loop such as `A[i] = B[i] * 2`, with no cross-iteration dependencies, vectorizes freely.
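To make the hazard concrete, here is a minimal sketch in plain C. It compares the serial recurrence with a hand-written model of a naive 4-wide execution, in which each group of four iterations loads all its inputs before storing any result. The function names and the width of 4 are assumptions chosen for illustration; no real compiler emits the second function.

```c
#include <stddef.h>

/* Serial semantics: each iteration sees the value the previous one wrote. */
void scale_chain_serial(float *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] * 2.0f;
}

enum { WIDTH = 4 };  /* illustrative SIMD width, not tied to any real ISA */

/* Hypothetical model of an INCORRECT vectorization: within each group of
   WIDTH iterations, all loads happen before all stores, so lanes 1..3 read
   stale values instead of the results of the earlier lanes. */
void scale_chain_naive_simd(float *a, size_t n) {
    size_t i = 1;
    for (; i + WIDTH <= n; i += WIDTH) {
        float in[WIDTH];
        for (size_t lane = 0; lane < WIDTH; lane++)
            in[lane] = a[i + lane - 1];      /* "vector load" */
        for (size_t lane = 0; lane < WIDTH; lane++)
            a[i + lane] = in[lane] * 2.0f;   /* "vector store" */
    }
    for (; i < n; i++)                       /* leftover iterations */
        a[i] = a[i - 1] * 2.0f;
}
```

Starting from five elements all equal to 1, the serial version yields 1, 2, 4, 8, 16, while the naive model yields 1, 2, 2, 2, 2. That mismatch is what dependence analysis prevents.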
Question 2 Multiple Choice
A loop processes 1,007 elements using AVX (256-bit registers, 8 floats per register). How does the compiler handle the elements that don't fit evenly into the SIMD width?
A. It rounds down to 1,000 elements and skips the last 7 to keep the loop simple
B. It pads the array allocation to 1,008 elements so the count is a multiple of 8
C. It generates a scalar remainder loop that processes the last 7 elements after 125 full vector iterations
D. It refuses to vectorize because the element count must be a compile-time constant divisible by 8
Answer: C
The compiler generates a vectorized main loop of ⌊1007/8⌋ = 125 iterations (elements 0–999), then a scalar remainder loop for the last 7 elements (1000–1006). This is a standard strategy: the remainder loop is a simple scalar fallback, not a failure. Some compilers also 'peel' a few prologue iterations so that the main vector loop starts on an aligned memory boundary. The compiler never silently skips elements or refuses to vectorize on this basis.
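The split into a main loop and a remainder loop can be sketched by hand in plain C. This is only a model of the shape a vectorizer typically produces: the inner lane loop stands in for a single vector instruction, and the function name and return value (the count of full vector iterations) are assumptions added for illustration.

```c
#include <stddef.h>

enum { LANES = 8 };  /* 8 floats per 256-bit AVX register */

/* Doubles every element of src into dst. Returns the number of full
   vector iterations executed (illustrative only). */
size_t scale_by_two(float *dst, const float *src, size_t n) {
    size_t vec_end = n - (n % LANES);  /* 1007 -> 1000 */
    size_t vec_iters = 0;
    size_t i = 0;

    /* Main loop: each pass models one 8-wide vector operation. */
    for (; i < vec_end; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++)
            dst[i + lane] = src[i + lane] * 2.0f;
        vec_iters++;
    }

    /* Scalar remainder loop: handles the last n % LANES elements. */
    for (; i < n; i++)
        dst[i] = src[i] * 2.0f;

    return vec_iters;
}
```

For n = 1007 the main loop runs 125 times and the remainder loop handles indices 1000 through 1006, matching the arithmetic above.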
Question 3 True / False
Even when a loop has a loop-carried dependency (such as summing all elements of an array), a compiler may still be able to vectorize it by using multiple partial accumulators in separate vector lanes.
T. True
F. False
Answer: True
A reduction like `sum += A[i]` has a loop-carried dependency on `sum`, but the compiler can break it by using multiple independent partial sums: say, 8 separate accumulators in one AVX register, each accumulating every 8th element. After the vectorized loop, a horizontal add combines the 8 partial sums into the final result. Each lane still carries its own running sum from one vector iteration to the next, but the lanes are independent of one another, and the only cross-lane step is the final combine, which runs once outside the loop. This reordering is valid only if the operation is associative. Integer addition qualifies; floating-point addition is not strictly associative, so floating-point reductions require `-ffast-math` or an equivalent flag.
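The partial-accumulator idea can be written out in scalar C as a sketch. The array `acc` plays the role of one 8-lane vector register, and the function name is hypothetical. Note that the additions happen in a different order than in the plain serial loop, which is why floating-point reductions need explicit permission to reorder.

```c
#include <stddef.h>

enum { LANES = 8 };  /* models one AVX register holding 8 floats */

float sum_partial(const float *a, size_t n) {
    float acc[LANES] = {0.0f};  /* 8 independent partial sums */
    size_t i = 0;

    /* Lane k accumulates a[k], a[k+8], a[k+16], ...; lanes never interact. */
    for (; i + LANES <= n; i += LANES)
        for (size_t lane = 0; lane < LANES; lane++)
            acc[lane] += a[i + lane];

    /* Horizontal add: the only cross-lane step, done once after the loop. */
    float total = 0.0f;
    for (size_t lane = 0; lane < LANES; lane++)
        total += acc[lane];

    /* Scalar remainder for the last n % LANES elements. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```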
Question 4 True / False
When a compiler cannot prove that two pointer arguments do not alias (point to overlapping memory), it will generally refuse to vectorize any loop involving those pointers.
T. True
F. False
Answer: False
Rather than refusing outright, the compiler can generate a runtime alias check: it emits code that compares the pointer ranges at runtime and branches to either the vectorized or scalar version depending on whether they overlap. This produces a function that is correct in all cases while still achieving speedup in the common non-aliasing case. The programmer can also help by annotating pointers with `restrict` (C99), explicitly asserting no aliasing and allowing the compiler to skip the runtime check and always use the vectorized path.
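The dispatch a compiler generates can be approximated by hand. The sketch below assumes C99; all function names are hypothetical, and the integer return value exists only to show which path ran. The overlap test is deliberately conservative and compares addresses through `uintptr_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Fast path: restrict promises that dst and src never overlap, so the
   compiler may vectorize without any runtime check. */
static void add_one_noalias(float *restrict dst, const float *restrict src,
                            size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
}

/* Safe path: plain serial loop, correct even when the ranges overlap. */
static void add_one_serial(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
}

/* Conservative check: do the byte ranges [p, p+n) and [q, q+n) intersect? */
static int ranges_overlap(const float *p, const float *q, size_t n) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return a < b + bytes && b < a + bytes;
}

/* Returns 1 if the non-aliasing path ran, 0 if the serial fallback ran. */
int add_one(float *dst, const float *src, size_t n) {
    if (ranges_overlap(dst, src, n)) {
        add_one_serial(dst, src, n);
        return 0;
    }
    add_one_noalias(dst, src, n);
    return 1;
}
```

Calling `add_one(buf + 1, buf, 5)` takes the serial fallback because the ranges overlap, while two separate arrays take the `restrict` path.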
Question 5 Short Answer
Why is proving absence of loop-carried dependencies the critical prerequisite for vectorization, and what can programmers do to help the compiler vectorize loops it would otherwise reject?
Model answer: Vectorization executes multiple loop iterations simultaneously, so if iteration i writes a value that iteration i+k reads, parallel execution can produce wrong results: the read may see a stale or partially updated value. Correctness is non-negotiable, so the compiler only vectorizes when it can prove no such cross-iteration dependencies exist. Programmers can help by: (1) using `restrict` on pointer parameters to assert non-aliasing; (2) avoiding writes to arrays that are also read with different indices in the same loop; (3) separating computations into independent loops the compiler can analyze more easily; (4) using `#pragma GCC ivdep` or similar to assert to the compiler that no dependencies exist when the programmer knows this to be true; and (5) using compiler reports (`-fopt-info-vec-missed` on GCC) to understand why specific loops aren't vectorizing.
The key insight is that a loop fails to vectorize because the compiler couldn't prove safety, not necessarily because vectorization is impossible. Programmers who understand dependence analysis can provide the information the compiler lacks — either through annotations like `restrict`, through code restructuring, or through manual SIMD intrinsics as a last resort.
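As a sketch of the annotation route, the function below uses `#pragma GCC ivdep` on a loop whose safety depends on a fact the compiler cannot see. The function name and its contract (`offset >= n`) are assumptions made for this example. The pragma is guarded so that compilers other than GCC simply skip it.

```c
#include <stddef.h>

/* Hypothetical contract: offset >= n, so the elements read
   (a[offset] .. a[offset + n - 1]) never overlap the elements written
   (a[0] .. a[n - 1]). The compiler cannot deduce this from the code alone. */
void scale_from_offset(float *a, size_t n, size_t offset) {
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep  /* programmer's assertion: no loop-carried dependencies */
#endif
    for (size_t i = 0; i < n; i++)
        a[i] = a[i + offset] * 2.0f;
}
```

The pragma is a promise, not a check: if a caller breaks the contract, the result is whatever the generated code happens to produce. Building with something like `gcc -O3 -fopt-info-vec-missed` reports the loops GCC declined to vectorize and why.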