Questions: Asymptotic Normality of Regression Estimators
5 questions to test your understanding
Question 1 Multiple Choice
A researcher runs OLS on a dataset where the error terms are clearly right-skewed (not normal). With n = 800 observations, can they still conduct valid t-tests on the coefficients?
A. No: t-tests require normally distributed errors, so the results are invalid
B. Yes: the CLT ensures the OLS estimator is approximately normally distributed in large samples regardless of error distribution
C. Yes: but only if heteroskedasticity-robust standard errors are used
D. No: asymptotic normality only applies when errors have zero skewness
Answer: B
This is the core insight: asymptotic normality does NOT require normally distributed errors. The OLS estimator can be written as a weighted sum of the y_i values, and by the CLT, this sum, suitably centered and scaled, converges in distribution to a normal as n grows, regardless of the error distribution, provided the errors have finite variance. With n = 800, the large-sample approximation is generally reliable. Option A reflects the common misconception that normal errors are a prerequisite for standard inference. Option C mixes up two separate issues: robust standard errors address heteroskedasticity, not non-normality.
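The claim can be checked with a small Monte Carlo sketch. The setup below is an illustrative assumption, not part of the question: a simple regression with a standard normal regressor, centered exponential errors (right-skewed, mean 0, variance 1), and a two-sided test of the true slope at the nominal 5% level using the normal critical value 1.96. The function name `rejection_rate` and all parameter values are hypothetical.

```python
import numpy as np

def rejection_rate(n=800, reps=2000, beta=2.0, seed=0):
    """Share of simulated samples in which a true null on the slope is rejected."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(reps, n))
    # Assumed error law: exponential shifted to mean 0 (strongly right-skewed).
    eps = rng.exponential(scale=1.0, size=(reps, n)) - 1.0
    y = 1.0 + beta * x + eps

    # Closed-form simple-regression OLS, computed row by row (one row per sample).
    xc = x - x.mean(axis=1, keepdims=True)
    sxx = (xc ** 2).sum(axis=1)
    slope = (xc * y).sum(axis=1) / sxx
    intercept = y.mean(axis=1) - slope * x.mean(axis=1)

    # Classical standard error of the slope from the residual variance.
    resid = y - intercept[:, None] - slope[:, None] * x
    s2 = (resid ** 2).sum(axis=1) / (n - 2)
    t_stat = (slope - beta) / np.sqrt(s2 / sxx)

    return float(np.mean(np.abs(t_stat) > 1.96))

rate = rejection_rate()
print(f"rejection rate at nominal 5%: {rate:.3f}")
```

If the large-sample approximation is working, the printed rate should sit close to 0.05 even though the errors are far from normal.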
Question 2 Multiple Choice
The statement '√n(β̂ − β) converges in distribution to N(0, V)' means which of the following?
A. β̂ equals the true β plus normally distributed noise in every sample
B. β̂ is exactly normally distributed around β for any sample size n
C. The standardized deviation of β̂ from β follows approximately a normal distribution in large samples
D. β̂ is unbiased in large samples, meaning E[β̂] = β
Answer: C
Asymptotic normality is a large-sample approximation about the *standardized* estimator √n(β̂ − β), not an exact finite-sample result. The √n scaling is needed to produce a non-degenerate limit: without it, β̂ − β collapses to zero (consistency). The claim is that the shape of the sampling distribution approaches normal as n grows, enabling approximate inference via t- and F-statistics. Options A and B confuse asymptotic with exact results; option D describes unbiasedness, a statement about the mean of β̂ rather than the shape of its distribution.
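The role of the √n scaling can be seen in a short simulation sketch. The design is an assumption chosen for illustration: a standard normal regressor and centered exponential errors with variance 1, so that the limiting variance V of the slope equals 1 in this particular setup. The helper `slope_errors` is hypothetical.

```python
import numpy as np

def slope_errors(n, reps=2000, beta=2.0, seed=0):
    """Simulated values of (beta_hat - beta) for the slope of a simple regression."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(reps, n))
    eps = rng.exponential(size=(reps, n)) - 1.0  # assumed skewed errors, variance 1
    y = beta * x + eps
    xc = x - x.mean(axis=1, keepdims=True)
    slope = (xc * y).sum(axis=1) / (xc ** 2).sum(axis=1)
    return slope - beta

results = {}
for n in (100, 400, 1600):
    err = slope_errors(n)
    # Unscaled error shrinks with n; the sqrt(n)-scaled error keeps a stable spread.
    results[n] = (float(err.std()), float((np.sqrt(n) * err).std()))
    print(f"n={n:5d}  sd(error)={results[n][0]:.4f}  sd(sqrt(n)*error)={results[n][1]:.4f}")
```

The first column should fall roughly by half each time n quadruples, while the second should stay near 1 under these assumptions.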
Question 3 True / False
Asymptotic normality of OLS requires that the error terms ε_i are normally distributed.
T. True
F. False
Answer: False
This is the central misconception the topic corrects. Asymptotic normality holds because the estimation error β̂ − β is a weighted sum of the terms x_i·ε_i, and the CLT applies to suitably scaled sums of independent random variables with finite variance, regardless of the underlying distribution. The normal-errors assumption matters for *exact* finite-sample inference (e.g., exact t-distributions with small n), but in large samples, the CLT takes over and normality of errors is not needed.
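The argument can be written out as a short derivation. This is a sketch under assumptions not stated in the question: the pairs (x_i, ε_i) are i.i.d., E[x_i ε_i] = 0, and the matrices Q and Ω defined below (notation introduced here) are finite with Q invertible.

```latex
\begin{align*}
\hat\beta - \beta
  &= \Big(\tfrac{1}{n}\sum_{i=1}^{n} x_i x_i'\Big)^{-1}
     \Big(\tfrac{1}{n}\sum_{i=1}^{n} x_i \varepsilon_i\Big), \\
\sqrt{n}\,(\hat\beta - \beta)
  &= \Big(\tfrac{1}{n}\sum_{i=1}^{n} x_i x_i'\Big)^{-1}
     \Big(\tfrac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \varepsilon_i\Big)
  \;\xrightarrow{d}\; N\big(0,\; Q^{-1}\,\Omega\,Q^{-1}\big), \\
Q &= E[x_i x_i'], \qquad \Omega = E[\varepsilon_i^2\, x_i x_i'].
\end{align*}
```

The first factor converges to Q⁻¹ by the law of large numbers, and the second is where the CLT enters. If the errors are also homoskedastic with variance σ², then Ω = σ²Q and the limit variance simplifies to V = σ²Q⁻¹. Nowhere does the distribution of ε_i need to be normal.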
Question 4 True / False
In small samples, the assumption of normally distributed errors is more important for justifying OLS inference than it is in large samples.
T. True
F. False
Answer: True
In large samples, asymptotic normality via the CLT validates t- and F-statistics without requiring normal errors. In small samples, the CLT approximation may be poor, and the exact finite-sample justification for t-tests relies on the error terms actually being normal (so that β̂ is exactly normal conditional on the regressors, and t-statistics follow exact t-distributions). As n grows, the distinction vanishes because the CLT approximation improves.
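A minimal sketch of why the approximation can be poor in small samples, using the simplest possible regression as an assumed example: an intercept-only model y_i = μ + ε_i, where the OLS estimate is the sample mean, with centered exponential errors. The sample sizes and the helper names `sample_skewness` and `intercept_estimates` are illustrative choices.

```python
import numpy as np

def sample_skewness(v):
    """Third standardized moment of a 1-D array (0 for a symmetric distribution)."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

def intercept_estimates(n, reps=5000, mu=1.0, seed=0):
    """OLS estimates of mu in y_i = mu + eps_i across simulated samples."""
    rng = np.random.default_rng(seed)
    eps = rng.exponential(size=(reps, n)) - 1.0  # assumed right-skewed errors
    y = mu + eps
    return y.mean(axis=1)  # regression on a constant reduces to the sample mean

skew_small = sample_skewness(intercept_estimates(5))
skew_large = sample_skewness(intercept_estimates(800))
print(f"skewness of estimator, n=5:   {skew_small:.3f}")
print(f"skewness of estimator, n=800: {skew_large:.3f}")
```

In this setup the estimator's sampling distribution inherits a skewness of 2/√n from the errors, so it is visibly non-normal at n = 5 and close to symmetric at n = 800.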
Question 5 Short Answer
Why does the OLS estimator become approximately normally distributed in large samples even when the error term is not normally distributed?
Model answer: The OLS estimator can be written as β̂ = β + (X'X)⁻¹X'ε, where the key term X'ε is a sum of n terms x_i·ε_i. By the central limit theorem, the standardized sum of many independent random variables with finite variance converges in distribution to a normal — regardless of the individual distribution of each term. It is the averaging and summing process, not the shape of ε_i, that produces the asymptotic normality.
The insight is that normality emerges from the mathematical structure of the estimator (a weighted sum), not from the shape of the error distribution. This is why asymptotic inference is so powerful: it requires only weak conditions (finite variance, well-behaved regressors) rather than strong distributional assumptions.
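The decomposition in the model answer can be verified numerically. The sketch below uses assumed, illustrative values (n = 800, an intercept plus one normal regressor, centered exponential errors) and checks that the OLS estimate equals β + (X'X)⁻¹X'ε, with X'ε built explicitly as a sum of the n terms x_i·ε_i.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 800
beta = np.array([1.0, 2.0])  # assumed true coefficients (intercept, slope)

X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.exponential(size=n) - 1.0  # non-normal, mean-zero errors
y = X @ beta + eps

# OLS estimate from a least-squares solver.
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# X'eps as an explicit sum over observations: row i of `terms` is x_i * eps_i.
terms = X * eps[:, None]
score = terms.sum(axis=0)

# Estimation error written as (X'X)^{-1} X'eps, added back to the true beta.
decomposed = beta + np.linalg.solve(X.T @ X, score)

print("beta_hat:  ", beta_hat)
print("decomposed:", decomposed)
print("match:", np.allclose(beta_hat, decomposed))
```

The identity is purely algebraic, so it holds in every sample; the CLT then acts on the `score` sum, which is the only random ingredient that needs to be approximately normal.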