A researcher uses two estimators for the same parameter θ. For large samples, the MLE achieves asymptotic variance I(θ)^{-1}/n. Estimator B achieves variance 2·I(θ)^{-1}/n. What does 'asymptotic efficiency' of the MLE mean in this context?
A. The MLE is faster to compute than estimator B
B. The MLE converges to θ at rate √n while estimator B converges at rate n
C. No unbiased estimator can achieve smaller asymptotic variance than the MLE, so estimator B is suboptimal
D. The MLE's asymptotic variance does not depend on the sample size
Answer: C
Asymptotic efficiency means that, under regularity conditions, the MLE's asymptotic variance attains the Cramér-Rao lower bound I(θ)^{-1}/n, the minimum variance any unbiased estimator can achieve for large n. Estimator B's variance of 2·I(θ)^{-1}/n is twice that minimum, so B is asymptotically inefficient: its relative efficiency is 1/2, meaning it needs roughly twice as many observations to match the MLE's precision. This is a strong optimality result: in the large-sample limit, no unbiased estimator can improve on the MLE's variance.
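The factor of two can be checked with a small simulation. This is only a sketch under assumptions not in the question: the data are taken to be N(θ, 1), so the MLE of θ is the sample mean and I(θ) = 1, and "estimator B" is a hypothetical stand-in that averages only the first half of each sample, which gives it variance 2/n.

```python
import random
import statistics


def simulate_variances(n=100, reps=4000, theta=0.0, seed=0):
    """Return (variance of the MLE, variance of estimator B) across simulated samples."""
    rng = random.Random(seed)
    mle_estimates, b_estimates = [], []
    for _ in range(reps):
        # Assumed model: N(theta, 1), so I(theta) = 1 and the MLE is the sample mean.
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        mle_estimates.append(statistics.fmean(sample))
        # Hypothetical estimator B: the mean of the first half only (variance 2/n).
        b_estimates.append(statistics.fmean(sample[: n // 2]))
    return statistics.variance(mle_estimates), statistics.variance(b_estimates)


var_mle, var_b = simulate_variances()
ratio = var_b / var_mle
print(f"var(MLE) = {var_mle:.5f}, var(B) = {var_b:.5f}, ratio = {ratio:.2f}")
```

With these settings the MLE's variance should land near 1/n = 0.01 and the ratio near 2, up to simulation noise.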
Question 2 Multiple Choice
Under regularity conditions, √n(θ̂_n − θ) → N(0, I(θ)^{-1}). A statistician has n = 400, θ̂_n = 3.0, and I(θ̂_n) = 16. Which expression gives the approximate 95% confidence interval?
A. 3.0 ± 1.96 × 16
B. 3.0 ± 1.96 / √(400 × 16)
C. 3.0 ± 1.96 × 1/√16
D. 3.0 ± 1.96 / √400
Answer: B
The asymptotic standard error of θ̂_n is 1/√(n·I(θ̂_n)) = 1/√(400 × 16) = 1/80 = 0.0125. The 95% CI is θ̂_n ± 1.96/√(n·I(θ̂_n)) = 3.0 ± 0.0245, that is, (2.9755, 3.0245). Option D ignores the Fisher information; option C ignores n; option A uses I directly as the standard deviation, which is wrong. Higher Fisher information makes the interval narrower: with I = 16 the half-width is √16 = 4 times smaller than option D's 1.96/√400 = 0.098, reflecting that each observation is highly informative about θ.
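The arithmetic in this explanation can be reproduced in a few lines. The function name `wald_interval` and its parameters are illustrative choices, not part of the question; the sketch assumes the plug-in standard error 1/√(n·I(θ̂_n)) and the usual 1.96 critical value.

```python
import math


def wald_interval(theta_hat, n, fisher_info, z=1.96):
    """Approximate CI: theta_hat +/- z / sqrt(n * I(theta_hat))."""
    se = 1.0 / math.sqrt(n * fisher_info)  # asymptotic standard error
    return theta_hat - z * se, theta_hat + z * se


lower, upper = wald_interval(theta_hat=3.0, n=400, fisher_info=16.0)
print(f"{lower:.4f}, {upper:.4f}")  # prints "2.9755, 3.0245"
```

Setting `fisher_info=1.0` reproduces option D's wider interval, which shows how much the information term contributes.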
Question 3 True / False
The asymptotic normality theorem implies that θ̂_n is exactly normally distributed for any sample size n, provided the model is correctly specified.
T. True
F. False
Answer: False
Asymptotic normality is a large-sample (n → ∞) approximation, not an exact finite-sample result. For any fixed n, θ̂_n may have a distribution very different from normal — skewed, heavy-tailed, or bounded — depending on the model. The theorem says the distribution converges to normal as n → ∞. In practice, how large n must be for the approximation to be adequate depends on the model and is an empirical question.
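A simulation makes the finite-sample caveat concrete. The sketch below assumes an exponential model with rate 2 (not part of the question), where the MLE of the rate is 1 divided by the sample mean, and compares the skewness of the MLE's sampling distribution at a small and a larger n. A normal distribution has skewness 0.

```python
import random
import statistics


def sample_skewness(values):
    """Third standardized moment of a list of numbers."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def mle_skewness(n, reps=4000, rate=2.0, seed=1):
    """Skewness of the exponential-rate MLE across `reps` simulated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        total = sum(rng.expovariate(rate) for _ in range(n))
        estimates.append(n / total)  # MLE of the rate: 1 / sample mean
    return sample_skewness(estimates)


skew_small = mle_skewness(n=10)
skew_large = mle_skewness(n=200)
print(f"skewness at n=10: {skew_small:.2f}, at n=200: {skew_large:.2f}")
```

Under this assumed model the small-sample distribution is clearly right-skewed, and the skewness shrinks toward 0 as n grows, which is what convergence to normality predicts.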
Question 4 True / False
Higher Fisher information at the true parameter θ implies the MLE will be more precisely estimated asymptotically, since the asymptotic variance is I(θ)^{-1}/n.
T. True
F. False
Answer: True
Fisher information measures how sensitive the log-likelihood is to changes in θ, or equivalently how much information the data carry about θ. Higher I(θ) means more curvature of the log-likelihood around the true value, so the MLE is more tightly concentrated around θ. The asymptotic variance I(θ)^{-1}/n is inversely proportional to I(θ), so higher information → smaller variance → more precise estimates. This is why optimal experimental design often aims to maximize Fisher information.
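As a concrete case, consider a Bernoulli(p) model, which is an assumption chosen for illustration. Its Fisher information per observation is I(p) = 1/(p(1 − p)), so the information depends on where the true parameter sits. The helper names below are hypothetical.

```python
import math


def bernoulli_information(p):
    """Fisher information of one Bernoulli(p) observation: 1 / (p * (1 - p))."""
    return 1.0 / (p * (1.0 - p))


def asymptotic_se(info, n):
    """Asymptotic standard error of the MLE: sqrt(I^{-1} / n)."""
    return 1.0 / math.sqrt(n * info)


n = 100
for p in (0.5, 0.1):
    info = bernoulli_information(p)
    print(f"p = {p}: I(p) = {info:.2f}, asymptotic SE = {asymptotic_se(info, n):.3f}")
```

At p = 0.5 the information is 4 and the standard error is 0.05; at p = 0.1 the information rises to about 11.11 and the standard error falls to 0.03, matching the inverse relationship described above.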
Question 5 Short Answer
Why does the Fisher information appear in the asymptotic variance of the MLE? What would it mean for inference if a parameter had very low Fisher information?
Model answer: Fisher information I(θ) = E[(∂log f/∂θ)²] is the variance of the score function (the score has mean zero at the true θ), and it measures how sharply the log-likelihood peaks at the true value. In the asymptotic normality proof, √n(θ̂_n − θ) is written as a ratio: the numerator (scaled score) converges to N(0, I(θ)) via the CLT, and the denominator (scaled observed information) converges to I(θ) by the law of large numbers. The resulting ratio has variance I(θ)/I(θ)² = I(θ)^{-1}. Low Fisher information means the log-likelihood is nearly flat near θ, so many parameter values fit the data about equally well; the MLE is then imprecise, confidence intervals are wide, and hypothesis tests have low power.
The connection between information and precision is the deep insight here. Fisher information quantifies what the data can tell us about θ. A nearly flat likelihood surface (low I(θ)) means the data doesn't strongly favor any particular parameter value; a sharply peaked surface (high I(θ)) pins down θ precisely. This is why the asymptotic variance is I(θ)^{-1}/n — information and variance are reciprocals of each other.
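The two roles Fisher information plays in the model answer, as the variance of the score and as the expected curvature of the log-likelihood, can be checked numerically. The sketch below assumes a Poisson(λ) model, chosen only for illustration, where both quantities equal 1/λ; the expectations are computed by summing over a truncated support.

```python
import math


def poisson_pmf_table(lam, k_max=200):
    """Poisson probabilities for k = 0..k_max, built iteratively."""
    pmf = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def fisher_information_two_ways(lam):
    """Return (E[score^2], E[-second derivative]) for one Poisson(lam) observation."""
    pmf = poisson_pmf_table(lam)
    # Score: d/d lam of (x * log(lam) - lam) = x / lam - 1, which has mean zero.
    score_variance = sum(p * (x / lam - 1.0) ** 2 for x, p in enumerate(pmf))
    # Negative second derivative of the log-likelihood: x / lam^2.
    expected_curvature = sum(p * x / lam**2 for x, p in enumerate(pmf))
    return score_variance, expected_curvature


score_var, curvature = fisher_information_two_ways(3.0)
print(f"score variance = {score_var:.6f}, expected curvature = {curvature:.6f}")
```

For λ = 3 both values come out at 1/3, so the ratio in the proof has variance I(λ)/I(λ)² = 3, which agrees with the variance of a single Poisson(3) observation.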