A statistician computes the MLE from data drawn from a Uniform(0, θ) distribution and uses √n(θ̂_n − θ) → N(0, 1/I(θ)) to construct a confidence interval. Why is this reasoning invalid?
A. The MLE for Uniform(0, θ) is biased, and asymptotic normality applies only to unbiased estimators
B. The Uniform(0, θ) model violates the regularity conditions for asymptotic normality: the support depends on θ, the log-likelihood is not differentiable at the boundary, and the MLE converges at rate n (not √n) to a non-normal limit
C. The Cramér-Rao bound is undefined for uniform distributions, making the variance formula 1/I(θ) inapplicable
D. The Fisher information for Uniform(0, θ) is infinite, which makes the asymptotic variance undefined
Answer: B
The standard asymptotic normality theorem requires regularity conditions, including that the support of the distribution does not depend on the parameter and that the log-likelihood is twice differentiable in θ. For Uniform(0, θ), the support [0, θ] changes with θ, so the boundary of the sample space moves with the parameter. The MLE is the maximum order statistic X_(n), which never exceeds θ, so its error is one-sided. It converges at rate n (not √n): n(θ − X_(n)) converges in distribution to an exponential distribution with mean θ. This is a classic exception illustrating why the regularity conditions are not merely technical fine print: their failure changes both the rate and the limit distribution.
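A small simulation can make the rate and the limit concrete. The sketch below assumes θ = 2, n = 200, 3000 replications, and a fixed seed, all chosen here purely for illustration; the helper name `scaled_mle_errors` is hypothetical. It draws the scaled error n(θ − X_(n)) repeatedly and compares its mean and one tail probability with the values implied by an exponential limit with mean θ.

```python
import math
import random


def scaled_mle_errors(theta, n, reps, seed=0):
    """Return reps draws of n * (theta - max(X_1..X_n)), X_i ~ Uniform(0, theta)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        mle = max(rng.uniform(0.0, theta) for _ in range(n))  # MLE is the sample maximum
        errors.append(n * (theta - mle))
    return errors


# Illustrative values (assumptions, not taken from the question)
theta, n, reps = 2.0, 200, 3000
errors = scaled_mle_errors(theta, n, reps)

mean_error = sum(errors) / reps
# Empirical P(n(theta - MLE) > theta); an Exponential(mean theta) limit gives exp(-1)
tail_fraction = sum(e > theta for e in errors) / reps

print(f"mean of n(theta - MLE): {mean_error:.3f}  (exponential limit: {theta})")
print(f"P(error > theta):       {tail_fraction:.3f}  (exponential limit: {math.exp(-1):.3f})")
print(f"all errors non-negative: {min(errors) >= 0.0}")
```

The errors are scaled by n rather than √n. Scaling by √n instead would make them collapse toward zero, which is another way to see that the N(0, 1/I(θ)) approximation does not describe this estimator.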
Question 2 Multiple Choice
The statement that the MLE is 'asymptotically efficient' means:
A. No estimator can have lower variance than the MLE for any sample size, including small samples
B. The MLE achieves exactly the Cramér-Rao lower bound in all finite samples under the model assumptions
C. Among all consistent, asymptotically normal estimators, the MLE achieves the smallest possible asymptotic variance, equal to 1/(nI(θ)), in the limit as n → ∞
D. The MLE converges to the true parameter faster than any other estimator for all distributions
Answer: C
Asymptotic efficiency is a large-sample statement. The Cramér-Rao bound 1/(nI(θ)) is the minimum variance for unbiased estimators; the MLE saturates this bound in the limit. However, in finite samples, the MLE can be outperformed by other estimators, may be biased, and may not attain the bound. Options A and B are incorrect because they assert finite-sample properties that the asymptotic result does not guarantee. Option D is incorrect because efficiency concerns the limiting variance under regularity conditions, not the speed of convergence for every distribution. The qualifier 'asymptotically' carries the entire weight of the claim.
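The gap between finite-sample and limiting behavior can be seen in a regular model. The sketch below uses an exponential distribution with rate λ as an assumed example (it is not part of the question): the MLE is λ̂ = n / ΣXᵢ and I(λ) = 1/λ², so the limiting variance of √n(λ̂ − λ) is λ². The values λ = 2, the sample sizes, the replication count, and the seed are illustrative choices, and `simulate_rate_mle` is a hypothetical helper name.

```python
import random
import statistics


def simulate_rate_mle(lam, n, reps, rng):
    """Draw reps MLEs of an exponential rate, each computed from n observations."""
    return [n / sum(rng.expovariate(lam) for _ in range(n)) for _ in range(reps)]


lam = 2.0                  # illustrative true rate (assumption)
limit_variance = lam ** 2  # 1 / I(lam), since I(lam) = 1 / lam^2
rng = random.Random(0)

results = {}
for n in (10, 50, 400):
    estimates = simulate_rate_mle(lam, n, reps=3000, rng=rng)
    bias = statistics.fmean(estimates) - lam
    # n * Var(MLE) divided by the limiting variance; approaches 1 if the bound is attained
    variance_ratio = n * statistics.variance(estimates) / limit_variance
    results[n] = (bias, variance_ratio)
    print(f"n={n:4d}  bias={bias:+.3f}  n*Var(MLE) / (1/I) = {variance_ratio:.2f}")
```

At small n the estimator is visibly biased upward and its scaled variance sits above 1/I(λ); both discrepancies shrink as n grows, which is what the word 'asymptotically' is describing.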
Question 3 True / False
The asymptotic normality of the MLE follows, via a Taylor expansion of the score function, from applying the central limit theorem to individual score contributions ℓ'(θ; Xᵢ), which have mean zero and variance I(θ) under regularity conditions.
T. True
F. False
Answer: True
The proof structure is exactly this: the score function S_n(θ) = Σ ℓ'(θ; Xᵢ) is a sum of i.i.d. terms with E[ℓ'(θ;X)] = 0 and Var[ℓ'(θ;X)] = I(θ). By the CLT, (1/√n)S_n(θ) → N(0, I(θ)). A first-order Taylor expansion of the score about the true θ, evaluated at the MLE (where the score equals zero), combined with the WLLN applied to the second derivative (whose sample average converges to −I(θ)), yields √n(θ̂_n − θ) → N(0, 1/I(θ)). The CLT is applied to the score function, not to the log-likelihood directly, and this is the key structural step.
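The steps of this argument can be written out as a heuristic sketch. It uses the notation above, with θ denoting the true parameter and S_n'(θ) = Σ ℓ''(θ; Xᵢ); the remainder term of the expansion is ignored, and Slutsky's theorem is the assumed tool for combining the two limits.

```latex
\begin{align*}
0 = S_n(\hat\theta_n)
  &\approx S_n(\theta) + (\hat\theta_n - \theta)\, S_n'(\theta)
  && \text{(Taylor expansion about } \theta\text{)} \\
\sqrt{n}\,(\hat\theta_n - \theta)
  &\approx \frac{\tfrac{1}{\sqrt{n}}\, S_n(\theta)}{-\tfrac{1}{n}\, S_n'(\theta)}
  && \text{(rearranged)} \\
\tfrac{1}{\sqrt{n}}\, S_n(\theta)
  &\xrightarrow{d} N\bigl(0,\, I(\theta)\bigr)
  && \text{(CLT)} \\
-\tfrac{1}{n}\, S_n'(\theta)
  &\xrightarrow{p} I(\theta)
  && \text{(WLLN)} \\
\sqrt{n}\,(\hat\theta_n - \theta)
  &\xrightarrow{d} N\!\left(0,\, \frac{I(\theta)}{I(\theta)^2}\right)
   = N\!\left(0,\, \frac{1}{I(\theta)}\right)
  && \text{(Slutsky)}
\end{align*}
```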
Question 4 True / False
If the MLE is asymptotically normal with variance 1/(nI(θ)), then it achieves the Cramér-Rao lower bound in finite samples.
T. True
F. False
Answer: False
Asymptotic normality and efficiency are limit statements — they describe behavior as n → ∞, not at any fixed sample size. In finite samples, the MLE may be biased, may not attain the Cramér-Rao bound, and may be outperformed by other estimators. The practical significance of asymptotic efficiency is that for sufficiently large n, no consistent asymptotically normal estimator can systematically beat the MLE in variance. The word 'asymptotically' is not decorative — it is the entire content of the claim.
Question 5 Short Answer
What is the significance of Fisher information I(θ) appearing as the asymptotic variance in √n(θ̂_n − θ) → N(0, 1/I(θ)), and what does this tell us about the MLE relative to other estimators?
Model answer: Fisher information quantifies how much information each observation carries about θ; the Cramér-Rao bound states that no unbiased estimator can have variance below 1/(nI(θ)). Because √n(θ̂_n − θ) has limiting variance exactly 1/I(θ), the variance of θ̂_n is approximately 1/(nI(θ)) for large n, so the MLE saturates this bound in the limit: it extracts the maximum possible statistical information from the data. Among all consistent, asymptotically normal estimators, no other estimator can achieve a smaller asymptotic variance. This makes the MLE asymptotically optimal in a precise sense.
The Fisher information appears naturally from the proof: the variance of the score function is I(θ), and the WLLN tells us the average second derivative of the log-likelihood converges to −I(θ). Dividing the first quantity by the square of the second gives I(θ)/I(θ)² = 1/I(θ) as the asymptotic variance, a consequence of the mathematical structure of maximum likelihood and not a coincidence. The Cramér-Rao bound and the MLE's limit variance are the same quantity because the score function is both the ingredient of the CLT in the proof and the definition of Fisher information.
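The two roles of I(θ) can be checked exactly in a simple regular model. The sketch below assumes a Bernoulli(p) model with p = 0.3, chosen here only for illustration, where I(p) = 1/(p(1 − p)). Because X takes only the values 0 and 1, the expectations are computed by direct summation with no simulation. All function names are hypothetical.

```python
def bernoulli_score(p, x):
    """First derivative in p of log f(x; p) = x*log(p) + (1 - x)*log(1 - p)."""
    return x / p - (1 - x) / (1 - p)


def bernoulli_second_derivative(p, x):
    """Second derivative in p of the same log-likelihood."""
    return -x / p**2 - (1 - x) / (1 - p) ** 2


def expectation(g, p):
    """Exact E[g(p, X)] for X ~ Bernoulli(p), summing over x in {0, 1}."""
    return (1 - p) * g(p, 0) + p * g(p, 1)


p = 0.3  # illustrative parameter value (assumption)
score_mean = expectation(bernoulli_score, p)
score_variance = expectation(lambda q, x: bernoulli_score(q, x) ** 2, p) - score_mean**2
curvature = -expectation(bernoulli_second_derivative, p)  # -E[second derivative]
fisher_information = 1 / (p * (1 - p))

# Form used in the proof: Var(score) divided by the squared expected curvature
asymptotic_variance = score_variance / curvature**2

print(f"E[score]             = {score_mean:.6f}")
print(f"Var[score]           = {score_variance:.6f}")
print(f"-E[second deriv]     = {curvature:.6f}")
print(f"I(p) = 1/(p(1-p))    = {fisher_information:.6f}")
print(f"Var/curvature^2      = {asymptotic_variance:.6f}  vs 1/I(p) = {1 / fisher_information:.6f}")
```

The score variance and the negative expected second derivative coincide with I(p), so their combination reduces to 1/I(p), which for this model is p(1 − p).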