Suppose Xₙ converges in distribution to X, where X ~ N(0,1). Which of the following can we conclude?
A. For large n, P(|Xₙ − X| > 0.01) is small: the values of Xₙ are close to the values of X
B. Xₙ and X must be defined on the same probability space for the limit to make sense
C. The CDF of Xₙ approaches the standard normal CDF at all continuity points, but Xₙ and X may not be close as numbers
D. Xₙ converges in probability to X, since distributional convergence implies probabilistic closeness
Answer: C
Convergence in distribution requires only that the CDFs satisfy Fₙ(x) → F(x) at every continuity point x of F. It says nothing about how close the values of Xₙ and X are. Xₙ and X need not even be defined on the same probability space, because only their marginal distributions matter, so Option B is false. Option A describes convergence in probability, which is strictly stronger and does not follow from convergence in distribution. Option D is backwards: convergence in probability implies convergence in distribution, not the reverse. The classic illustration: if Xₙ ~ N(0,1) for all n and X ~ N(0,1) is independent of every Xₙ, then Xₙ →d X trivially, but Xₙ − X ~ N(0,2) for every n, so P(|Xₙ − X| > 0.01) stays close to 1 instead of shrinking.
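The independent-normals illustration can be checked numerically. The following is a minimal simulation sketch, assuming NumPy is available; the seed, sample size, and grid points are arbitrary choices made for this example and are not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) so the run is reproducible
n_samples = 200_000             # Monte Carlo sample size (arbitrary)

# X_n and X: independent draws from the same N(0,1) distribution.
x_n = rng.standard_normal(n_samples)
x = rng.standard_normal(n_samples)

# Same marginal distribution: the empirical CDFs nearly agree on a small grid.
grid = np.array([-1.0, 0.0, 1.0])
cdf_x_n = np.array([(x_n <= g).mean() for g in grid])
cdf_x = np.array([(x <= g).mean() for g in grid])
max_cdf_gap = np.abs(cdf_x_n - cdf_x).max()

# The values themselves are not close: X_n - X ~ N(0, 2),
# so |X_n - X| exceeds 0.01 in the vast majority of draws.
frac_far = (np.abs(x_n - x) > 0.01).mean()

print(f"max empirical CDF gap on grid: {max_cdf_gap:.4f}")
print(f"estimated P(|X_n - X| > 0.01): {frac_far:.4f}")
```

The CDF gap should be at the level of Monte Carlo noise, while the estimated probability should sit close to 1, which is the opposite of what Option A asserts.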
Question 2 Multiple Choice
A statistics student claims: 'By the Central Limit Theorem, for large n, the sample mean X̄ₙ is approximately normally distributed.' What is the precise flaw in this statement?
A. There is no flaw: the CLT says the sample mean is approximately normal for large n
B. The sample mean converges in probability to μ, not to a normal distribution
C. The CLT applies to the standardized sum √n(X̄ₙ − μ)/σ, not to X̄ₙ itself, which converges to the constant μ
D. The CLT only applies when the population distribution is already normal
Answer: C
The CLT says that, for i.i.d. observations with mean μ and finite variance σ² > 0, √n(X̄ₙ − μ)/σ converges in distribution to N(0,1). This is the properly centered and scaled version of the sample mean. X̄ₙ itself converges in probability to the constant μ (by the law of large numbers), so its distribution collapses to a point mass, not a normal curve. The student's phrasing is common shorthand but technically imprecise: the quantity with a normal limit is the standardized sample mean, not X̄ₙ itself. Read as a finite-n approximation, the shorthand means that X̄ₙ is approximately N(μ, σ²/n), a distribution whose variance shrinks to 0 as n grows. Convergence in distribution is exactly the right tool here: it describes how the shape of the fluctuations, once scaled to be of order 1, approaches the standard normal.
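The contrast between X̄ₙ and its standardized version can be seen in a short simulation. This is a sketch under stated assumptions: the population is taken to be Exponential(1) (so μ = 1 and σ = 1), and the seed, sample sizes, and number of replications are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (arbitrary)
mu, sigma = 1.0, 1.0            # Exponential(1): mean 1, standard deviation 1
reps = 4_000                    # number of simulated sample means per n (arbitrary)

results = {}
for n in (10, 100, 1000):
    samples = rng.exponential(scale=1.0, size=(reps, n))
    xbar = samples.mean(axis=1)                # sample means
    z = np.sqrt(n) * (xbar - mu) / sigma       # standardized sample means
    results[n] = (xbar.std(), z.std())
    print(f"n={n:5d}  sd(xbar)={xbar.std():.4f}  sd(z)={z.std():.4f}")
```

The spread of X̄ₙ should shrink roughly like 1/√n, reflecting the collapse toward μ, while the spread of the standardized quantity should stay near 1 for every n.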
Question 3 True / False
If Xₙ and X need not be defined on the same probability space, then convergence in distribution is a weaker notion than convergence in probability.
T. True
F. False
Answer: True
True. Convergence in probability requires P(|Xₙ − X| > ε) → 0 for every ε > 0, which presupposes that Xₙ and X are defined on a common probability space so that Xₙ − X is a well-defined random variable. Convergence in distribution requires only that the CDFs converge pointwise at continuity points of the limit, a purely marginal condition that makes no reference to joint behavior. Because convergence in probability implies convergence in distribution but not conversely, distributional convergence is strictly weaker: a sequence can converge in distribution to X without the actual values of Xₙ being close to X at all.
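The implication used above (convergence in probability implies convergence in distribution) can be sketched with the standard sandwich argument. The notation ε, F, and Fₙ follows the explanation; the derivation is a textbook outline and not something stated in the original question. Fix a continuity point x of F and any ε > 0.

```latex
\begin{align*}
F_n(x) = P(X_n \le x)
  &\le P(X \le x + \varepsilon) + P(|X_n - X| > \varepsilon)
   = F(x + \varepsilon) + P(|X_n - X| > \varepsilon), \\
F(x - \varepsilon) = P(X \le x - \varepsilon)
  &\le P(X_n \le x) + P(|X_n - X| > \varepsilon)
   = F_n(x) + P(|X_n - X| > \varepsilon).
\end{align*}
% Let n -> infinity; the last term vanishes by convergence in probability.
\[
F(x - \varepsilon) \;\le\; \liminf_{n \to \infty} F_n(x)
  \;\le\; \limsup_{n \to \infty} F_n(x) \;\le\; F(x + \varepsilon).
\]
```

Letting ε → 0 and using continuity of F at x squeezes both bounds to F(x), so Fₙ(x) → F(x). The argument needs Xₙ − X to exist, which is exactly where the common probability space enters.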
Question 4 True / False
If Xₙ converges in distribution to X, then Xₙ must also converge in distribution to any random variable Y that has the same distribution as X.
T. True
F. False
Answer: True
True, and this highlights the distributional nature of this convergence mode. Convergence in distribution is entirely about the limiting CDF, not about which specific random variable X is. If Y has the same distribution as X (i.e., F_Y = F_X), then Fₙ(x) → F_X(x) = F_Y(x) at all continuity points, so Xₙ →d Y as well. This is consistent with the fact that Xₙ and X need not be on the same probability space — 'X' in 'Xₙ →d X' is just a convenient name for the limiting distribution, not a specific random variable with a specific sample path.
Question 5 Short Answer
The definition of convergence in distribution requires CDF convergence only at continuity points of the limiting distribution F, not at all points. Explain why the continuity-point caveat is necessary.
Model answer: A CDF has a jump discontinuity at every point where the limiting distribution places positive probability mass. At a jump point x₀, F(x₀⁻) < F(x₀), and a sequence of CDFs Fₙ(x₀) could converge to either boundary value (or fail to converge at all), depending on the sequence. For example, if Xₙ = 1/n with probability 1 and X = 0, then Fₙ(0) = 0 for every n while F(0) = 1, even though Xₙ clearly approaches X. If we required Fₙ(x₀) → F(x₀) at jump points, we would exclude sequences whose distributions are genuinely approaching the limiting distribution but whose CDFs happen to converge to the left-hand limit at the jump. At continuity points, no such ambiguity exists: F is continuous there, so F(x⁻) = F(x), and the convergence condition is unambiguous.
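A concrete instance of the jump-point problem, sketched in code. It assumes the illustrative sequence Xₙ = 1/n with probability 1 and the limit X = 0; the helper name cdf_point_mass is hypothetical and introduced only for this example.

```python
def cdf_point_mass(c):
    """Return the CDF of a random variable that equals c with probability 1."""
    return lambda x: 1.0 if x >= c else 0.0

ns = (1, 10, 1000, 10**6)
F = cdf_point_mass(0.0)  # limiting CDF: point mass at 0, with a jump at x = 0

at_jump = []       # F_n evaluated at the jump point x = 0
at_left = []       # F_n evaluated at the continuity point x = -0.5
at_right = []      # F_n evaluated at the continuity point x = 0.5
for n in ns:
    F_n = cdf_point_mass(1.0 / n)  # CDF of X_n = 1/n
    at_jump.append(F_n(0.0))
    at_left.append(F_n(-0.5))
    at_right.append(F_n(0.5))

print("F_n(0)    :", at_jump, " vs F(0)    =", F(0.0))
print("F_n(-0.5) :", at_left, " vs F(-0.5) =", F(-0.5))
print("F_n(0.5)  :", at_right, " vs F(0.5)  =", F(0.5))
```

At the jump point the values Fₙ(0) never move toward F(0), while at the two continuity points the values match F once n is large enough, which is all the definition asks for.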
The characteristic function criterion (φₙ(t) → φ(t) for all t) avoids this issue entirely because characteristic functions are always continuous — there are no jump discontinuities to worry about. This is one practical reason the characteristic function approach is preferred in theoretical work.
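The characteristic function criterion can be illustrated with a sequence whose CDFs misbehave at a jump. This sketch assumes NumPy and the illustrative choice Xₙ = 1/n with probability 1 and limit X = 0, so that φₙ(t) = exp(it/n) and φ(t) = 1; the grid of t values is an arbitrary bounded range chosen for the example.

```python
import numpy as np

t = np.linspace(-5.0, 5.0, 201)              # bounded grid of t values (arbitrary)
phi_limit = np.ones_like(t, dtype=complex)   # characteristic function of the point mass at 0

gaps = {}
for n in (1, 10, 1000, 10**6):
    phi_n = np.exp(1j * t / n)               # characteristic function of the point mass at 1/n
    gaps[n] = np.abs(phi_n - phi_limit).max()
    print(f"n={n:>7d}  max |phi_n(t) - phi(t)| on grid = {gaps[n]:.2e}")
```

The gap shrinks toward 0 at every t on the grid, with no exceptional points to exclude, even though the CDF of this same sequence fails to converge at x = 0.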