Questions: Information Theory and Statistical Inference

4 questions to test your understanding

Question 1 Multiple Choice

The Kullback-Leibler divergence D_KL(p||q) = sum_x p(x) log(p(x)/q(x)) measures the information lost when q is used to approximate p. Let p be the true data distribution and q_theta a parametric model. Why does maximum likelihood estimation (MLE) asymptotically minimize D_KL(p||q_theta)?

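A. MLE minimizes the likelihood, which is the reciprocal of KL divergence
B. MLE maximizes the average log-likelihood (1/n) sum_i log q_theta(x_i), which is the same as minimizing D_KL(empirical distribution || q_theta). By the law of large numbers, this average converges to E_p[log q_theta(X)] = -H(p) - D_KL(p || q_theta), and H(p) does not depend on theta, so maximizing the likelihood asymptotically minimizes D_KL(p || q_theta)
C. MLE is defined to minimize KL divergence by design
D. KL divergence and likelihood are not related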
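
The link between average log-likelihood and KL divergence can be checked numerically. The following is a minimal sketch, assuming a three-category variable with made-up counts; the helper names `kl_divergence` and `avg_log_likelihood` are illustrative and not part of the quiz. It verifies the identity (average log-likelihood) = -H(empirical) - D_KL(empirical || q) for one candidate q, and that the empirical distribution (the unrestricted categorical MLE) attains the highest average log-likelihood.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def avg_log_likelihood(counts, q):
    """Average log-likelihood (1/n) * sum_i log q(x_i), computed from category counts."""
    n = sum(counts)
    return sum(c * math.log(qi) for c, qi in zip(counts, q) if c > 0) / n

# Hypothetical data: 100 draws from a three-category variable.
counts = [50, 30, 20]
n = sum(counts)
empirical = [c / n for c in counts]
entropy_emp = -sum(p * math.log(p) for p in empirical if p > 0)

# For any candidate q: avg log-likelihood = -H(empirical) - D_KL(empirical || q).
candidate = [0.4, 0.4, 0.2]
lhs = avg_log_likelihood(counts, candidate)
rhs = -entropy_emp - kl_divergence(empirical, candidate)
identity_gap = abs(lhs - rhs)

# The unrestricted categorical MLE is the empirical distribution itself,
# so it has D_KL = 0 and the highest average log-likelihood.
mle_beats_candidate = avg_log_likelihood(counts, empirical) >= lhs

print(identity_gap, mle_beats_candidate)
```

Because the entropy term does not depend on q, ranking candidates by average log-likelihood and ranking them by KL divergence from the empirical distribution give the same order.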
Question 2 True / False

The Cramér-Rao bound states that, under standard regularity conditions, the variance of any unbiased estimator of theta is lower-bounded by 1/F(theta), where F(theta) is the Fisher information. This bound is information-theoretic: it relates the curvature of the log-likelihood landscape to estimation precision.

T. True
F. False
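
As an illustration of the bound, here is a small simulation sketch. It assumes a Bernoulli(p) model, for which the per-observation Fisher information is 1/(p(1-p)), and n i.i.d. observations, so the bound becomes 1/(n F(p)) = p(1-p)/n. The values of p, n, the trial count, and the function names are arbitrary choices for the example. The sample mean is unbiased and its variance should sit at the bound.

```python
import random
import statistics

def fisher_information_bernoulli(p):
    """Fisher information of a single Bernoulli(p) observation."""
    return 1.0 / (p * (1.0 - p))

def simulate_sample_mean_variance(p, n, trials, seed=0):
    """Empirical variance of the sample mean over repeated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(successes / n)
    return statistics.pvariance(estimates)

p, n = 0.3, 50  # illustrative values
crb = 1.0 / (n * fisher_information_bernoulli(p))  # equals p * (1 - p) / n
empirical_var = simulate_sample_mean_variance(p, n, trials=20000)

print(f"Cramer-Rao bound: {crb:.5f}")
print(f"Simulated variance of sample mean: {empirical_var:.5f}")
```

The two printed numbers should agree up to simulation noise, since the sample mean is an efficient estimator in this model.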
Question 3 Short Answer

Explain how the error exponent in binary hypothesis testing (Neyman-Pearson setting) is related to the Kullback-Leibler divergence between the two hypotheses.

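
For reference, the standard result connecting the two quantities is the Chernoff-Stein lemma, sketched here. The notation is an assumption and does not come from the question: P_0 is the null hypothesis, P_1 the alternative, alpha_n and beta_n are the type I and type II error probabilities of a test based on n i.i.d. samples, and D_KL(P_0 || P_1) is assumed finite.

```latex
% Best type II error among tests whose type I error is at most \varepsilon
\beta_n^{\varepsilon} = \min_{\text{tests with } \alpha_n \le \varepsilon} \beta_n ,
\qquad 0 < \varepsilon < 1 .

% Chernoff-Stein lemma: the optimal type II error decays exponentially,
% with exponent equal to the KL divergence from the null to the alternative.
\lim_{n \to \infty} -\frac{1}{n} \log \beta_n^{\varepsilon}
  = D_{\mathrm{KL}}(P_0 \,\|\, P_1)
```

In words, beta_n behaves like exp(-n D_KL(P_0 || P_1)) to first order in the exponent, and the exponent does not depend on the allowed type I level epsilon.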
Question 4 Multiple Choice

The Akaike Information Criterion, AIC = -2 log L + 2k, where L is the maximized likelihood and k is the number of fitted parameters, penalizes model complexity through the 2k term. In what sense is AIC an 'information criterion'?

A. AIC measures the information content of the model parameters
B. AIC estimates, up to scaling and an additive constant shared by all candidate models, the expected KL divergence between the true distribution and the fitted model. The -2 log L term measures fit, and the 2k term corrects the optimistic bias of evaluating the likelihood on the same data used for fitting
C. AIC is based on Shannon entropy directly
D. AIC has no connection to information theory
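
To make the fit-versus-complexity trade-off concrete, here is a minimal sketch comparing two coin models by AIC. The data (56 heads in 100 flips) are made up, and the function names are illustrative. The fair-coin model has no free parameters (k = 0); the fitted model estimates p by maximum likelihood (k = 1).

```python
import math

def bernoulli_log_likelihood(heads, n, p):
    """Log-likelihood of `heads` successes in `n` Bernoulli(p) trials (ordering fixed)."""
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

def aic(log_likelihood, k):
    """AIC = -2 * log-likelihood + 2 * k, as defined in the question."""
    return -2.0 * log_likelihood + 2 * k

heads, n = 56, 100  # hypothetical data

ll_fair = bernoulli_log_likelihood(heads, n, 0.5)          # no fitted parameters
ll_fitted = bernoulli_log_likelihood(heads, n, heads / n)  # p set to its MLE

aic_fair = aic(ll_fair, k=0)
aic_fitted = aic(ll_fitted, k=1)

print(f"fair coin:   logL = {ll_fair:.3f}, AIC = {aic_fair:.3f}")
print(f"fitted coin: logL = {ll_fitted:.3f}, AIC = {aic_fitted:.3f}")
```

The fitted model always has the higher log-likelihood, but with these counts the improvement is smaller than the penalty of 2 for its extra parameter, so the fair-coin model has the lower AIC.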