Questions — Information Theory and Statistical Inference

Question 1 Multiple Choice

The Kullback-Leibler divergence D_KL(p||q) = sum_x p(x) log(p(x)/q(x)) measures information lost when using q to approximate p. Why does maximum likelihood estimation (MLE) asymptotically minimize D_KL(p||q_theta)?

A. MLE minimizes the likelihood, which is the reciprocal of KL divergence

B. MLE maximizes the average log-likelihood (1/n) sum_i log q_theta(x_i). By the law of large numbers this converges to E_p[log q_theta(X)] = -H(p) - D_KL(p||q_theta), and since H(p) does not depend on theta, maximizing it is asymptotically equivalent to minimizing D_KL(p||q_theta)

C. MLE is defined to minimize KL divergence by design

D. KL divergence and likelihood are not related
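
As a concrete reference for the definition in Question 1, here is a minimal sketch that evaluates D_KL(p||q) for two small discrete distributions. The function name `kl_divergence` and the example distributions are illustrative assumptions, not part of the original question; the natural logarithm is used, so the result is in nats.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(p || q) in nats for discrete distributions given as lists.

    Hypothetical helper for illustration: p and q are assumed to be
    probability vectors over the same outcomes, in the same order.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue  # a term with p(x) = 0 contributes 0 by convention
        if qx == 0.0:
            return math.inf  # p puts mass where q has none
        total += px * math.log(px / qx)
    return total


# Illustrative distributions (assumed values, not from the source).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl_divergence(p, q))  # small positive number
print(kl_divergence(q, p))  # differs from the line above: KL is not symmetric
print(kl_divergence(p, p))  # zero when the two distributions coincide
```

The divergence is nonnegative, vanishes only when the two distributions agree, and changes when its arguments are swapped.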

Question 2 True / False

The Cramér-Rao bound states that the variance of any unbiased estimator of theta is lower-bounded by 1/F(theta), where F(theta) is the Fisher information. This bound is information-theoretic: it relates the expected curvature of the log-likelihood to estimation precision.

T. True

F. False

Question 3 Short Answer

Explain how the error exponent in binary hypothesis testing (Neyman-Pearson setting) is related to the Kullback-Leibler divergence between the two hypotheses.

Question 4 Multiple Choice

The Akaike Information Criterion, AIC = -2*log-likelihood + 2*k, penalizes model complexity through the term 2k, where k is the number of fitted parameters. In what sense is AIC an 'information criterion'?

A. AIC measures information content of the model parameters

B. AIC estimates, up to scaling and a constant shared by all candidate models, the expected KL divergence between the true distribution and the fitted model. The -2*log-likelihood term measures fit, and the 2k term corrects the optimistic bias of evaluating the likelihood on the same data used for fitting

C. AIC is based on Shannon entropy directly

D. AIC has no connection to information theory
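
To make the formula in Question 4 concrete, the following sketch computes AIC from a maximized log-likelihood and a parameter count. The function name `aic` and the two candidate models with their log-likelihood values are made-up illustrations, not results from any real fit.

```python
def aic(log_likelihood, k):
    """Return AIC = -2 * log-likelihood + 2 * k.

    log_likelihood: maximized log-likelihood of the fitted model
    k: number of fitted parameters
    """
    return -2.0 * log_likelihood + 2.0 * k


# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "simple": (-120.0, 3),
    "complex": (-118.5, 6),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # lower AIC is preferred

print(scores)
print(best)
```

With these assumed numbers the more complex model fits slightly better, but its three extra parameters cost more than the gain in log-likelihood, so the simpler model has the lower AIC.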
