Questions: Minimum Description Length

4 questions to test your understanding

Question 1 Multiple Choice

In MDL, the total description length is L(M) + L(D|M). Why is it insufficient to minimize only L(D|M) (the fit) when selecting a model?

A. Because L(D|M) is always zero for complex models
B. Because minimizing only L(D|M) leads to overfitting: every training set can be perfectly memorized by a sufficiently complex model (e.g., a lookup table), which explains the data with L(D|M) = 0 but does not generalize. MDL penalizes complexity through L(M), forcing a tradeoff.
C. Because L(D|M) cannot be computed
D. Because the data length is fixed and does not vary across models
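
The two-part total L(M) + L(D|M) in this question can be made concrete with a toy binary-labeling task. The sketch below is illustrative only and rests on assumptions that are not part of the quiz: the data set, the 7-bit cost for a threshold rule, the one-bit-per-label cost for a lookup table, and the idealized entropy code for the error pattern are all hypothetical choices.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source; zero at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def two_part_length(model_bits, labels, predictions):
    """Return (L(M) + L(D|M), L(D|M)) in bits.

    L(D|M) is approximated by the idealized cost of encoding which
    predictions are wrong: n * H(error rate). This is an assumption
    made for illustration, not a full MDL code.
    """
    n = len(labels)
    errors = sum(y != y_hat for y, y_hat in zip(labels, predictions))
    data_bits = n * binary_entropy(errors / n)
    return model_bits + data_bits, data_bits

# Hypothetical data: label is 1 when x >= 50, with five labels flipped as noise.
xs = list(range(100))
flipped = {3, 27, 48, 71, 90}
labels = [int((x >= 50) != (x in flipped)) for x in xs]

# Simple model: a threshold rule, assumed to cost 7 bits (a threshold in 0..127).
threshold_predictions = [int(x >= 50) for x in xs]
rule_total, rule_data_bits = two_part_length(7, labels, threshold_predictions)

# Complex model: a lookup table storing every label, assumed to cost 1 bit per label.
table_total, table_data_bits = two_part_length(len(labels), labels, list(labels))

print(f"threshold rule: L(D|M) = {rule_data_bits:.2f} bits, total = {rule_total:.2f} bits")
print(f"lookup table:   L(D|M) = {table_data_bits:.2f} bits, total = {table_total:.2f} bits")
```

Under these assumptions the lookup table drives L(D|M) to zero, yet its L(M) grows with the number of samples, so the comparison has to be made on the sum of the two terms rather than on the fit alone.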
Question 2 True / False

MDL is asymptotically equivalent to the Bayesian information criterion (BIC), which penalizes model complexity using n*log(k) where n is the sample size and k is the model's degrees of freedom.

T. True
F. False
Question 3 Short Answer

Explain how MDL connects to Kolmogorov complexity and Solomonoff induction. Why does MDL approximate the theoretical optimum for learning?

Question 4 Multiple Choice

A dataset D of 100 samples is fit with two models: (A) a 2-parameter linear regression with total description length L_A = 100 bits, and (B) a 10-parameter polynomial regression with total description length L_B = 150 bits. Which model does MDL prefer, and why?

A. Model B, because it has more parameters and higher capacity
B. Model A, because it has lower total description length (100 < 150)
C. Model B, because the sample size is 100 and polynomial has better expressiveness
D. The choice depends on the likelihood values, not the description length
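
The selection rule behind this question, choosing the candidate with the smallest total L(M) + L(D|M), can be sketched in a few lines. The helper name `mdl_select` and the dictionary layout are hypothetical; only the two totals come from the question.

```python
def mdl_select(total_bits_by_model):
    """Return the name of the model with the smallest total description length."""
    return min(total_bits_by_model, key=total_bits_by_model.get)

# Totals taken from the question: L_A = 100 bits, L_B = 150 bits.
candidates = {"A": 100, "B": 150}
preferred = mdl_select(candidates)
print(preferred)
```

The totals already include both the model cost and the data-given-model cost, so the rule needs no further information such as the parameter count or the sample size.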