Questions: Information Criteria: AIC and BIC for Model Selection
5 questions to test your understanding
Score: 0 / 5
Question 1 Multiple Choice
You compare three models on the same dataset. Model A: AIC=450, BIC=480. Model B: AIC=440, BIC=510. Model C: AIC=460, BIC=465. What does the disagreement between AIC and BIC for Model B suggest?
AModel B has a computational error — AIC and BIC should always agree
BModel B fits the data best in absolute terms since it has the lowest AIC
CModel B likely has more parameters; BIC penalizes them more harshly, so AIC favors it for prediction while BIC prefers a simpler alternative
DModel B should be rejected outright because the two criteria disagree
Disagreement between AIC and BIC is common when a model adds parameters that improve fit substantially. AIC penalizes each parameter by 2; BIC penalizes by ln(n), which exceeds 2 for any n > 8. Model B's lowest AIC means it best balances fit and complexity for prediction purposes, but BIC's harsher penalty discourages its extra parameters. The right choice depends on whether you prioritize predictive accuracy (AIC) or identifying the true model structure (BIC).
Question 2 Multiple Choice
A researcher fits a model predicting Y (AIC = 300) and a model predicting log(Y) (AIC = 250), and concludes the log-linear model is better. What is wrong with this reasoning?
ANothing — lower AIC always indicates a better model regardless of the response variable
BAIC cannot be used for log-linear models, only for ordinary linear regression
CAIC values are only comparable when models use the same response variable on the same dataset; the two likelihoods live on different scales
DThe researcher should use BIC instead, which corrects for response variable transformations
This is a critical caveat of information criteria: AIC and BIC can only be compared across models with the same response variable on the same dataset. When you predict log(Y) instead of Y, the likelihood function changes fundamentally — it is computed in the space of log(Y). The AIC=250 and AIC=300 are not on the same scale and cannot be meaningfully compared. To compare log and non-log specifications, a different approach is needed (e.g., cross-validation on the original scale).
Question 3 True / False
A model that achieves the lowest AIC in a comparison set can be considered a well-fitting model in an absolute sense.
TTrue
FFalse
Answer: False
False. AIC and BIC are relative comparison tools — a lower AIC means a model is better than its competitors in the candidate set, but says nothing about absolute fit. All models in the comparison could be terrible, and the 'winner' by AIC is merely the least bad. This is why information criteria must always be paired with residual diagnostics and substantive scrutiny: winning a comparison does not certify a model's adequacy.
Question 4 True / False
For any sample size larger than approximately 8 observations, BIC imposes a stricter penalty per additional parameter than AIC does.
TTrue
FFalse
Answer: True
True. AIC penalizes each parameter by 2. BIC penalizes each parameter by ln(n). Since ln(8) ≈ 2.08, for n > 8 we have ln(n) > 2, so BIC's per-parameter penalty exceeds AIC's. For large samples (e.g., n = 1000, ln(n) ≈ 6.9), BIC is substantially harsher. This is why BIC consistently selects simpler models than AIC when sample sizes are moderate to large.
Question 5 Short Answer
What is the fundamental difference in theoretical motivation between AIC and BIC, and when would you prefer each?
Think about your answer, then reveal below.
Model answer: AIC is motivated by minimizing predictive error: it selects the model that best predicts new data from the same generating process. BIC is motivated by identifying the true model: it is consistent, meaning it selects the true model with probability 1 as n → ∞ if the true model is among the candidates. Prefer AIC when building a predictive tool; prefer BIC when testing theoretical structure and you believe the true model is in your candidate set.
These different motivations matter in practice. A researcher testing competing economic theories wants BIC: given enough data, BIC will identify the correct model structure. A forecasting practitioner wants AIC: it optimizes for out-of-sample prediction accuracy even when the 'true' model is never in the candidate set. Neither criterion is universally correct — the right choice depends on the scientific question being asked.