Questions — R-Squared: Goodness of Fit

Question 1 Multiple Choice

A researcher adds 15 additional control variables to a regression, and R² rises from 0.41 to 0.68. A colleague says this proves the new model is better. What is wrong with this reasoning?

A. Nothing: higher R² always indicates a better model, since more variation is explained

B. R² never falls, and typically rises, when variables are added, even irrelevant ones; an improvement in in-sample fit says nothing about out-of-sample prediction or causal validity

C. The colleague should have used adjusted R² only if the added variables were categorical

D. R² above 0.5 is a sign of overfitting, so the original model was preferable

Question 2 Multiple Choice

A randomized controlled trial estimates the effect of a job training program on earnings with R² = 0.04. An observational study of the same program achieves R² = 0.71 by including many demographic controls. Which estimate is more causally trustworthy?

A. The observational study: its R² is far higher, meaning the model fits the data much better

B. The RCT: randomization eliminates confounding, making the treatment effect estimate unbiased regardless of R²

C. They are equivalent: both report regression estimates, so the causal validity is the same

D. The observational study: more controls always reduce omitted variable bias

Question 3 True / False

Adding a variable to an OLS regression model can never decrease R².

T. True

F. False

Question 4 True / False

A regression model with R² = 0.90 produces coefficient estimates that are more likely to be unbiased than a model with R² = 0.30.

T. True

F. False

Question 5 Short Answer

Why is R² an inadequate criterion for evaluating the causal validity of a regression model, and what should researchers care about instead?

Think about your answer before checking it.
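Once you have attempted Questions 1 and 3, the behaviour of R² under added regressors can be checked numerically. The sketch below is illustrative only and is not part of the quiz: it assumes NumPy is available, simulates an outcome that depends on a single regressor, and then adds up to 15 pure-noise columns. The data-generating process, the sample size, and the helper name `r_squared` are all assumptions made for this example.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R² of an OLS fit of y on X (intercept added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                    # the only relevant regressor
y = 1.0 + 0.5 * x + rng.normal(size=n)    # assumed data-generating process
noise = rng.normal(size=(n, 15))          # 15 regressors unrelated to y

# R² after adding the first k noise columns, for k = 0, 1, ..., 15
r2_path = [r_squared(np.column_stack([x, noise[:, :k]]), y) for k in range(16)]

for k, r2 in enumerate(r2_path):
    print(f"noise regressors: {k:2d}   R² = {r2:.4f}")
```

Comparing the printed values across rows shows how in-sample fit responds to regressors that have no relationship to the outcome by construction.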
