Questions: Introduction to Multiple Linear Regression
5 questions to test your understanding
Question 1 Multiple Choice
In a simple regression, number of books in the home positively predicts vocabulary scores (slope = 2.3). When family income is added to the model, the slope for books drops to 0.4. The most likely explanation is:
A. The data was entered incorrectly; adding income shouldn't change the books coefficient
B. Family income is a confounder: wealthier families both have more books and higher-vocabulary children, so the books slope was partly capturing income's effect
C. Multicollinearity has made the books coefficient biased toward zero
D. Income should not have been added since it isn't directly related to vocabulary
Answer: B
This is statistical control in action. In the simple regression, the books slope absorbed the variance that books share with income (books and income are correlated, and both predict vocabulary). Adding income lets the model partial out income's contribution, isolating the association with vocabulary that is unique to books. The original slope wasn't wrong: it was the correct marginal slope for books when income is not controlled. Option C is incorrect because multicollinearity inflates standard errors and makes estimates unstable; it does not systematically shrink a coefficient.
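A small simulation can make the drop in the books slope concrete. The sketch below assumes a made-up data-generating process in which income drives both books and vocabulary; the coefficients (0.8, 2.0, 0.4), the sample size, and the helper `ols` are all hypothetical choices for illustration, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process: income raises both books and vocabulary.
income = rng.normal(size=n)
books = 0.8 * income + rng.normal(scale=0.6, size=n)
vocab = 2.0 * income + 0.4 * books + rng.normal(size=n)

def ols(y, *predictors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Marginal slope: books is the only predictor, so it absorbs income's effect.
marginal_books = ols(vocab, books)[1]

# Partial slope: income is in the model, so books keeps only its unique part.
partial_books = ols(vocab, books, income)[1]

print(f"books slope without income: {marginal_books:.2f}")
print(f"books slope with income:    {partial_books:.2f}")
```

Under these assumed parameters the marginal slope should land near 2.0 and the partial slope near the true 0.4, mirroring the pattern in the question.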
Question 2 Multiple Choice
A multiple regression model achieves R² = 0.92 with 15 predictors on only 20 observations. A statistician flags this as problematic. Why?
A. R² above 0.9 always indicates multicollinearity
B. With 20 observations and 15 predictors, the model is almost certainly overfitting: fitting noise in the data rather than real signal
C. 15 predictors is simply too many to interpret, regardless of sample size
D. High R² in multiple regression always signals a spurious causal relationship
Answer: B
With 20 observations and 15 predictors plus an intercept, only 4 residual degrees of freedom remain (20 − 16). Adding a predictor never decreases R² on the training data, even if the predictor is pure noise, so with 15 predictors and 20 observations the model is close to memorizing the dataset. A common rule of thumb is roughly 10–20 observations per predictor for stable estimates. High R² is not inherently problematic; high R² with very few observations per predictor is.
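The claim that noise predictors push R² up can be checked directly. The following sketch fits an outcome that is pure noise against 1 to 15 pure-noise predictors on 20 observations; the seed and the helper `r_squared` are assumptions made for illustration. For Gaussian noise the expected R² with p predictors and n observations is about p / (n − 1), here 15/19 ≈ 0.79, even though no real signal exists.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 15

X = rng.normal(size=(n, p))  # predictors that are pure noise
y = rng.normal(size=n)       # outcome unrelated to every predictor

def r_squared(X, y):
    """Training-set R² of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# R² for nested models using the first k noise predictors, k = 1..15.
r2_by_k = [r_squared(X[:, :k], y) for k in range(1, p + 1)]

r2 = r2_by_k[-1]
# Adjusted R² penalizes the degrees of freedom spent on predictors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"R² with all {p} noise predictors: {r2:.2f}")
print(f"adjusted R²: {adj_r2:.2f}")
```

Adjusted R² is always lower than R² here because it divides the residual variation by the 4 remaining degrees of freedom rather than by n − 1.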
Question 3 True / False
A predictor with a non-significant p-value in multiple regression has no real relationship with the outcome variable.
T. True
F. False
Answer: False
Non-significance does not show that a relationship is absent. One common cause is multicollinearity: when two predictors are strongly correlated, the model cannot distinguish their contributions, so both get large standard errors and high p-values even if each has a genuine association with Y. A small sample can have the same effect. A predictor can therefore be genuinely important while appearing non-significant because it shares variance with another predictor in the model. Non-significance means 'we can't isolate this predictor's effect,' not 'this predictor doesn't matter.'
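To see how correlation between predictors inflates standard errors, the sketch below fits the same model twice: once with weakly correlated predictors and once with strongly correlated ones. The true slopes (both 1.0), the correlations 0.1 and 0.98, and the helper names are hypothetical choices. The standard errors use the usual OLS formula, the square root of the diagonal of σ̂²(XᵀX)⁻¹.

```python
import numpy as np

def slope_std_errors(X, y):
    """Classical OLS standard errors for the slopes (intercept excluded)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    sigma2 = resid @ resid / (len(y) - X1.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)
    return np.sqrt(np.diag(cov))[1:]

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
eps = rng.normal(size=n)

def se_for_correlation(rho):
    # x2 has population correlation rho with x1; both truly affect y.
    x2 = rho * x1 + np.sqrt(1 - rho**2) * noise
    y = 1.0 * x1 + 1.0 * x2 + eps
    return slope_std_errors(np.column_stack([x1, x2]), y)

se_low = se_for_correlation(0.1)
se_high = se_for_correlation(0.98)

print("standard errors, weakly correlated predictors:  ", se_low.round(3))
print("standard errors, strongly correlated predictors:", se_high.round(3))
```

Both predictors have the same true effect in each run; only the correlation between them changes, yet the standard errors in the second run are several times larger, which is what pushes p-values up.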
Question 4 True / False
The partial slope β₁ in multiple regression tells you the expected change in Y for a one-unit increase in X₁, holding all other predictors in the model constant.
T. True
F. False
Answer: True
This is the definition of a partial slope and the conceptual core of multiple regression. The 'holding everything else constant' interpretation is what makes multiple regression useful for observational data: by including potential confounders, you partial out their contributions and estimate each predictor's unique association with the outcome. This is statistical control: the model approximates algebraically what a controlled experiment achieves by design, but only for confounders that are measured and included in the model.
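"Partialing out" has a literal computational meaning. By the Frisch-Waugh-Lovell result, the partial slope for X₁ equals the slope obtained by first removing X₂ from both X₁ and Y and then regressing the Y residuals on the X₁ residuals. The sketch below checks this on simulated data; the coefficients and the helpers `fit` and `residualize` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Hypothetical correlated predictors and outcome.
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)
y = 1.5 * x1 - 2.0 * x2 + rng.normal(size=n)

def fit(y, X):
    """OLS coefficients [intercept, slopes...] for y on X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def residualize(v, X):
    """Part of v left over after regressing it on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(v)), X])
    return v - X1 @ fit(v, X)

# Partial slope of x1 from the full multiple regression.
beta1_full = fit(y, np.column_stack([x1, x2]))[1]

# Same slope from residuals: remove x2 from both x1 and y, then regress.
x1_res = residualize(x1, x2)
y_res = residualize(y, x2)
beta1_fwl = (x1_res @ y_res) / (x1_res @ x1_res)

print(f"partial slope from full model: {beta1_full:.6f}")
print(f"slope from residualized data:  {beta1_fwl:.6f}")
```

The two numbers agree up to floating-point error, which is the sense in which the partial slope describes X₁ with the other predictor held constant.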
Question 5 Short Answer
Why can a predictor's slope in multiple regression differ substantially from its slope in a simple regression with only that predictor?
Think about your answer before reading the model answer below.
Model answer: In simple regression, the slope for X₁ captures its total association with Y, including shared variance with any omitted predictors. In multiple regression, each partial slope represents X₁'s unique association with Y after statistically partialing out all other predictors in the model. If X₁ is correlated with another predictor X₂, and X₂ also predicts Y, simple regression conflates both effects in X₁'s slope. Multiple regression separates them, revealing X₁'s independent contribution.
This difference between marginal and partial slopes is the key conceptual advance of multiple regression. The slope change when adding a predictor is not a flaw: it shows how much of the original association was shared with the added predictor. A large drop in a coefficient after adding predictors is evidence that the original association was partly attributable to those predictors rather than to X₁ itself. This is why multiple regression is so important for disentangling correlated predictors in observational research.
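The gap between the two slopes can be written down exactly for the two-predictor case. In the sketch below, the symbols b̂₁ (simple-regression slope) and δ̂₁ (slope from regressing X₂ on X₁) are notation introduced here for illustration; the identity itself is the standard omitted-variable decomposition for OLS estimates computed on the same sample.

```latex
\begin{align*}
Y   &= \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + e
      && \text{(multiple regression)} \\
X_2 &= \hat\delta_0 + \hat\delta_1 X_1 + u
      && \text{(auxiliary regression of $X_2$ on $X_1$)} \\
\hat b_1 &= \hat\beta_1 + \hat\beta_2\,\hat\delta_1
      && \text{(slope of $X_1$ in the simple regression)}
\end{align*}
```

Applied to Question 1, with X₁ as books and X₂ as income, the numbers give 2.3 = 0.4 + β̂₂δ̂₁, so the part of the original slope that ran through income is 1.9. The two slopes coincide only when X₂ has no partial association with Y (β̂₂ = 0) or is uncorrelated with X₁ in the sample (δ̂₁ = 0).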