Questions: Simple Linear Regression: Theory and Estimation
5 questions to test your understanding
Question 1 Multiple Choice
After fitting a linear regression, you find R² = 0.94. When you plot the residuals against X, they form a clear U-shape — positive at low X, negative in the middle, positive again at high X. What does this indicate?
A. The model is excellent: R² near 1 confirms the fit is appropriate
B. The residuals are supposed to be U-shaped; this is expected behavior
C. The relationship between X and Y is likely nonlinear; the linear model is misspecified
D. The residuals indicate outliers that should be removed before re-fitting
Answer: C
Patterned residuals are a diagnostic signal that the model is misspecified, regardless of how high R² is. Because a residual is the observed value minus the predicted value, this U-shape means the line systematically under-predicts at low and high X and over-predicts in the middle, a telltale sign that the true relationship is curved, not linear. R² only measures how much of the variance in Y the current model explains; a high R² with patterned residuals means the model captures a strong relationship, but not the right shape. Option A represents the most common mistake: trusting R² without inspecting residuals. Option D is also wrong: a smooth pattern across the whole range of X points to the wrong functional form, not to a few outliers.
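The situation in this question can be reproduced with a small sketch. The data here are hypothetical (a noise-free quadratic on an evenly spaced grid, chosen only for illustration), and NumPy is assumed to be available. A straight line fitted to this curve gives an R² near 0.94, yet the residuals are positive at both ends and negative in the middle.

```python
import numpy as np

# Hypothetical data: a purely quadratic relationship, no noise.
x = np.linspace(0.0, 10.0, 101)
y = x ** 2

# Fit a straight line by ordinary least squares.
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted  # observed minus predicted

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("R^2:", round(r_squared, 3))
print("residual at low X:", round(residuals[0], 2))
print("residual at mid X:", round(residuals[50], 2))
print("residual at high X:", round(residuals[-1], 2))
```

The sign pattern of the three printed residuals (positive, negative, positive) is the U-shape described in the question, even though R² alone would suggest a good fit.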
Question 2 Multiple Choice
Two datasets both have correlation r = 0.7 between X and Y. Dataset A has sX = 2 and sY = 6. Dataset B has sX = 4 and sY = 3. Which correctly describes their OLS regression slopes?
A. Both slopes equal 0.7, because the slope equals the correlation for OLS
B. Both slopes are equal, because equal correlations always imply equal slopes
C. Dataset A has slope 2.1 and Dataset B has slope 0.525
D. The slopes cannot be determined from correlation and standard deviations alone
Answer: C
The OLS slope is β₁ = r(sY/sX). For Dataset A: 0.7 × (6/2) = 2.1. For Dataset B: 0.7 × (3/4) = 0.525. Two datasets can have identical correlations but very different slopes because the slope rescales the correlation into actual measurement units: it tells you how many units Y changes per one-unit change in X, while r is dimensionless. Options A and B reflect the misconception that correlation and slope are the same thing; option D is wrong because r, sX, and sY are exactly what the formula needs.
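The arithmetic above can be checked with a short sketch. The helper name `slope_from_correlation` and the synthetic data are assumptions made for illustration; the second half compares the formula against a slope fitted with NumPy's `polyfit`.

```python
import numpy as np

def slope_from_correlation(r, s_x, s_y):
    """OLS slope implied by correlation r and the standard deviations of X and Y."""
    return r * s_y / s_x

# The two datasets from the question.
slope_a = slope_from_correlation(0.7, 2.0, 6.0)
slope_b = slope_from_correlation(0.7, 4.0, 3.0)
print(round(slope_a, 3), round(slope_b, 3))  # 2.1 0.525

# Sanity check on hypothetical synthetic data: the formula should match
# the slope of a least-squares line fitted to the same points.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]
implied_slope = slope_from_correlation(r, np.std(x, ddof=1), np.std(y, ddof=1))
fitted_slope = np.polyfit(x, y, 1)[0]
print(np.isclose(implied_slope, fitted_slope))
```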
Question 3 True / False
In simple linear regression, R² equals the square of the Pearson correlation coefficient r between X and Y.
T. True
F. False
Answer: True
This is a key algebraic identity in simple (one-predictor) linear regression fitted by OLS with an intercept: R² = r². It means all the intuition built around correlation transfers directly to R². A correlation of r = 0.8 gives R² = 0.64, so the model explains 64% of the variance in Y. The identity does NOT extend to multiple regression, where R² no longer equals the squared correlation between Y and any single predictor; there it equals the squared correlation between Y and the fitted values.
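A quick numerical check of the identity, as a sketch on hypothetical simulated data (the coefficients, noise level, and seed are arbitrary choices): R² is computed from the residuals of a fitted line and compared with the squared Pearson correlation.

```python
import numpy as np

# Hypothetical data: a noisy linear relationship.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)

# R^2 from the fitted line (OLS with an intercept).
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Squared Pearson correlation between X and Y.
r = np.corrcoef(x, y)[0, 1]

print(np.isclose(r_squared, r ** 2))
```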
Question 4 True / False
A regression model with high R² and patterned residuals is well-specified — the patterned residuals are an artifact of the estimation procedure and should not affect interpretation.
T. True
F. False
Answer: False
Patterned residuals are one of the clearest diagnostic signals that a model is wrong, regardless of R². A systematic curve means the functional form is wrong, so predictions are biased in predictable ranges of X. A fan shape means the error variance is not constant, so the usual standard errors and confidence intervals are unreliable. Clustering can point to an omitted variable or grouping. In each case, conclusions drawn from the fitted slope deserve less trust. R² measures variance explained, not model correctness: a model can explain a lot of variance while being fundamentally misspecified.
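The fan shape mentioned above can be illustrated with a minimal sketch. The data are hypothetical: the noise standard deviation is made proportional to X on purpose, and comparing the residual spread in the lower and upper halves of X is just one simple way to see the pattern, not a formal test.

```python
import numpy as np

# Hypothetical data: the mean is linear in x, but the noise grows with x.
rng = np.random.default_rng(7)
x = np.linspace(1.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# x is already sorted, so the first half of the array is the low-X half.
half = len(x) // 2
spread_low = residuals[:half].std(ddof=1)
spread_high = residuals[half:].std(ddof=1)

print("residual spread, low X:", round(spread_low, 2))
print("residual spread, high X:", round(spread_high, 2))
```

Here the line has the right shape, so the residuals are centered on zero, but their spread widens as X grows, which is what a fan-shaped residual plot shows.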
Question 5 Short Answer
Why should you always inspect residual plots after fitting a regression, even when R² is very high?
Think about your answer before reading the model answer below.
Model answer: R² only measures the proportion of variance in Y explained by the model; it does not tell you whether the model's functional form is correct. Patterned residuals reveal violations of the model's assumptions: a curved pattern suggests the true relationship is nonlinear; a fan shape (increasing spread) indicates heteroscedasticity; clustered residuals may suggest omitted variables. A high R² confirms that X and Y are strongly related, but patterned residuals show that the linear model is not capturing that relationship correctly. The regression line may still serve as a rough approximation over a narrow range of X, but its predictions will be systematically biased elsewhere.
Inspecting residuals is part of model validation, not optional post-hoc analysis. The OLS estimator always finds the best-fitting line — but 'best-fitting line' is not the same as 'correct model.' Only residual diagnostics can reveal whether the line is the right shape for the data.
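When a plot is not at hand, a rough numerical stand-in is to average the residuals within bins of X. The sketch below is an assumption-laden illustration, not a standard diagnostic: the helper `binned_residual_means` is hypothetical, and both datasets are simulated. For a well-specified line the bin means sit near zero; for a curved relationship they show the positive, negative, positive pattern.

```python
import numpy as np

def binned_residual_means(x, y, n_bins=3):
    """Fit a line by OLS and return the mean residual in equal-count bins of x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    order = np.argsort(x)  # sort residuals by increasing x
    chunks = np.array_split(residuals[order], n_bins)
    return [float(chunk.mean()) for chunk in chunks]

# Hypothetical data: same x and noise, two different true relationships.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=300)
noise = rng.normal(scale=1.0, size=300)

linear_means = binned_residual_means(x, 2.0 + 3.0 * x + noise)
curved_means = binned_residual_means(x, 2.0 + 0.5 * x ** 2 + noise)

print("linear truth:", [round(m, 2) for m in linear_means])
print("curved truth:", [round(m, 2) for m in curved_means])
```

A residual plot remains the better tool, since binning can hide patterns that fall inside a single bin.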