Questions: Local Polynomial Regression and Bandwidth Selection
5 questions to test your understanding
Question 1 Multiple Choice
A researcher doubles the bandwidth in a local linear regression. What is the most likely effect on the resulting estimates?
A. Variance increases and bias decreases, because more data points are used for each estimate
B. Variance decreases and bias increases, because the local polynomial must approximate the true function over a wider range
C. Both variance and bias decrease, because more data always improves estimation
D. The estimates are unaffected, because local polynomial regression automatically adjusts for bandwidth changes
Answer: B
This is the fundamental bias-variance tradeoff in local polynomial regression. A larger bandwidth draws on more observations for each estimate, which reduces variance (the estimate is less noisy). It also forces the local polynomial to approximate the true function over a wider range: if the true function curves, the local linear fit will be systematically off, which introduces bias. Doubling the bandwidth does not uniformly improve the estimates; it trades one source of error for another. The optimal bandwidth minimizes mean squared error by balancing the two.
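The effect of doubling the bandwidth can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not in the question: a fixed, evenly spaced design, a triangular kernel, independent noise with constant standard deviation, and a hypothetical regression function sin(2πx). Because a local linear estimate is a weighted sum of the outcomes, its bias and variance at a point can be computed exactly under those assumptions, with no simulation.

```python
import numpy as np

def local_linear_weights(x, x0, h):
    """Weights l_i such that the local linear estimate at x0 equals sum(l_i * y_i)."""
    u = (x - x0) / h
    k = np.clip(1.0 - np.abs(u), 0.0, None)         # triangular kernel (assumed choice)
    X = np.column_stack([np.ones_like(x), x - x0])  # intercept and slope columns
    XtW = X.T * k                                   # X'W, shape (2, n)
    # Row 0 of (X'WX)^{-1} X'W maps y to the fitted intercept, i.e. the value at x0.
    return np.linalg.solve(XtW @ X, XtW)[0]

def bias_and_variance(x, m, x0, h, sigma):
    """Bias and variance at x0 for a fixed design and homoskedastic noise sd sigma."""
    l = local_linear_weights(x, x0, h)
    bias = l @ m(x) - m(x0)
    variance = sigma**2 * np.sum(l**2)
    return bias, variance

def m(t):
    return np.sin(2 * np.pi * t)  # hypothetical curved regression function

x = np.linspace(0.0, 1.0, 201)
results = {h: bias_and_variance(x, m, x0=0.25, h=h, sigma=0.5) for h in (0.1, 0.2)}
for h, (bias, variance) in results.items():
    print(f"h={h:.1f}  bias={bias:+.4f}  variance={variance:.4f}")
```

Under these assumptions the wider bandwidth should show the smaller variance and the larger absolute bias, which is the pattern described in option B.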
Question 2 Multiple Choice
In a regression discontinuity design, why is local linear regression (degree 1) preferred over local constant regression (degree 0) near the cutoff?
A. Local linear uses more observations, reducing variance at the boundary
B. Local constant has worse boundary behavior because it cannot capture the slope of the true function, introducing bias at the edges of the support
C. Local linear automatically selects the optimal bandwidth, while local constant requires manual tuning
D. Local constant regression is biased everywhere, not just at boundaries
Answer: B
Local constant regression (fitting a kernel-weighted local mean) suffers from boundary bias: at the edge of the data support, observations exist on only one side, so the local mean is pulled toward the values in the interior. The direction of that bias depends on the sign of the function's slope at the boundary. Local linear regression fits a slope as well as an intercept, which lets the fit account for the function's direction of travel as it extrapolates to the boundary. In RD designs, the key quantity is the fitted value at the cutoff, which is a boundary point for the data on each side, so this distinction is critical.
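A small noise-free example isolates the boundary effect. This is a sketch under stated assumptions, not a full RD estimator: the cutoff is placed at x = 0, data exist only to its right, the kernel is triangular, and the regression function is a hypothetical straight line 1 + slope·x. The helper names `fit_at` and `boundary_bias` are made up for illustration.

```python
import numpy as np

def fit_at(x, y, x0, h, degree):
    """Kernel-weighted polynomial fit of the given degree; returns the fitted value at x0."""
    k = np.clip(1.0 - np.abs((x - x0) / h), 0.0, None)  # triangular kernel (assumed)
    X = np.vander(x - x0, degree + 1, increasing=True)  # columns 1, (x - x0), ...
    XtW = X.T * k
    return np.linalg.solve(XtW @ X, XtW @ y)[0]

def boundary_bias(slope, degree, h=0.1):
    """Bias at the cutoff x0 = 0 when data exist only to its right (no noise)."""
    x = np.linspace(0.0, 1.0, 501)
    y = 1.0 + slope * x  # hypothetical linear regression function with m(0) = 1
    return fit_at(x, y, 0.0, h, degree) - 1.0

for slope in (2.0, -2.0):
    print(f"slope={slope:+.0f}  "
          f"local constant bias={boundary_bias(slope, 0):+.4f}  "
          f"local linear bias={boundary_bias(slope, 1):+.1e}")
```

With a linear truth, the local constant fit is biased in the direction of the slope, while the local linear fit recovers the boundary value up to floating-point error. For a curved truth the local linear fit would also be biased, only less so.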
Question 3 True / False
A wider bandwidth in local polynomial regression usually produces a better estimate because it uses more data.
T. True
F. False
Answer: False
Using more data is not inherently better when the more distant observations carry misleading information about the target. A wider bandwidth forces the local polynomial to approximate the true function over a larger range. If the true conditional expectation function is nonlinear, a wide-bandwidth local linear fit is systematically biased, because it replaces the curve with a straight-line approximation. The optimal bandwidth explicitly trades variance reduction against bias increase: there is a bandwidth that minimizes MSE, and going beyond it increases total error even as variance keeps falling.
Question 4 True / False
Local polynomial regression fits a separate polynomial in a neighborhood around each evaluation point, rather than fitting a single polynomial to the entire dataset.
T. True
F. False
Answer: True
This is the defining feature of local polynomial regression and what makes it nonparametric. For each evaluation point x₀, the method collects nearby observations (within bandwidth h), weights them by a kernel function, and fits a polynomial to that local subset. A different polynomial is fit at each x₀, so the resulting curve can flex to match the local shape of the data everywhere. This contrasts with global polynomial regression, which fits a single function to all observations and imposes a rigid global shape.
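The procedure described above (collect nearby observations, weight them with a kernel, fit a polynomial, repeat at each evaluation point) can be sketched in a few lines. The Epanechnikov kernel, the simulated sine data, the bandwidth of 0.1, and the function name `local_poly_curve` are all assumptions chosen for illustration; the global straight-line fit is included only as a contrast.

```python
import numpy as np

def local_poly_curve(x, y, grid, h, degree=1):
    """Evaluate a local polynomial fit at each point of grid."""
    fitted = np.empty(len(grid))
    for j, x0 in enumerate(grid):
        u = (x - x0) / h
        inside = np.abs(u) < 1.0            # observations within bandwidth h of x0
        k = 0.75 * (1.0 - u[inside] ** 2)   # Epanechnikov kernel weights
        # np.polyfit weights multiply the unsquared residuals, hence the square root.
        coef = np.polyfit(x[inside] - x0, y[inside], degree, w=np.sqrt(k))
        fitted[j] = coef[-1]                # constant term = fitted value at x0
    return fitted

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2 * np.pi * x)               # hypothetical regression function
y = truth + rng.normal(scale=0.1, size=x.size)

local_fit = local_poly_curve(x, y, grid=x, h=0.1, degree=1)
global_fit = np.polyval(np.polyfit(x, y, 1), x)  # one straight line for all the data

local_rmse = np.sqrt(np.mean((local_fit - truth) ** 2))
global_rmse = np.sqrt(np.mean((global_fit - truth) ** 2))
print(f"local linear RMSE:  {local_rmse:.3f}")
print(f"global linear RMSE: {global_rmse:.3f}")
```

Both fits use degree-1 polynomials; the only difference is that the local version refits the line at every evaluation point, which is what lets it follow the curve.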
Question 5 Short Answer
Explain the bias-variance tradeoff in bandwidth selection for local polynomial regression. What happens as bandwidth shrinks toward zero, and what happens as it grows very large?
Model answer: As bandwidth shrinks toward zero, each local fit uses only observations very close to the evaluation point: variance explodes (tiny effective sample, noisy estimate) while bias approaches zero (the polynomial only has to approximate the function over a very narrow range). As bandwidth grows very large, the local fit incorporates most of the data and approaches a global polynomial fit: variance falls but bias grows, because the polynomial must approximate the true function over a wide range where it may curve significantly. The optimal bandwidth sits between these extremes, minimizing mean squared error = bias² + variance.
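The two limits can be read off the standard large-sample approximations for a local linear estimate at an interior point. The notation here is not from the model answer and is assumed for this sketch: n is the sample size, f(x₀) the density of the regressor at x₀, σ² the noise variance, μ₂(K) = ∫u²K(u)du, and R(K) = ∫K(u)²du.

```latex
\begin{aligned}
\operatorname{Bias}\{\hat m(x_0)\} &\approx \tfrac{1}{2}\, m''(x_0)\, \mu_2(K)\, h^2, \\
\operatorname{Var}\{\hat m(x_0)\} &\approx \frac{R(K)\, \sigma^2}{n\, h\, f(x_0)}, \\
\operatorname{MSE}(h) &\approx \tfrac{1}{4}\, m''(x_0)^2\, \mu_2(K)^2\, h^4 + \frac{R(K)\, \sigma^2}{n\, h\, f(x_0)}, \\
h_{\mathrm{opt}} &= \left[ \frac{R(K)\, \sigma^2}{\mu_2(K)^2\, m''(x_0)^2\, f(x_0)\, n} \right]^{1/5}.
\end{aligned}
```

The squared-bias term grows like h⁴ and the variance term shrinks like 1/(nh), so the minimizing bandwidth is proportional to n^(-1/5) when m''(x₀) is nonzero.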
This tradeoff appears throughout nonparametric statistics and machine learning (e.g., the choice of k in k-nearest neighbors, or the kernel bandwidth in kernel density estimation). The insight is that adding observations near the target reduces noise, while adding distant observations introduces approximation error. Cross-validation and plug-in methods choose the bandwidth by estimating where the bias² + variance curve reaches its minimum.
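As one illustration of the cross-validation route, the sketch below scores a handful of candidate bandwidths by leave-one-out prediction error and keeps the smallest. It is a brute-force version under assumed settings (simulated sine data on an even grid, Epanechnikov kernel, a hand-picked bandwidth grid), not a reproduction of any particular package's selector, and the helper names are hypothetical.

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear estimate at x0 using Epanechnikov weights."""
    u = (x - x0) / h
    k = np.clip(1.0 - u**2, 0.0, None)  # Epanechnikov kernel, up to a constant
    X = np.column_stack([np.ones_like(x), x - x0])
    XtW = X.T * k
    return np.linalg.solve(XtW @ X, XtW @ y)[0]

def loo_cv_score(x, y, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    idx = np.arange(len(x))
    errors = [y[i] - local_linear(x[idx != i], y[idx != i], x[i], h) for i in idx]
    return float(np.mean(np.square(errors)))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # hypothetical data

bandwidths = [0.03, 0.05, 0.08, 0.12, 0.2, 0.35, 0.6, 1.0]      # assumed candidate grid
scores = [loo_cv_score(x, y, h) for h in bandwidths]
best_h = bandwidths[int(np.argmin(scores))]

for h, s in zip(bandwidths, scores):
    print(f"h={h:<5} CV score={s:.4f}")
print("selected bandwidth:", best_h)
```

The leave-one-out score estimates noise variance plus the estimator's squared bias and variance, so its minimum over the grid approximates the MSE-minimizing bandwidth. The evenly spaced design keeps every local window populated; with irregular data, very small bandwidths can leave too few points for a stable fit.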