Questions: Sensitivity Analysis and Robustness Checks
5 questions to test your understanding
Question 1 Multiple Choice
A researcher estimates a job training program's effect on earnings: no controls gives +$2,000; with demographic controls +$1,900; with economic controls +$1,800; with all controls +$1,750. What does this pattern indicate about the result?
A. The estimate is fragile: each additional control reduces it, revealing severe omitted variable bias
B. The estimate is robust: it moves within a small, consistent range across plausible specifications, supporting the conclusion
C. The researcher should report only the no-controls estimate since it shows the largest, most significant effect
D. Adding controls always mechanically reduces estimates; no conclusion about robustness can be drawn
Answer: B
A 'stable path' of estimates across reasonable specifications is what a sensitivity analysis looks for. The estimate moves from $2,000 to $1,750 across four specifications, a total decline of 12.5%, always in the same direction and never near zero. That is strong evidence the result does not depend on any single modeling choice. A fragile result would be one that collapses toward zero or changes sign after a minor adjustment. Option A misreads the pattern: a small, steady decline as controls are added suggests mild selection bias that the controls are adjusting for, not fragility.
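The reading of this pattern can be sketched as a small calculation. The function name `summarize_path` and the 25% tolerance are assumptions made for illustration, not a standard rule; the four estimates are the ones given in the question.

```python
def summarize_path(estimates, tolerance=0.25):
    """Summarize how an estimate moves across specifications.

    `tolerance` is a hypothetical rule of thumb: the largest shift from the
    baseline, as a share of the baseline, that is still called "stable".
    Assumes the baseline (first) estimate is nonzero.
    """
    baseline = estimates[0]
    # Does every estimate have the same sign as the baseline?
    same_sign = all((e > 0) == (baseline > 0) for e in estimates)
    # Largest movement away from the baseline, relative to the baseline.
    max_rel_shift = max(abs(e - baseline) / abs(baseline) for e in estimates)
    return {
        "same_sign": same_sign,
        "max_rel_shift": max_rel_shift,
        "stable": same_sign and max_rel_shift <= tolerance,
    }


# Estimates from the question: no controls, + demographic, + economic, all controls.
path = [2000, 1900, 1800, 1750]
summary = summarize_path(path)
print(summary)

# A made-up fragile path for contrast: the sign flips once a control is added.
fragile = summarize_path([2000, -100])
print(fragile)
```

Any threshold of this kind is a judgment call; the point is only that the question's path never changes sign and shifts by at most one eighth of the baseline.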
Question 2 Multiple Choice
After completing her analysis, a researcher runs 40 different model specifications and reports only the 3 that show statistically significant results. What is this practice called, and why is it problematic?
A. Robustness checking: reporting the most significant results demonstrates the estimates hold up
B. Specification searching: it inflates false discovery rates and misrepresents the true evidential weight of the results
C. Sensitivity analysis: systematically varying specifications and selecting the clearest is exactly the point
D. Bootstrap inference: randomly sampling from 40 specifications approximates a bootstrap distribution
Answer: B
Running many specifications and reporting only the significant ones is specification searching (sometimes called 'p-hacking'). With 40 specifications tested at the 5% level, you would expect roughly 2 (40 × 0.05) to show p < 0.05 by chance even under the null. Reporting only the 3 significant results and hiding the other 37 creates the appearance of evidence that may not exist. Sensitivity analysis is the *opposite*: it requires reporting all results across a pre-specified or transparent grid of choices, including those that do not support the hypothesis. What separates analysis from searching is transparency and pre-commitment.
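The chance arithmetic behind this answer can be worked out directly. This sketch assumes the 40 tests are independent, which is a simplification: specifications run on the same data are usually correlated, so the probabilities are indicative only. The helper name `prob_at_least` is made up for the example.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """P(at least k of n independent tests reject at level alpha) under the null."""
    # Subtract the binomial probabilities of 0, 1, ..., k-1 rejections from 1.
    return 1.0 - sum(
        comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k)
    )


n_specs, alpha = 40, 0.05
expected_false_positives = n_specs * alpha        # 40 * 0.05
p_any = prob_at_least(1, n_specs, alpha)          # same as 1 - 0.95**40
p_three_or_more = prob_at_least(3, n_specs, alpha)

print(f"expected false positives:  {expected_false_positives:.1f}")
print(f"P(at least 1 significant): {p_any:.3f}")
print(f"P(at least 3 significant): {p_three_or_more:.3f}")
```

Under the independence assumption, at least one spurious "significant" result is very likely (roughly 87%), and three or more happens about one time in three, so three significant results out of 40 carry little evidential weight by themselves.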
Question 3 True / False
Sensitivity analysis and bootstrap inference both measure the same dimension of uncertainty in an econometric estimate.
T. True
F. False
Answer: False
They measure fundamentally different dimensions. Bootstrap inference quantifies sampling variability — how much estimates would vary if you drew a different sample from the same population. Sensitivity analysis quantifies specification variability — how much estimates move when you change modeling choices (controls, functional form, sample restrictions) while keeping the same data. A result can be robust to sampling noise (tight bootstrap CIs) but fragile to specification changes, or vice versa. A credible empirical paper needs both.
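The two dimensions can be computed side by side. The sketch below uses simulated data (an assumption, with a true effect of 2.0), a simple difference in means as the estimator, and three sample restrictions as the "specifications"; the helper names are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(0)

# Simulated data for illustration only: (treated, age, outcome) per person.
rows = []
for _ in range(400):
    treated = 1 if rng.random() < 0.5 else 0
    age = rng.randint(20, 60)
    outcome = 2.0 * treated + 0.05 * age + rng.gauss(0, 1)
    rows.append((treated, age, outcome))


def diff_in_means(sample):
    """Mean outcome of treated minus mean outcome of untreated."""
    treated = [y for t, _, y in sample if t == 1]
    control = [y for t, _, y in sample if t == 0]
    return mean(treated) - mean(control)


def bootstrap_ci(sample, n_boot=500, level=0.95):
    """Percentile bootstrap interval: same specification, resampled data."""
    draws = sorted(
        diff_in_means(rng.choices(sample, k=len(sample))) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return draws[lo_idx], draws[hi_idx]


# Sampling variability: hold the specification fixed, vary the sample.
ci_low, ci_high = bootstrap_ci(rows)

# Specification variability: hold the data fixed, vary a modeling choice.
specifications = {
    "full sample": rows,
    "age under 40": [r for r in rows if r[1] < 40],
    "age 40 and over": [r for r in rows if r[1] >= 40],
}
spec_estimates = {name: diff_in_means(s) for name, s in specifications.items()}
spec_spread = max(spec_estimates.values()) - min(spec_estimates.values())

print(f"bootstrap 95% interval: ({ci_low:.2f}, {ci_high:.2f})")
for name, est in spec_estimates.items():
    print(f"{name:>16}: {est:.2f}")
print(f"spread across specifications: {spec_spread:.2f}")
```

The bootstrap interval and the spread across specifications are separate numbers answering separate questions, so neither can stand in for the other.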
Question 4 True / False
An estimate that changes dramatically when a single control variable is added or removed provides weaker causal evidence than one that remains stable across many plausible specifications.
T. True
F. False
Answer: True
If a treatment effect estimate swings from economically significant to near zero when one variable is included, the conclusion was 'hanging on' the exclusion of that variable. This suggests either strong confounding (the variable is related to both treatment and outcome, so the estimate without it was picking up that variable's influence) or model instability. Stable estimates across a range of reasonable specifications mean the conclusion does not depend on any particular modeling choice, which is what we want from credible causal inference.
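A small simulation shows how one omitted confounder can produce this kind of swing. The data-generating process is an assumption chosen for illustration: the treatment has no true effect, and a confounder `z` drives both treatment and outcome. The controlled estimate uses the Frisch-Waugh-Lovell result, partialling `z` out of both variables before regressing.

```python
import random
from statistics import mean


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, z):
    """Residuals from regressing y on z (with an intercept)."""
    b = slope(z, y)
    a = mean(y) - b * mean(z)
    return [yi - a - b * zi for yi, zi in zip(y, z)]


rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]        # confounder
d = [zi + rng.gauss(0, 1) for zi in z]         # treatment intensity depends on z
y = [2.0 * zi + rng.gauss(0, 1) for zi in z]   # outcome depends on z only

# Specification 1: regress y on d alone (z omitted).
naive = slope(d, y)

# Specification 2: regress y on d controlling for z, via partialling out.
controlled = slope(residualize(d, z), residualize(y, z))

print(f"without the control: {naive:.2f}")
print(f"with the control:    {controlled:.2f}")
```

In this setup the estimate without the control is clearly positive while the estimate with it sits near zero, which is the 'hanging on one variable' pattern the explanation describes.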
Question 5 Short Answer
What is the key difference between sensitivity analysis and formal specification testing (e.g., Hausman tests, RESET tests), and why does sensitivity analysis provide information those tests cannot?
Model answer: Formal specification tests compare a base model to one specific alternative. They answer 'is this particular alternative model statistically distinguishable from mine?' and yield a reject or fail-to-reject verdict for that one comparison. Sensitivity analysis instead varies a *range* of modeling choices (different control sets, functional forms, sample restrictions) and tracks the path of estimates across that space. It answers 'does my conclusion hold across the neighborhood of reasonable models?' Formal tests cannot evaluate the full space of plausible specifications; sensitivity analysis shows whether any specification in the examined set would overturn the conclusion.
The additional value is that sensitivity analysis guards against a specific failure mode: a researcher who passes all formal tests but happened to choose the one specification where results are strongest. Formal tests would not detect this. Sensitivity analysis, done transparently across a pre-committed grid, exposes that kind of cherry-picking because the full path of estimates is disclosed.
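A pre-committed grid can be as simple as an enumerated list of specifications written down before any model is estimated. The sketch below is one possible way to build it; the function name `build_grid` and the control, sample, and functional-form labels are hypothetical placeholders.

```python
from itertools import combinations, product


def build_grid(optional_controls, sample_rules, functional_forms):
    """Enumerate every combination of control set, sample rule, and functional form."""
    # All subsets of the optional controls, from the empty set to the full set.
    control_sets = [
        subset
        for k in range(len(optional_controls) + 1)
        for subset in combinations(optional_controls, k)
    ]
    return [
        {"controls": c, "sample": s, "form": f}
        for c, s, f in product(control_sets, sample_rules, functional_forms)
    ]


# Hypothetical choices, fixed before looking at any results.
grid = build_grid(
    optional_controls=["age", "education", "prior_earnings"],
    sample_rules=["full", "exclude_outliers"],
    functional_forms=["levels", "logs"],
)

print(len(grid), "specifications to estimate and report")
for spec in grid[:3]:
    print(spec)
```

Every row of the grid is then estimated and reported, including those that weaken the headline result, which is what distinguishes this from a specification search.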