Questions: Confounding Variables and Internal Validity
5 questions to test your understanding
Question 1 Multiple Choice
A study finds that cities with more hospitals per capita have higher death rates. A policy student concludes hospitals are harmful and proposes reducing hospital funding. What is the most likely confounding variable?
A. City population size: larger cities have more hospitals and more deaths in raw numbers
B. Illness severity: sicker populations both seek hospital care and are more likely to die, making hospitals appear harmful
C. Hospital quality: better hospitals attract more patients who then die despite good care
D. Medical staff shortages: fewer doctors per patient leads to both more hospitals and higher death rates
Answer: B
Illness severity is the classic confound here: sicker populations create demand for more hospitals and are more likely to die regardless of care quality. Pre-existing illness drives both the number of hospitals and the death rate, so it is a systematic alternative explanation for their association. The comparison groups (high-hospital vs. low-hospital cities) differ in population health before any hospital effect can be measured, which resembles selection bias at the population level. The naive interpretation mistakes a shared cause for a causal effect of hospitals on deaths. City population size does not work as the confound because the study already compares hospitals per capita and death rates, not raw counts.
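A small simulation can make the shared-cause structure concrete. The sketch below is illustrative only: the severity index, the coefficients, and the function names are all assumptions invented for this example, not data from any real study. It assumes hospitals truly lower deaths, yet the unadjusted slope still comes out positive because severity drives both variables.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def residualize(xs, ys):
    """Remove from ys the part that is linearly predicted by xs."""
    b = ols_slope(xs, ys)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return [y - my - b * (x - mx) for x, y in zip(xs, ys)]


def hospital_slopes(n_cities=2000, seed=0):
    """Return (naive slope, severity-adjusted slope) of deaths on hospitals."""
    rng = random.Random(seed)
    severity, hospitals, deaths = [], [], []
    for _ in range(n_cities):
        s = rng.gauss(0, 1)                      # hypothetical illness-severity index
        h = 2.0 * s + rng.gauss(0, 1)            # assumption: sicker cities have more hospitals
        d = 3.0 * s - 0.5 * h + rng.gauss(0, 1)  # assumption: hospitals reduce deaths (-0.5)
        severity.append(s)
        hospitals.append(h)
        deaths.append(d)
    naive = ols_slope(hospitals, deaths)
    # Adjust for severity by regressing it out of both variables first.
    adjusted = ols_slope(residualize(severity, hospitals),
                         residualize(severity, deaths))
    return naive, adjusted


naive, adjusted = hospital_slopes()
print(f"naive slope of deaths on hospitals: {naive:.2f}")
print(f"slope after adjusting for severity: {adjusted:.2f}")
```

Under these assumed coefficients the naive slope is positive and the adjusted slope is negative, which mirrors the quiz scenario: the raw association points the wrong way until the confound is accounted for.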
Question 2 Multiple Choice
A school tests a new reading curriculum by comparing students in the new program (Group A) to students from the previous year who used the old curriculum (Group B). Group A scores 15% higher. What is the primary threat to internal validity?
A. Regression to the mean: Group A's scores are likely to drop back toward average next year
B. History effects: other improvements (new teachers, school initiatives) may have occurred between the two years
C. Demand characteristics: Group A students knew they were in the new curriculum and tried harder
D. The sample was too small: adding more students would have eliminated these confounds
Answer: B
Comparing across different years means any changes between years are potential confounds. History effects capture exactly this: external events (new teachers, school-wide initiatives, demographic shifts) that occurred between the two measurement periods could explain the score difference independently of the curriculum. Option D is a key misconception: more participants increase statistical power but do not remove confounds. Confounding is a design problem, not a sample-size problem.
Question 3 True / False
A variable that is randomly distributed across participants in a study reduces internal validity by acting as a confound.
T. True
F. False
Answer: False
Random variation is noise: it increases variance in the dependent variable but does not threaten internal validity. A confound requires systematic variation that is correlated with the independent variable and offers an alternative causal explanation. If a variable (say, IQ) is randomly distributed across conditions because of random assignment, it adds noise but not bias. It cannot explain why one condition outperformed another, because IQ is balanced across the groups in expectation and any leftover imbalance is chance variation rather than a systematic difference.
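The noise-versus-bias distinction can be checked with a repeated-experiment simulation. This is a minimal sketch under invented assumptions (a true effect of 5 points, 30 participants per group, an IQ-like nuisance variable with a chosen spread); none of these numbers come from the quiz. It repeats the experiment many times and reports the average estimate and its spread.

```python
import random
import statistics

TRUE_EFFECT = 5.0  # hypothetical treatment effect, in test-score points


def one_experiment(rng, n_per_group, nuisance_sd):
    """Estimated effect: treatment mean minus control mean."""
    def score(treated):
        # The nuisance variable has the same distribution in both conditions.
        nuisance = rng.gauss(0, nuisance_sd)
        return 50 + (TRUE_EFFECT if treated else 0.0) + nuisance + rng.gauss(0, 1)

    treatment = [score(True) for _ in range(n_per_group)]
    control = [score(False) for _ in range(n_per_group)]
    return statistics.fmean(treatment) - statistics.fmean(control)


def summarize(nuisance_sd, n_experiments=2000, n_per_group=30, seed=1):
    """Mean and standard deviation of the estimate across many experiments."""
    rng = random.Random(seed)
    estimates = [one_experiment(rng, n_per_group, nuisance_sd)
                 for _ in range(n_experiments)]
    return statistics.fmean(estimates), statistics.stdev(estimates)


for sd in (0.0, 10.0):
    mean_est, spread = summarize(sd)
    print(f"nuisance sd={sd:>4}: mean estimate={mean_est:.2f}, spread={spread:.2f}")
```

In this setup the average estimate stays close to the assumed true effect whether or not the nuisance variable is present; only the spread of individual estimates grows. That is what "noise but not bias" means in practice.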
Question 4 True / False
Running a study with a larger sample size can eliminate confounding variables if the sample is large enough.
T. True
F. False
Answer: False
Confounding is a design problem, not a sample-size problem. A confound exists when the study design allows a variable to systematically differ between conditions. No matter how many participants you add, if more motivated students are assigned to the treatment condition, motivation remains confounded with treatment. More participants give you more power to detect a confounded effect — which is arguably worse, since you become more confident in a misleading conclusion. The solution is design: random assignment, matching, or other control procedures.
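The motivation example above can be sketched as a simulation. Everything here is an assumption chosen for illustration: the treatment has zero true effect, more motivated students are more likely to end up in the treatment group, and motivation raises scores. The sketch shows how the estimated "effect" and its standard error behave as the sample grows.

```python
import math
import random
import statistics


def confounded_study(n, seed=2):
    """Return (difference in group means, standard error) for a flawed design."""
    rng = random.Random(seed)
    treatment, control = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        # Flawed assignment (assumption): motivated students drift into treatment.
        treated = motivation + rng.gauss(0, 1) > 0
        # The treatment itself contributes nothing to the score here.
        score = 50 + 4.0 * motivation + rng.gauss(0, 5)
        (treatment if treated else control).append(score)
    diff = statistics.fmean(treatment) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treatment) / len(treatment)
                   + statistics.variance(control) / len(control))
    return diff, se


for n in (200, 2000, 20000):
    diff, se = confounded_study(n)
    print(f"n={n:>6}: apparent effect={diff:.2f}, standard error={se:.2f}")
```

With these assumptions the apparent effect does not shrink toward zero as n increases; only the standard error shrinks, so the spurious difference looks more and more certain.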
Question 5 Short Answer
Why does random assignment eliminate confounds, while statistical control (measuring and adjusting for known confounds) does not fully solve the problem?
Model answer: Random assignment distributes all participant characteristics, both measured and unmeasured, roughly equally across conditions, so no pre-existing variable can systematically favor one group. Any imbalance that remains is chance variation, not bias. Statistical control, by contrast, can only adjust for confounds you have measured and modeled correctly. Confounds you did not know about or did not measure, and confounds with complex relationships to the outcome, cannot be adequately controlled statistically. Random assignment protects against all pre-existing confounds at once; statistical control addresses only the ones you already know about.
This asymmetry explains why experiments with random assignment are the gold standard for causal inference, while observational studies — even with extensive statistical controls — cannot fully rule out alternative explanations. Every observational study carries the caveat 'there may be unmeasured confounds.' A randomized experiment pushes that concern aside by design. The practical implication: design is the first line of defense against confounding, and statistical analysis is a secondary tool for variables that couldn't be fully randomized.
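The asymmetry can be illustrated with a simulation that has one measured and one unmeasured confound. This is a sketch under stated assumptions, not a general proof: the true effect (2.0), the coefficients, and all function names are invented for the example. The observational analysis adjusts for the measured confound only; the randomized analysis uses a coin flip and no adjustment.

```python
import random
import statistics

TRUE_EFFECT = 2.0  # hypothetical causal effect of treatment on the outcome


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def residualize(xs, ys):
    """Remove from ys the part that is linearly predicted by xs."""
    b = ols_slope(xs, ys)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return [y - my - b * (x - mx) for x, y in zip(xs, ys)]


def outcome(rng, treated, measured, unmeasured):
    # Assumption: both confounds raise the outcome equally.
    return TRUE_EFFECT * treated + 3.0 * measured + 3.0 * unmeasured + rng.gauss(0, 1)


def observational_estimate(n=10000, seed=3):
    """Self-selected treatment, adjusted for the measured confound only."""
    rng = random.Random(seed)
    m_vals, t_vals, y_vals = [], [], []
    for _ in range(n):
        m, u = rng.gauss(0, 1), rng.gauss(0, 1)
        t = 1.0 if m + u + rng.gauss(0, 1) > 0 else 0.0  # both confounds drive uptake
        m_vals.append(m)
        t_vals.append(t)
        y_vals.append(outcome(rng, t, m, u))
    return ols_slope(residualize(m_vals, t_vals), residualize(m_vals, y_vals))


def randomized_estimate(n=10000, seed=3):
    """Coin-flip assignment; plain difference in means (slope on a 0/1 variable)."""
    rng = random.Random(seed)
    t_vals, y_vals = [], []
    for _ in range(n):
        m, u = rng.gauss(0, 1), rng.gauss(0, 1)
        t = 1.0 if rng.random() < 0.5 else 0.0
        t_vals.append(t)
        y_vals.append(outcome(rng, t, m, u))
    return ols_slope(t_vals, y_vals)


print(f"true effect:                    {TRUE_EFFECT:.2f}")
print(f"observational, adjusted for m:  {observational_estimate():.2f}")
print(f"randomized, no adjustment:      {randomized_estimate():.2f}")
```

Under these assumptions the adjusted observational estimate remains well above the true effect because the unmeasured confound is still in play, while the randomized estimate lands near the true effect without measuring either confound.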