A researcher conducts a matched case-control study, pairing each case with a control of the same age and sex. She then analyzes the data using standard (unmatched) logistic regression. What is the main problem with this approach?
B. The analysis ignores the pairing structure, treating matched controls as if randomly selected and producing biased estimates
C. The matched variables (age, sex) will not appear in the output, so confounding is uncontrolled
D. Standard logistic regression overfits when sample sizes are small
Answer: B
Matching creates a deliberate pairing structure that the analysis must preserve. Standard logistic regression treats every observation as independent and ignores the matched sets, so it discards the design's bias-reduction benefit and yields inefficient, potentially biased estimates (typically biased toward the null when the matching factors are associated with exposure). Conditional logistic regression, which conditions on each matched set and compares exposure within it, is the appropriate method.
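For 1:1 matched pairs with a binary exposure, the conditional-likelihood estimate of the odds ratio reduces to the ratio of the two kinds of discordant pairs, which makes the contrast with an unmatched analysis easy to see. The sketch below uses made-up pair counts (hypothetical, not data from any study) and two illustrative helper functions, `matched_pair_or` and `crude_or`. It is an illustration of the idea rather than a full analysis.

```python
def matched_pair_or(pairs):
    """Conditional (matched-pair) odds ratio: uses discordant pairs only."""
    case_only = sum(1 for case, ctrl in pairs if case and not ctrl)
    ctrl_only = sum(1 for case, ctrl in pairs if ctrl and not case)
    return case_only / ctrl_only


def crude_or(pairs):
    """Odds ratio from the pooled 2x2 table, ignoring the pairing."""
    n = len(pairs)
    exposed_cases = sum(case for case, _ in pairs)
    exposed_ctrls = sum(ctrl for _, ctrl in pairs)
    return (exposed_cases * (n - exposed_ctrls)) / ((n - exposed_cases) * exposed_ctrls)


# Hypothetical data: (case_exposed, control_exposed) for 100 matched pairs.
pairs = [(1, 1)] * 40 + [(1, 0)] * 30 + [(0, 1)] * 10 + [(0, 0)] * 20

print(f"matched-pair OR: {matched_pair_or(pairs):.2f}")  # 30 / 10 = 3.00
print(f"crude OR:        {crude_or(pairs):.2f}")         # (70 * 50) / (30 * 50) = 2.33
```

In this made-up example the unmatched estimate sits closer to 1 than the matched one, which is the kind of distortion the explanation above warns about.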
Question 2 Multiple Choice
A researcher studying smoking and bladder cancer matches each case to a control on age, sex, and whether the person has nicotine-stained fingers. What is the likely consequence of matching on nicotine-stained fingers?
A. Statistical power will increase because a strong confounder has been controlled
B. Confounding by nicotine will be eliminated, improving validity
C. Overmatching will occur: cases and controls will be similarly exposed, reducing the detectable exposure contrast
D. The matched analysis will require a larger matched set ratio to compensate
Answer: C
Nicotine-stained fingers are a proxy for the exposure (smoking), not an independent cause of bladder cancer. Matching on an exposure proxy causes overmatching: controls matched to cases who smoke will themselves tend to be heavy smokers, which eliminates the very exposure contrast needed to detect a causal effect. The resulting odds ratio is biased toward the null, not because there is no association, but because the design has removed it from the study.
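One way to see the loss of exposure contrast: in a 1:1 matched analysis, only discordant pairs (one member exposed, the other not) carry information about the odds ratio. If matching on an exposure proxy makes most pairs concordant, few informative pairs remain. The sketch below assumes the usual large-sample approximation SE(log OR) = sqrt(1/b + 1/c) for discordant-pair counts b and c. The counts and the helper name `pair_or_ci` are hypothetical, and the example illustrates only the loss of precision, not the bias toward the null.

```python
from math import exp, log, sqrt


def pair_or_ci(case_only, ctrl_only, z=1.96):
    """Matched-pair OR with an approximate 95% Wald interval on the log scale."""
    odds_ratio = case_only / ctrl_only
    se_log_or = sqrt(1 / case_only + 1 / ctrl_only)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical discordant-pair counts from studies with the same number of pairs.
well_matched = pair_or_ci(case_only=30, ctrl_only=10)  # many informative pairs
overmatched = pair_or_ci(case_only=3, ctrl_only=1)     # most pairs concordant

for label, (or_, lo, hi) in [("well matched", well_matched), ("overmatched", overmatched)]:
    print(f"{label}: OR = {or_:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Both hypothetical scenarios give the same point estimate, but the interval in the overmatched scenario is far wider and includes 1, so the association could no longer be distinguished from no effect.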
Question 3 True / False
Matching on a confounding variable in a case-control study eliminates the need for statistical adjustment of that variable in the analysis.
T. True
F. False
Answer: False
This is a common misconception about matching. Matching addresses confounding at the design stage by balancing the matched variable between cases and controls, but that benefit is realized only if a matched analysis is used. The matched structure must be honored in the analysis (via conditional logistic regression or stratified analysis). If an unmatched analysis is used instead, the matching provides no confounding control and can actually worsen bias, because controls have been selected to resemble cases on the matched variable and on anything correlated with it, including the exposure.
Question 4 True / False
Matching in case-control studies is generally preferable to statistical adjustment because it controls confounding more completely.
T. True
F. False
Answer: False
Matching can make confounding control more efficient in specific circumstances, but it is not uniformly superior. Overmatching (matching on variables associated with the exposure but not independently with the disease) can reduce statistical power without reducing confounding. Matching also commits resources before the study begins and fixes in advance which confounders are handled by design. Statistical adjustment via regression handles multiple confounders simultaneously without incurring overmatching risk.
Question 5 Short Answer
Why does matched data require a matched analysis, and what happens statistically if you ignore the matching?
Model answer: Matching creates correlated observations within matched sets: the case and its controls are alike on the matched variables by design, not by chance. Conditional logistic regression preserves this structure by making comparisons within each matched set. If you apply standard (unconditional) logistic regression, you treat all observations as independent, which ignores the built-in similarity within pairs. This inflates the apparent degrees of freedom, gives misleading standard errors, and can bias the odds ratio estimate (typically toward the null) because matching makes controls resemble cases on the matched variable and therefore on any exposure correlated with it.
The intuition: matching 'removes' variation in the matched variable by design. The statistical model must be told this variation was removed deliberately. Conditional logistic regression models the conditional probability of case status given exposure within each matched set — effectively blocking the matched variable from acting as a confounder while using only the within-set exposure contrast to estimate the odds ratio.
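For 1:1 matching with a binary exposure, this conditional probability can be written out explicitly. The notation below ($\beta$ for the log odds ratio, $x_{i1}$ and $x_{i0}$ for the exposure of the case and the control in pair $i$, and $b$ and $c$ for the two kinds of discordant pairs) is introduced here for illustration and is not part of the quiz itself.

```latex
% Contribution of matched pair i to the conditional likelihood
L_i(\beta) = \frac{\exp(\beta x_{i1})}{\exp(\beta x_{i1}) + \exp(\beta x_{i0})}

% Concordant pairs (x_{i1} = x_{i0}) contribute the constant 1/2, so with
% b pairs where only the case is exposed and c pairs where only the control is:
L(\beta) \propto \left(\frac{e^{\beta}}{1 + e^{\beta}}\right)^{b}
                 \left(\frac{1}{1 + e^{\beta}}\right)^{c},
\qquad
\widehat{\mathrm{OR}} = e^{\hat{\beta}} = \frac{b}{c}
```

The matched variables never appear in these expressions: any pair-specific intercept cancels between numerator and denominator, which is why their effects are absorbed by the conditioning rather than estimated.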