Questions: Selection Bias: Types and Sources in Epidemiologic Studies
5 questions to test your understanding
Question 1 Multiple Choice
A hospital-based case-control study examines smoking as a risk factor for bladder cancer. Cases are bladder cancer patients; controls are hospitalized patients with appendicitis. Appendicitis is independently more common in smokers. Compared to the true odds ratio, what is the most likely result?
A. The odds ratio will be inflated because smoking is over-represented among cases
B. The odds ratio will be biased toward the null because smoking is also elevated in the control group, making cases and controls appear more similar than they truly are
C. There will be no bias because both groups are drawn from the same hospital
D. The odds ratio will be inflated because hospital patients are generally sicker and more likely to smoke
Answer: B
This is Berkson's bias. Controls are drawn from patients hospitalized with appendicitis, a condition independently associated with smoking. This inflates smoking prevalence in the control group relative to the source population that produced the cases, making controls artificially similar to cases. The result is a diluted odds ratio (biased toward the null) for the smoking-bladder cancer association. Berkson's bias arises from selecting controls from a non-representative hospital population in which multiple diseases and exposures cluster together.
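The dilution can be illustrated with a small numeric sketch. All counts below are hypothetical, chosen only to make the arithmetic visible: the assumed smoking prevalence is 30% in the source population and 45% among appendicitis inpatients. The helper name `odds_ratio` is likewise an illustration, not part of any library.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio from a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Hypothetical counts, for illustration only.
cases = {"smokers": 100, "nonsmokers": 100}                  # bladder cancer cases
community_controls = {"smokers": 60, "nonsmokers": 140}      # assumed 30% smokers
appendicitis_controls = {"smokers": 90, "nonsmokers": 110}   # assumed 45% smokers

# Odds ratio using controls that represent the source population.
or_community = odds_ratio(
    cases["smokers"], cases["nonsmokers"],
    community_controls["smokers"], community_controls["nonsmokers"],
)

# Odds ratio using hospital controls whose condition is linked to smoking.
or_hospital = odds_ratio(
    cases["smokers"], cases["nonsmokers"],
    appendicitis_controls["smokers"], appendicitis_controls["nonsmokers"],
)

print(f"OR with community controls:    {or_community:.2f}")
print(f"OR with appendicitis controls: {or_hospital:.2f}")
```

Under these assumed numbers the hospital-control odds ratio stays above 1 but sits closer to the null than the community-control estimate, which is the pattern described in option B.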
Question 2 Multiple Choice
An occupational cohort study compares mortality in chemical plant workers to national mortality tables and finds significantly lower death rates among workers. The most accurate interpretation is:
A. Chemical plant employment is likely protective against mortality
B. The healthy worker effect: severely ill people disproportionately do not work, making the employed cohort healthier at baseline than the general population
C. Loss-to-follow-up bias has inflated apparent worker survival
D. The national mortality tables are calibrated for a different age distribution
Answer: B
The healthy worker effect is a form of selection bias: the reference population (everyone in the national tables) includes severely ill people who cannot work, while the employed cohort systematically excludes them. The employed cohort is therefore healthier at baseline than its comparison group, producing an apparent protective effect. As a result, occupational hazards can be systematically underestimated when national mortality tables serve as the reference. The lower death rate reflects a design flaw, not a real protective effect.
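A standardized mortality ratio (SMR, observed deaths divided by expected deaths) makes the effect of the reference choice concrete. The sketch below uses hypothetical rates and counts throughout, and the "employed reference" rate is an assumption introduced only to show what a healthier comparison group would do to the estimate.

```python
def expected_deaths(reference_rate_per_1000, person_years):
    """Deaths expected if the cohort died at the reference population's rate."""
    return reference_rate_per_1000 * person_years / 1000

def smr(observed, expected):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return observed / expected

# Hypothetical cohort, for illustration only.
person_years = 8000
observed = 64

national_rate = 10   # assumed deaths per 1000 person-years, general population
employed_rate = 7    # assumed deaths per 1000 person-years, other employed people

smr_national = smr(observed, expected_deaths(national_rate, person_years))
smr_employed = smr(observed, expected_deaths(employed_rate, person_years))

print(f"SMR vs national tables:    {smr_national:.2f}")
print(f"SMR vs employed reference: {smr_employed:.2f}")
```

With these made-up inputs the same cohort looks protected (SMR below 1) against national tables and at excess risk (SMR above 1) against an employed reference, so the apparent benefit comes from the comparison group and not from the workplace.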
Question 3 True / False
Selection bias, unlike confounding, can be corrected through statistical adjustment during data analysis if sufficient covariate data are collected.
T. True
F. False
Answer: False
This is the critical distinction. Confounding can sometimes be addressed analytically (via stratification, regression, or analysis of matched data) if confounders are measured, because the target population is represented in the data. Selection bias distorts the study sample itself: specific cells of the exposure-outcome table are under- or over-represented relative to the target population. No amount of covariate adjustment can recover information about people who were never properly included. The remedy must come at the design stage, for example community-based control sampling or intensive follow-up to minimize dropout.
Question 4 True / False
Selection bias requires that the probability of study inclusion differs jointly across both exposure and disease status — differential inclusion based on exposure alone (without involving disease) is not sufficient to produce a biased association estimate.
T. True
F. False
Answer: True
Precisely right. If selection depends only on exposure (say, exposed people are twice as likely to be recruited, regardless of disease status), the exposure-disease odds ratio is not distorted: the estimate remains internally valid, although the sample no longer mirrors the exposure distribution of the target population. Selection bias occurs when the selection probabilities differ across the combined exposure-disease cells: P(selected|E=1,D=1)/P(selected|E=1,D=0) ≠ P(selected|E=0,D=1)/P(selected|E=0,D=0). Equivalently, the cross-product ratio of the four selection probabilities differs from 1, and this differential pattern is what distorts the observed odds ratio.
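The condition above can be checked numerically. The following sketch applies cell-specific selection probabilities to a hypothetical target population and recomputes the odds ratio. The counts, probabilities, and the function name `observed_odds_ratio` are all assumptions made for illustration.

```python
def observed_odds_ratio(true_counts, selection_probs):
    """Odds ratio after each (E, D) cell is sampled with its own probability.

    Both arguments are dicts keyed by (exposure, disease) with values 0 or 1.
    """
    sampled = {cell: true_counts[cell] * selection_probs[cell] for cell in true_counts}
    return (sampled[(1, 1)] * sampled[(0, 0)]) / (sampled[(1, 0)] * sampled[(0, 1)])

# Hypothetical target population, keyed by (E, D).
population = {(1, 1): 200, (1, 0): 800, (0, 1): 100, (0, 0): 900}

# No selection at all: recovers the true odds ratio.
full_sample = {cell: 1.0 for cell in population}

# Selection depends on exposure only (exposed twice as likely to be recruited).
exposure_only = {(1, 1): 0.8, (1, 0): 0.8, (0, 1): 0.4, (0, 0): 0.4}

# Selection depends jointly on exposure and disease (exposed cases favored).
joint = {(1, 1): 0.9, (1, 0): 0.4, (0, 1): 0.4, (0, 0): 0.4}

or_true = observed_odds_ratio(population, full_sample)
or_exposure_only = observed_odds_ratio(population, exposure_only)
or_joint = observed_odds_ratio(population, joint)

print(f"True OR:                       {or_true:.4f}")
print(f"OR, exposure-only selection:   {or_exposure_only:.4f}")
print(f"OR, joint E-and-D selection:   {or_joint:.4f}")
```

In this example exposure-only selection leaves the odds ratio unchanged because the selection probabilities cancel in the cross-product, while over-sampling exposed cases multiplies the odds ratio by the selection cross-product ratio (0.9 × 0.4) / (0.4 × 0.4).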
Question 5 Short Answer
Explain why selection bias cannot be corrected through statistical analysis after data collection, unlike some forms of confounding.
Model answer: Selection bias distorts which people are in the study, so the data collected do not adequately represent all cells of the exposure-outcome table in the target population. Because people with certain exposure-outcome combinations are systematically missing or over-represented, there are no data with which to adjust. Confounding, by contrast, involves a third variable that is present in the data and can be adjusted for statistically if it was measured. With selection bias, the problem lies in who entered the study (and who was never there), not in a measured variable you can control for.
The contrast is instructive: confounding is a problem with the data you have (a covariate you can model); selection bias is a problem with the data you don't have (people who should have been sampled but weren't, or were sampled at the wrong rates). You can't regress your way out of a sample that doesn't represent the population you want to study.