Questions: Missing Data Mechanisms, Patterns, and Handling Methods
5 questions to test your understanding
Question 1 Multiple Choice
A longitudinal clinical trial finds that participants with the worst symptoms are most likely to drop out before the final assessment. What is the missingness mechanism, and what is the consequence for a listwise-deletion analysis?
A. MCAR: dropout is random and listwise deletion yields unbiased estimates
B. MAR: missingness is related to observed variables and can be corrected by controlling for baseline severity
C. MNAR: missingness is related to the unobserved missing values themselves, and listwise deletion will make outcomes look better than the true population
D. MNAR, but because completers are a large enough sample, estimates remain valid
Answer: C
When dropout depends on the variable being measured (current symptom severity, which is unobserved at the missing time points), the mechanism is MNAR. Listwise deletion retains only completers, who tend to have less severe symptoms, so the analyzed sample is systematically healthier than the intended population and outcomes look better than they truly are. Controlling for baseline severity (option B) would help only if baseline severity fully explained dropout, which the scenario does not establish. No statistical technique fully corrects MNAR without additional data or assumptions. Option D is the dangerous 'large sample = no bias' misconception; sample size affects precision, not the direction of systematic bias.
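To see the effect concretely, here is a minimal simulation sketch in Python. The score distribution and the dropout rule are assumptions chosen for illustration, not values from any real trial; the only point is that dropout probability rises with the final score that would have been recorded.

```python
import random
import statistics

def simulate_mnar_dropout(n=20000, seed=0):
    """Simulate final severity scores where sicker participants drop out more often."""
    rng = random.Random(seed)
    # Hypothetical final severity scores: higher means worse symptoms.
    severity = [rng.gauss(50, 10) for _ in range(n)]
    completers = []
    for s in severity:
        # Assumed MNAR rule: dropout probability grows with the final score
        # itself, which is exactly the value that goes unrecorded.
        p_dropout = min(max((s - 30) / 50, 0.0), 0.9)
        if rng.random() >= p_dropout:
            completers.append(s)
    return statistics.mean(severity), statistics.mean(completers), len(completers)

true_mean, completer_mean, n_completers = simulate_mnar_dropout()
print(f"true mean severity (everyone):   {true_mean:.2f}")
print(f"listwise-deletion mean:          {completer_mean:.2f}")
print(f"completers retained:             {n_completers}")
```

Under these assumed settings the completers-only mean comes out roughly three points lower than the full-population mean, so the treatment group looks healthier than it really is.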
Question 2 Multiple Choice
A researcher uses listwise deletion in a study where income data are missing more often for lower-education participants, but education is fully observed for all participants. Under what condition would listwise deletion still produce unbiased estimates?
A. If the proportion of missing data is below 10%
B. If the sample size is large enough to preserve statistical power
C. Only if the data are MCAR, that is, missingness is unrelated to any observed or unobserved variable
D. If the researcher controls for education in the regression model
Answer: C
In general, listwise deletion is unbiased only under MCAR. In this scenario, missingness depends on education (an observed variable), making it MAR, not MCAR. Listwise deletion will bias income estimates because the complete cases are not a random subset; they skew toward higher education. Options A and B address power, not bias: even a large study with 5% missing data can be severely biased if the mechanism is MNAR or MAR. Option D points at the right idea under MAR, namely using education to account for the missingness, but doing so properly calls for a method such as multiple imputation (MI) or full information maximum likelihood (FIML) rather than plain listwise deletion.
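The sketch below simulates this scenario under assumed numbers (group incomes, missingness rates, and sample size are all hypothetical). It compares the complete-case mean income with an education-adjusted estimate that reweights each education group back to its share of the full sample. That reweighting is a deliberately simple stand-in for what MI or FIML do with the observed education variable, not an implementation of either method.

```python
import random
import statistics

rng = random.Random(1)
rows = []
for _ in range(40000):
    low_edu = rng.random() < 0.5                     # education is fully observed
    income = rng.gauss(30 if low_edu else 60, 8)     # hypothetical income, in thousands
    # Assumed MAR rule: missingness of income depends only on observed education.
    p_missing = 0.6 if low_edu else 0.1
    observed = rng.random() >= p_missing
    rows.append((low_edu, income, observed))

true_mean = statistics.mean(inc for _, inc, _ in rows)
complete_case_mean = statistics.mean(inc for _, inc, obs in rows if obs)

# Education-adjusted estimate: complete-case mean within each education group,
# weighted by that group's share of the full sample (known, since education
# is observed for everyone).
adjusted_mean = 0.0
for group in (True, False):
    share = sum(1 for edu, _, _ in rows if edu == group) / len(rows)
    group_mean = statistics.mean(
        inc for edu, inc, obs in rows if edu == group and obs
    )
    adjusted_mean += share * group_mean

print(f"true mean income:        {true_mean:.2f}")
print(f"complete-case mean:      {complete_case_mean:.2f}")
print(f"education-adjusted mean: {adjusted_mean:.2f}")
```

With these assumed parameters the complete-case mean overshoots the true mean by several thousand, while the education-adjusted estimate lands close to it. The adjustment works only because missingness depends on a variable that is observed for everyone.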
Question 3 True / False
Having a large sample size is sufficient to protect against bias caused by missing data, as long as the proportion of missingness is small.
True
False
Answer: False
Whether missing data cause bias depends on the missingness mechanism, not on the sample size. A small proportion missing limits how large the bias can be but does not rule it out: under MNAR, even 5% missingness can distort estimates systematically regardless of how large the study is. Large sample size reduces random error (increases precision) but does not correct systematic bias. A biased large study just gives you a precise estimate of the wrong answer.
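A small simulation sketch makes the precision-versus-bias distinction visible. The setup is an assumption for illustration: values are standard normal with a true mean of 0, and roughly the top 5% of values are never recorded (an MNAR rule). The function name and cutoff are hypothetical.

```python
import random
import statistics

def observed_mean_and_se(n, seed):
    """Mean and standard error of the recorded values under an assumed MNAR rule."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true mean is 0
    # Assumed MNAR rule: values above 1.645 (about the top 5%) go missing.
    observed = [v for v in values if v <= 1.645]
    mean = statistics.mean(observed)
    se = statistics.stdev(observed) / len(observed) ** 0.5
    return mean, se

small_mean, small_se = observed_mean_and_se(1_000, seed=2)
large_mean, large_se = observed_mean_and_se(100_000, seed=2)

print(f"n=1,000:   mean={small_mean:.3f}, SE={small_se:.4f}")
print(f"n=100,000: mean={large_mean:.3f}, SE={large_se:.4f}")
```

Going from 1,000 to 100,000 participants shrinks the standard error about tenfold, yet the large-sample mean still settles near -0.11 rather than 0. The larger study is more confident about the same wrong value.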
Question 4 True / False
Multiple imputation produces valid inferences under MAR because it replaces missing values with estimates drawn from a distribution based on observed data, and combines results across multiple completed datasets to propagate imputation uncertainty.
True
False
Answer: True
This accurately describes MI. Under MAR, the observed variables contain sufficient information to model the distribution of the missing values, provided the imputation model includes the variables that drive missingness. By creating multiple completed datasets rather than a single imputed one, MI carries the uncertainty introduced by imputation into the final standard errors and confidence intervals when results are combined with Rubin's rules. A single-imputation approach would underestimate uncertainty by treating imputed values as known.
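As a sketch of the pooling step, Rubin's rules combine the m per-dataset estimates by averaging them, and build the total variance from the average within-imputation variance W plus the between-imputation variance B inflated by (1 + 1/m). The function name `pool_rubin` and the input numbers below are hypothetical; the imputation step itself is assumed to have happened already.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    pooled = statistics.mean(estimates)          # pooled point estimate
    within = statistics.mean(variances)          # W: average within-imputation variance
    between = statistics.variance(estimates)     # B: sample variance across imputations
    total = within + (1 + 1 / m) * between       # T = W + (1 + 1/m) * B
    return pooled, within, between, total

# Hypothetical results from m = 5 completed datasets.
estimates = [2.0, 2.4, 1.9, 2.3, 2.1]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]

pooled, within, between, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate: {pooled:.3f}")
print(f"within={within:.4f}, between={between:.4f}, total={total_var:.4f}")
print(f"pooled SE: {total_var ** 0.5:.3f}")
```

The total variance exceeds the within-imputation variance whenever the imputed datasets disagree with each other. That extra term is what a single imputation leaves out.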
Question 5 Short Answer
Why is it impossible for any purely statistical method to fully correct for MNAR missingness without additional data or external assumptions?
Model answer: Under MNAR, the probability of a value being missing depends on the unobserved value itself. Since we never see the missing values, we cannot model or verify this relationship from the observed data alone. Any statistical correction would require assumptions about the unobserved distribution of missing values, and the data can neither confirm nor refute those assumptions. Without additional information (e.g., external validation data), they remain untestable; sensitivity analyses with explicit MNAR models can only show how the conclusions change as the assumptions vary.
For contrast: under MAR, we can use observed variables to model who is missing and why, allowing MI or FIML to recover unbiased estimates. But under MNAR, the very data we need to model the missingness process are the data that are missing. This is why MNAR is the most serious validity threat: it requires investigators to rely on substantive knowledge and sensitivity analyses rather than statistical techniques alone.
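A sensitivity analysis can be sketched with a simple delta adjustment: assume the missing values average some amount `delta` above the observed mean, and see how the overall estimate moves as `delta` changes. The scores, the number of dropouts, and the function name `mean_under_delta` are all hypothetical, and this reduces the idea to a single mean rather than a full MNAR model.

```python
import statistics

observed = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]   # hypothetical recorded scores
n_missing = 4                                      # hypothetical number of dropouts

def mean_under_delta(observed, n_missing, delta):
    """Overall mean if the missing values average `delta` above the observed mean."""
    obs_mean = statistics.mean(observed)
    # delta = 0 treats dropouts like completers; delta > 0 assumes they
    # would have scored higher. The data cannot tell us which is right.
    total = sum(observed) + n_missing * (obs_mean + delta)
    return total / (len(observed) + n_missing)

results = {delta: mean_under_delta(observed, n_missing, delta)
           for delta in (0.0, 2.0, 5.0)}
for delta, est in results.items():
    print(f"delta={delta:.1f} -> estimated overall mean {est:.2f}")
```

Every value of `delta` is equally compatible with the observed scores, yet each gives a different answer. Choosing a plausible range for `delta` is a substantive judgment, not something the data can settle.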