In a cohort study, patients with higher blood pressure values are more likely to miss their follow-up visit. Which missing data mechanism applies, and what is the appropriate analytic response?
A. MCAR: complete-case analysis is valid because missingness is unrelated to outcomes
B. MAR: multiple imputation using observed covariates will produce unbiased estimates
C. MNAR: standard imputation methods cannot correct for this bias; sensitivity analysis is needed
D. MAR: simply using the last observed value for missing visits is sufficient
Missingness here depends on the missing value itself (patients with the highest blood pressure are most likely to have missing blood pressure data) — this is MNAR by definition. Under MNAR, neither complete-case analysis nor standard multiple imputation (which assumes MAR) can be trusted to produce unbiased estimates, because the missing values are systematically different from observed ones in a way that cannot be modeled from observed data alone. Sensitivity analysis under different MNAR assumptions is the appropriate response.
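The bias described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of any real cohort: the blood pressure distribution (mean 130, SD 15) and the logistic missingness curve centred at 140 are hypothetical values chosen only to make missingness depend on the unobserved value itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical follow-up systolic blood pressure for every patient (assumed values).
bp = rng.normal(130.0, 15.0, size=n)

# MNAR: the probability of a missed visit rises with the BP value itself.
p_missing = 1.0 / (1.0 + np.exp(-(bp - 140.0) / 10.0))
missing = rng.random(n) < p_missing

true_mean = bp.mean()                # known only because this is a simulation
observed_mean = bp[~missing].mean()  # what a complete-case analysis would see

print(f"true mean:     {true_mean:.1f}")
print(f"observed mean: {observed_mean:.1f}")
print(f"share missing: {missing.mean():.2f}")
```

Because the patients with the highest values drop out most often, the observed mean falls below the true mean, and nothing in the observed values alone reveals by how much.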
Question 2 Multiple Choice
A study finds that older patients are significantly more likely to have missing biomarker data. After controlling for age, the probability of missingness is the same regardless of the biomarker value. Which mechanism applies?
A. MCAR: because the missing values themselves are unrelated to the biomarker
B. MNAR: because age predicts missingness and age is related to the outcome
C. MAR: missingness depends on observed age but not on the unobserved biomarker value itself
D. MCAR: any relationship between missingness and an observed variable disqualifies MNAR
This is the MAR scenario. MAR means: conditional on observed variables (here, age), the probability of being missing does not depend on the missing value itself. Older patients have higher missingness, but among patients of the same age, the missing biomarker values are not systematically different from observed ones. This satisfies MAR, making multiple imputation including age in the imputation model a valid approach. MCAR would require that even age had no relationship to missingness — a stronger assumption.
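A short simulation may make the MAR case concrete. Everything here is an assumption for illustration: the age range, the linear age-biomarker relationship, and the missingness curve are hypothetical. The imputation step is a deliberately simplified stand-in (single deterministic regression imputation on age) rather than full multiple imputation. It recovers the mean, but unlike multiple imputation it would understate the uncertainty.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data: biomarker rises with age (assumed relationship).
age = rng.uniform(40.0, 80.0, size=n)
biomarker = 2.0 + 0.05 * age + rng.normal(0.0, 0.5, size=n)

# MAR: missingness depends on observed age only, not on the biomarker value.
p_missing = 1.0 / (1.0 + np.exp(-(age - 65.0) / 5.0))
missing = rng.random(n) < p_missing

true_mean = biomarker.mean()
complete_case_mean = biomarker[~missing].mean()

# Fit biomarker ~ age on complete cases, then predict the missing values.
slope, intercept = np.polyfit(age[~missing], biomarker[~missing], 1)
imputed = biomarker.copy()
imputed[missing] = intercept + slope * age[missing]
imputed_mean = imputed.mean()

print(f"true mean:          {true_mean:.2f}")
print(f"complete-case mean: {complete_case_mean:.2f}")
print(f"age-imputed mean:   {imputed_mean:.2f}")
```

The complete-case mean is pulled toward younger patients, while conditioning on age brings the estimate back close to the true mean, which is the sense in which MAR missingness is correctable from observed data.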
Question 3 True / False
'Missing at Random' (MAR) means that the missing observations are a random subset of all observations, similar to randomly discarding data.
T. True
F. False
Answer: False
MAR is one of the most poorly named concepts in statistics. It does NOT mean data are missing by random chance — that is MCAR. MAR means missingness depends only on *observed* variables, not on the value of the missing variable itself. A study where younger patients are systematically less likely to have lab results recorded is NOT MCAR (missingness is related to age), but it may be MAR if the missing lab values, conditional on age, are not systematically different from observed lab values. MCAR is a special case of MAR where no observed variable predicts missingness either.
Question 4 True / False
Under MCAR, discarding all observations with missing data (complete-case analysis) produces unbiased effect estimates, though it reduces sample size and statistical power.
T. True
F. False
Answer: True
This is the scenario in which complete-case analysis is guaranteed to be unbiased. If data are MCAR, the complete cases are a random sample of the full dataset, so there is no systematic selection that would distort effect estimates. The cost is efficiency: you lose observations and thus precision, widening confidence intervals. Under MAR or MNAR, complete-case analysis is generally biased because the complete cases are no longer a representative random sample of all observations.
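The trade-off under MCAR can be checked with a quick simulation. This is a sketch with assumed values: the true slope of 0.5, the noise level, and the 40% missingness rate are all hypothetical, chosen only so that deletion is independent of both variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical exposure x and outcome y with a true slope of 0.5 (assumed).
x = rng.normal(0.0, 1.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

# MCAR: each record is dropped with probability 0.4, independent of x and y.
missing = rng.random(n) < 0.4

slope_full = np.polyfit(x, y, 1)[0]
slope_complete_case = np.polyfit(x[~missing], y[~missing], 1)[0]
n_complete = int((~missing).sum())

print(f"slope, full data:      {slope_full:.3f}")
print(f"slope, complete cases: {slope_complete_case:.3f}")
print(f"records kept:          {n_complete} of {n}")
```

Both slopes land near the true value. The price is the smaller sample: with roughly 60% of records kept, standard errors grow by a factor of about 1/sqrt(0.6), or roughly 1.3.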
Question 5 Short Answer
Why is MNAR impossible to verify from the observed data alone, and what is the appropriate analytic strategy when MNAR is plausible?
Model answer: MNAR means missingness depends on the missing value itself — but since those values are unobserved, you cannot use the data to test whether this relationship exists. Any model of missingness can only be fitted to observed data, leaving the MNAR mechanism unidentifiable. The appropriate strategy is sensitivity analysis: posit a range of plausible MNAR assumptions (e.g., 'missing values are X units higher than observed'), re-run the analysis under each, and assess whether conclusions change. If results are robust across plausible scenarios, confidence grows; if conclusions flip, the uncertainty must be reported.
This is a fundamental epistemological limit: the very data needed to test the MNAR assumption is the data that is missing. Unlike measurement error (where the magnitude of error can sometimes be estimated from validation substudies), MNAR mechanisms are inherently unverifiable without additional external information. Sensitivity analysis makes the uncertainty explicit rather than hiding it under an untestable assumption.
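The strategy in the model answer ("missing values are X units higher than observed") can be sketched as a simple delta adjustment. The function name `delta_adjusted_mean`, the blood pressure readings, the count of missing patients, and the grid of deltas are all hypothetical, introduced here only for illustration. A real analysis would typically embed the shift inside a multiple imputation procedure rather than shifting a single mean.

```python
import numpy as np

def delta_adjusted_mean(observed, n_missing, delta):
    """Overall mean if every missing value equals the observed mean plus delta.

    Hypothetical helper: delta encodes an assumed MNAR shift and cannot be
    estimated from the observed data.
    """
    observed = np.asarray(observed, dtype=float)
    n_obs = observed.size
    obs_mean = observed.mean()
    total = n_obs * obs_mean + n_missing * (obs_mean + delta)
    return total / (n_obs + n_missing)

# Hypothetical observed readings and number of patients with missing values.
observed_bp = np.array([118.0, 125.0, 131.0, 122.0, 136.0, 128.0])
n_missing = 4

# Re-run the estimate under a range of assumed shifts (delta = 0 is the MAR-like case).
deltas = [0.0, 5.0, 10.0, 20.0]
results = {d: delta_adjusted_mean(observed_bp, n_missing, d) for d in deltas}

for d, est in results.items():
    print(f"delta = {d:5.1f}  ->  estimated mean = {est:.1f}")
```

If the substantive conclusion holds across the whole grid of plausible deltas, it is robust to the MNAR concern. If it flips at some delta, that threshold is what should be reported and debated.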