A clinical trial has 30% missing outcome data. The analyst performs complete-case analysis, using only the 70% of patients with complete data. Under what condition is this approach unbiased?
A. When the data are Missing at Random (MAR)
B. When the data are Missing Completely at Random (MCAR): missingness is unrelated to any variable, observed or unobserved
C. Complete-case analysis is always unbiased because it uses real observed data
D. When more than 50% of data are observed
Complete-case analysis is guaranteed to be unbiased only under MCAR, that is, when the probability of being a complete case is the same for everyone, regardless of covariate values or outcomes. If sicker patients are more likely to drop out (MAR or MNAR), a complete-case estimate such as the mean outcome is biased because the remaining patients are healthier than the original sample. MCAR is a strong assumption that is rarely plausible in clinical research. Even when unbiased, complete-case analysis is inefficient because it discards all partial information from incomplete cases.
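The contrast between MCAR and outcome-dependent dropout can be illustrated with a small simulation. This is only a sketch under assumed numbers: the severity scores, the retention probabilities, and the helper `complete_case_mean` are all hypothetical and chosen so that about 70% of patients remain in both scenarios.

```python
import random
import statistics


def complete_case_mean(outcomes, keep_prob, rng):
    """Mean outcome among patients who remain observed.

    keep_prob(y) is the probability that a patient with outcome y
    is a complete case.
    """
    observed = [y for y in outcomes if rng.random() < keep_prob(y)]
    return statistics.fmean(observed)


rng = random.Random(0)

# Hypothetical severity scores (higher = sicker): mean 50, SD 10.
outcomes = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(outcomes)

# MCAR: every patient has the same 70% chance of being a complete case.
mcar_mean = complete_case_mean(outcomes, lambda y: 0.7, rng)

# Outcome-dependent dropout: patients scoring above 50 are retained with
# probability 0.5, the rest with probability 0.9 (about 70% retained overall).
dropout_mean = complete_case_mean(
    outcomes, lambda y: 0.5 if y > 50 else 0.9, rng
)

print(f"full-sample mean:               {true_mean:.2f}")
print(f"complete cases, MCAR:           {mcar_mean:.2f}")
print(f"complete cases, sicker drop out: {dropout_mean:.2f}")
```

Under these assumptions the MCAR complete-case mean stays close to the full-sample mean, while the mean under outcome-dependent dropout falls noticeably below it, because the retained patients are healthier on average.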
Question 2 Short Answer
Single imputation (replacing each missing value with one predicted value, like a conditional mean from a regression model) can produce unbiased point estimates under MAR when the imputation model is correctly specified. However, it still underestimates uncertainty. Why?
Think about your answer, then reveal below.
Model answer: Single imputation treats the imputed values as if they were observed, ignoring the uncertainty about what the true values were. The resulting dataset has the full sample size but artificially low variability, because each missing value is filled with a single prediction rather than a range of plausible values. Standard errors computed from the singly imputed dataset are too small, confidence intervals are too narrow, and p-values are too small. Multiple imputation corrects this by creating several plausible versions of the data, so that the variability across imputations quantifies the uncertainty introduced by the missing data.
Rubin's rules formalize this: the total variance of an estimate after multiple imputation (MI) with m imputed datasets equals the within-imputation variance (the average of the m variances) plus the between-imputation variance (the variance of the m point estimates) scaled by (1 + 1/m). With single imputation (m = 1) the between-imputation component cannot be estimated and is implicitly treated as zero, which is why single imputation underestimates uncertainty.
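The pooling step described above can be sketched in a few lines. The function name `pool_rubin` and the five estimates and variances are made up for illustration; real analyses would normally use the pooling routine of an established MI package.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules.

    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    if m == 0 or m != len(variances):
        raise ValueError("need at least one estimate and one variance per estimate")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    # The between-imputation variance cannot be estimated from a single
    # dataset, so it is set to zero here when m == 1.
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return pooled, within, between, total


# Made-up results from m = 5 imputed datasets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]

pooled, within, between, total = pool_rubin(estimates, variances)
print(f"pooled={pooled:.3f} within={within:.3f} "
      f"between={between:.3f} total={total:.3f}")

# A single imputation reports only the within-imputation variance.
_, _, _, total_single = pool_rubin(estimates[:1], variances[:1])
print(f"single-imputation total variance={total_single:.3f}")
```

With these illustrative numbers the pooled total variance is about 0.07, while a single imputed dataset would report only the within-imputation variance of 0.04.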
Question 3 True / False
Data are Missing Not at Random (MNAR) when the probability of missingness depends on the unobserved value itself. For example, patients with severe depression are less likely to return for follow-up questionnaires. Multiple imputation under MAR assumptions will produce biased results in this scenario.
T. True
F. False
Answer: True
MI under MAR assumes that after conditioning on observed data, the missing values have the same distribution as the observed values. Under MNAR, the missing values are systematically different from what any model based on observed data would predict — depressed patients who drop out have worse scores than depressed patients who remain, even after adjusting for all observed variables. MI would impute scores that are too optimistic. MNAR requires sensitivity analyses (pattern-mixture models, selection models) that explicitly model the missingness mechanism, and these require untestable assumptions about the relationship between missingness and the unobserved data.
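A rough simulation can show why imputing under MAR is too optimistic in this scenario. Everything here is an assumption made for illustration: the score distributions, the response probabilities, and the shift parameter `delta`. Regression predictions stand in for a full multiple-imputation procedure, since only the point estimate is at issue.

```python
import random
import statistics

rng = random.Random(2)
n = 50_000

# Hypothetical depression scores (higher = worse). Baseline is always observed.
baseline = [rng.gauss(20, 5) for _ in range(n)]
followup = [0.8 * b + rng.gauss(0, 4) for b in baseline]
true_mean = statistics.fmean(followup)

# MNAR: the chance of returning the questionnaire depends on the follow-up
# score itself (0.5 if the score is 16 or higher, otherwise 0.9).
returned = [rng.random() < (0.5 if y >= 16 else 0.9) for y in followup]

# Fit follow-up ~ baseline by least squares on the complete cases only.
xs = [b for b, r in zip(baseline, returned) if r]
ys = [y for y, r in zip(followup, returned) if r]
x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar


def completed_mean(delta=0.0):
    """Mean follow-up score after filling in the dropouts.

    delta = 0 imputes under MAR; delta > 0 shifts the imputed scores upward,
    a simple pattern-mixture style sensitivity adjustment.
    """
    filled = [
        y if r else intercept + slope * b + delta
        for b, y, r in zip(baseline, followup, returned)
    ]
    return statistics.fmean(filled)


print(f"true mean:              {true_mean:.2f}")
print(f"imputed under MAR:      {completed_mean():.2f}")
print(f"imputed with delta = 2: {completed_mean(2.0):.2f}")
```

In this sketch the MAR-based mean falls below the true mean, matching the "too optimistic" direction described above. Varying `delta` over a plausible range is one simple form of sensitivity analysis, but the right value of `delta` cannot be learned from the observed data.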
Question 4 Multiple Choice
A colleague uses 5 imputations for a multiple imputation analysis, arguing this is sufficient based on Rubin's original recommendation. Is this still considered adequate?
A. Yes: 5 imputations is always sufficient
B. No: current guidance recommends 20-50 or more imputations, especially when the fraction of missing information is high, to stabilize standard error estimates and p-values
C. The number of imputations does not affect the results
D. Only 1 imputation is needed if the imputation model is correct
Rubin's original recommendation of 3-5 imputations was based on efficiency of the point estimate, which stabilizes quickly. However, standard errors, p-values, and particularly confidence interval coverage require many more imputations to stabilize. With 5 imputations, the variability of the variance estimate across repeated analyses is substantial. Current best practice recommends at least 20 imputations as a baseline, with more (50+) when the fraction of missing information is high or when precise p-values are needed.
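The difference between point-estimate efficiency and standard-error stability can be sketched numerically. The efficiency function uses Rubin's formula 1 / (1 + FMI / m), where FMI is the fraction of missing information. The Monte Carlo part is a stylized assumption: it holds the within-imputation variance fixed and draws the m point estimates from a normal distribution with hypothetical variances.

```python
import math
import random
import statistics


def relative_efficiency(m, fmi):
    """Efficiency of the pooled point estimate with m imputations,
    relative to infinitely many, using Rubin's formula 1 / (1 + fmi / m)."""
    return 1 / (1 + fmi / m)


def pooled_se_spread(m, within=1.0, between=0.5, reps=2000, seed=0):
    """Standard deviation of the pooled standard error across repeated analyses.

    Each analysis draws m point estimates around a common value with
    between-imputation variance `between`; `within` is held fixed.
    Both variance values are hypothetical.
    """
    rng = random.Random(seed)
    pooled_ses = []
    for _ in range(reps):
        estimates = [rng.gauss(0.0, math.sqrt(between)) for _ in range(m)]
        b_hat = statistics.variance(estimates)
        pooled_ses.append(math.sqrt(within + (1 + 1 / m) * b_hat))
    return statistics.stdev(pooled_ses)


for m in (5, 20, 50):
    eff = relative_efficiency(m, fmi=0.3)
    spread = pooled_se_spread(m)
    print(f"m={m:2d}  efficiency={eff:.3f}  spread of pooled SE={spread:.3f}")
```

Under these assumptions the point-estimate efficiency is already about 94% at m = 5, yet the pooled standard error still varies considerably from one analysis to the next, and that spread shrinks as m grows.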