Multiple Imputation for Missing Data

missing-data multiple-imputation MCAR MAR MNAR Rubin

Core Idea

Missing data is ubiquitous in health research: patients miss clinic visits, questionnaire items are left blank, lab values are not drawn. Simple approaches either waste data (complete-case analysis) or underestimate uncertainty (single-value imputation). Multiple imputation (MI) addresses both problems by creating m (typically 20 to 50) complete datasets, each with missing values replaced by plausible values drawn from the predictive distribution of the missing data given the observed data. Each dataset is analyzed separately with standard methods, and the results are combined using Rubin's rules, which account for both within-imputation uncertainty (sampling variability) and between-imputation uncertainty (uncertainty about the missing values). MI is valid under the Missing at Random (MAR) assumption, that missingness depends on observed data but not on the missing values themselves. MAR is weaker than the Missing Completely at Random (MCAR) assumption that complete-case analysis generally requires.

Explainer

Missing data is not just an inconvenience; it is a structural problem that can invalidate study conclusions if handled incorrectly. The three missing-data mechanisms defined by Rubin (1976) determine which analyses remain valid. MCAR (Missing Completely at Random) means missingness is unrelated to any data, observed or unobserved, as when a lab machine fails at random. MAR (Missing at Random) means missingness depends on observed data but not on the missing values: sicker patients (identified by observed severity scores) are more likely to drop out, but among patients with the same severity, missingness is random. MNAR (Missing Not at Random) means missingness depends on the missing values themselves, even after accounting for the observed data: patients with the worst outcomes are the ones who stop coming back.
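The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the variable names, the effect sizes, and the logistic missingness models are assumptions chosen for the demonstration, not values from any study. It generates a severity score and a correlated outcome, deletes outcomes under each mechanism, and compares the mean of the remaining outcomes with the true mean.

```python
import math
import random
import statistics


def logistic(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_mechanisms(n=20000, seed=0):
    """Return the true outcome mean and the observed-case mean per mechanism.

    All coefficients below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    observed = {"MCAR": [], "MAR": [], "MNAR": []}
    outcomes = []
    for _ in range(n):
        severity = rng.gauss(0, 1)                    # always observed
        outcome = -0.8 * severity + rng.gauss(0, 0.6)  # sicker -> worse (lower)
        outcomes.append(outcome)

        # MCAR: a flat 30% chance of being missing, unrelated to any data.
        if rng.random() > 0.3:
            observed["MCAR"].append(outcome)
        # MAR: probability of missingness rises with the OBSERVED severity.
        if rng.random() > logistic(1.5 * severity - 1.0):
            observed["MAR"].append(outcome)
        # MNAR: probability of missingness rises as the outcome ITSELF worsens.
        if rng.random() > logistic(-1.5 * outcome - 1.0):
            observed["MNAR"].append(outcome)

    true_mean = statistics.fmean(outcomes)
    observed_means = {k: statistics.fmean(v) for k, v in observed.items()}
    return true_mean, observed_means


if __name__ == "__main__":
    true_mean, observed_means = simulate_mechanisms()
    print(f"true mean: {true_mean:.3f}")
    for mechanism, mean in observed_means.items():
        print(f"{mechanism}: observed-case mean = {mean:.3f}")
```

In this setup the observed-case mean stays close to the truth under MCAR and is shifted upward under both MAR and MNAR. The difference is that under MAR the shift is explained by severity, which is observed and can therefore be used to correct it, while under MNAR it is driven by the unobserved outcomes themselves.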

Complete-case analysis, which analyzes only subjects with no missing data, is generally unbiased only under MCAR, and even then it discards information. Mean imputation, last-observation-carried-forward, and other single-imputation methods introduce bias, underestimate uncertainty, or both, because they treat imputed values as if they had actually been observed. Multiple imputation was developed to handle MAR data while properly quantifying the additional uncertainty caused by missingness.
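To see why single imputation underestimates uncertainty, consider mean imputation on a single variable. The following is a minimal sketch with made-up data (a normal variable with 40% of values deleted completely at random); the function name and parameters are hypothetical. Filling every gap with the observed mean leaves the point estimate unchanged but shrinks the standard deviation, and the naive standard error shrinks further because the sample size appears larger than it really is.

```python
import random
import statistics


def mean_imputation_demo(n=2000, missing_rate=0.4, seed=1):
    """Compare spread and standard error before and after mean imputation."""
    rng = random.Random(seed)
    full = [rng.gauss(50, 10) for _ in range(n)]

    # Delete values completely at random (MCAR).
    observed = [v for v in full if rng.random() > missing_rate]
    n_missing = n - len(observed)

    # Single imputation: every missing value is replaced by the observed mean.
    mean_obs = statistics.fmean(observed)
    imputed = observed + [mean_obs] * n_missing

    sd_observed = statistics.stdev(observed)
    sd_imputed = statistics.stdev(imputed)
    return {
        "n_missing": n_missing,
        "sd_observed": sd_observed,
        "sd_imputed": sd_imputed,
        # Standard error of the mean using only real observations.
        "se_observed": sd_observed / len(observed) ** 0.5,
        # Naive standard error that treats imputed values as real data.
        "se_imputed": sd_imputed / len(imputed) ** 0.5,
    }


if __name__ == "__main__":
    for name, value in mean_imputation_demo().items():
        print(f"{name}: {value:.3f}")
```

The imputed values add no information about the mean, yet the naive standard error is smaller than the one based on the observed cases alone. That spurious precision is what multiple imputation is designed to avoid.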

The MI procedure has three steps. Imputation: a statistical model predicts missing values based on observed data, drawing from the predictive distribution to create m complete datasets. Each dataset has different imputed values, reflecting uncertainty about the true values. Analysis: each complete dataset is analyzed with the standard method (regression, survival analysis, etc.), producing m sets of estimates and standard errors. Pooling: Rubin's rules combine the m results. The pooled point estimate is the average of the m estimates. The total variance is T = W + (1 + 1/m)B, where W is the within-imputation variance (the average of the m squared standard errors) and B is the between-imputation variance (the sample variance of the m point estimates). The between-imputation component captures the uncertainty due to not knowing the missing values.
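The pooling step is simple enough to write out directly. The sketch below assumes each of the m analyses has already produced a point estimate and its variance (the squared standard error); the function name `pool_rubin` and the returned dictionary keys are hypothetical. It returns the pooled estimate, the within-imputation variance W, the between-imputation variance B, the total variance T = W + (1 + 1/m)B, and the pooled standard error.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances using Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the matching squared standard errors
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two estimates, each with a variance")

    pooled = statistics.fmean(estimates)      # average of the m estimates
    within = statistics.fmean(variances)      # W: average within-imputation variance
    between = statistics.variance(estimates)  # B: sample variance of the estimates
    total = within + (1 + 1 / m) * between    # T = W + (1 + 1/m) * B

    return {
        "estimate": pooled,
        "within": within,
        "between": between,
        "total": total,
        "se": total ** 0.5,
    }


if __name__ == "__main__":
    result = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(result)
```

For the three estimates 1.0, 2.0, and 3.0, each with variance 0.5, the pooled estimate is 2.0, W is 0.5, B is 1.0, and T is 0.5 + (4/3)(1.0), about 1.83. Here most of the total variance comes from disagreement between the imputed datasets rather than from sampling variability.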

The imputation model is critical and must be at least as rich as the analysis model: it should include all variables in the analysis model (including the outcome variable) and auxiliary variables correlated with the missing data or the missingness mechanism. An imputation model that omits important predictors can produce biased imputations; leaving out the outcome, for example, tends to pull estimated associations toward zero. Modern implementations (mice in R, mi impute chained in Stata) use chained equations, also called multivariate imputation by chained equations (MICE) or fully conditional specification (FCS). These iterate through conditionally specified models for each variable with missing data, accommodating mixtures of continuous, binary, and categorical variables. The practical guidance is: include everything plausibly related to the missing data or the missingness mechanism, use at least 20 imputations, and perform sensitivity analyses for MNAR.
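The chained-equations loop can be illustrated with a deliberately small sketch for two continuous variables. This is not how mice or Stata implement it: the helper names are hypothetical, each conditional model is a simple linear regression, and the regression parameters are not themselves redrawn at each step, so the imputations understate uncertainty compared with a proper implementation. The point is only to show the cycling structure: start from a crude fill, then repeatedly re-impute each variable from the current values of the other, adding random noise.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for y regressed on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (resid_ss / (len(xs) - 2)) ** 0.5


def chained_imputation(rows, n_iter=10, rng=None):
    """Return one completed copy of rows ([a, b] pairs, None = missing)."""
    rng = rng or random.Random()
    filled = [list(r) for r in rows]

    # Starting point: fill each gap with the observed mean of its column.
    for j in (0, 1):
        col_mean = statistics.fmean(r[j] for r in rows if r[j] is not None)
        for i, r in enumerate(rows):
            if r[j] is None:
                filled[i][j] = col_mean

    # Cycle: re-impute each variable from the current values of the other.
    for _ in range(n_iter):
        for j in (0, 1):
            k = 1 - j
            obs = [i for i, r in enumerate(rows) if r[j] is not None]
            a, b, sd = fit_line([filled[i][k] for i in obs],
                                [filled[i][j] for i in obs])
            for i, r in enumerate(rows):
                if r[j] is None:
                    # Prediction plus noise, so imputations vary between runs.
                    filled[i][j] = a + b * filled[i][k] + rng.gauss(0, sd)
    return filled


def make_demo_rows(n=200, seed=0):
    """Hypothetical data: b depends on a; about 20% of each column is missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = 2 * a + rng.gauss(0, 1)
        rows.append([None if rng.random() < 0.2 else a,
                     None if rng.random() < 0.2 else b])
    return rows


if __name__ == "__main__":
    rows = make_demo_rows()
    imputations = [chained_imputation(rows, rng=random.Random(k)) for k in range(5)]
    print(len(imputations), "completed datasets of", len(rows), "rows each")
```

Each call with a different random generator yields a different completed dataset, while the observed values are left untouched. Analyzing each dataset and pooling the results with Rubin's rules completes the MI workflow.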
