Missing Data and Imputation Methods


Core Idea

Missing data can introduce bias and reduce precision. Data may be Missing Completely at Random (MCAR), Missing at Random (MAR), or Missing Not at Random (MNAR). Multiple imputation gives valid inference under MAR if the imputation model is correctly specified. Sensitivity analyses assess how robust the conclusions are to MNAR scenarios.

Explainer

From your study of exposure measurement error and outcome misclassification, you know that imperfect measurement introduces bias when the recorded value differs systematically from the true value. Missing data is a related but distinct problem: for some observations, no value is recorded at all. Like measurement error, missing data can bias results, reduce effective sample size, and undermine inference. The appropriate remedy, however, depends on understanding *why* the data are missing, not just that they are.

The three-way taxonomy of missing data mechanisms is the essential conceptual tool.

Missing Completely at Random (MCAR) means missingness is unrelated to any variable, observed or unobserved, as when blood samples are lost to a freezer malfunction in the lab. Under MCAR, simply discarding incomplete observations (complete-case analysis) produces unbiased estimates, though it wastes data and reduces precision. Outside MCAR, complete-case analysis is unbiased only in special cases and should not be assumed safe.

Missing at Random (MAR), despite the confusing name, does not mean missingness is haphazard. It means that, conditional on the *observed* variables, missingness does not depend on the unobserved value itself. For example, older patients may be more likely to have missing biomarker data, but among patients of the same age, the missing values are not systematically different from the observed ones. This is the crucial assumption for most imputation methods.

Missing Not at Random (MNAR) means missingness depends on the missing value itself, even after accounting for the observed variables: patients with the highest blood pressure are the most likely to skip follow-up visits, so the missing blood pressure values are systematically higher than the observed ones. MNAR is the most dangerous scenario and is often plausible in practice, and the observed data alone cannot distinguish it from MAR.
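The three mechanisms can be compared in a small simulation. The sketch below is illustrative only: the variables `age` and `bp`, the effect sizes, and the logistic missingness probabilities are all made-up assumptions, not values from any study. It computes the complete-case mean of blood pressure under each mechanism, plus an age-adjusted mean that fits a regression on the complete cases and evaluates it at the full-sample mean age.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data: age is always observed, blood pressure (bp) may be missing.
age = rng.normal(60, 10, n)
bp = 120 + 0.5 * (age - 60) + rng.normal(0, 10, n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Probability that bp is missing under each (assumed) mechanism.
p_missing = {
    "MCAR": np.full(n, 0.3),          # unrelated to any variable
    "MAR": sigmoid((age - 60) / 5),   # depends only on observed age
    "MNAR": sigmoid((bp - 120) / 5),  # depends on the unobserved bp itself
}

# True where bp is observed.
observed = {name: rng.random(n) >= p for name, p in p_missing.items()}

true_mean = bp.mean()

# Complete-case analysis: average only the observed bp values.
cc_mean = {name: bp[mask].mean() for name, mask in observed.items()}

def age_adjusted_mean(mask):
    """Fit bp ~ age on complete cases, then predict at the full-sample mean age."""
    slope, intercept = np.polyfit(age[mask], bp[mask], 1)
    return intercept + slope * age.mean()

adjusted_mean = {name: age_adjusted_mean(mask) for name, mask in observed.items()}

for name in observed:
    print(
        f"{name}: complete-case {cc_mean[name]:.1f}, "
        f"age-adjusted {adjusted_mean[name]:.1f}, truth {true_mean:.1f}"
    )
```

Under these assumptions, the complete-case mean is close to the truth only for MCAR. Conditioning on age recovers the truth under MAR, while under MNAR both estimates remain too low, because no observed variable explains the missingness.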

Multiple imputation is the standard solution under MAR. Rather than filling in a single "best guess" for each missing value (single imputation, which understates uncertainty), multiple imputation creates M completed datasets by drawing M plausible values for each missing observation from a model for the missing values given the observed data. Each dataset is analyzed separately using standard methods, and the results are combined using Rubin's rules. The key insight is that uncertainty about the imputed values is propagated through the analysis: the variance across imputations adds to, rather than hides, the uncertainty due to missingness. The imputation model should include all variables that will appear in the analysis model, plus any auxiliary variables that predict missingness or the missing values themselves, so that the MAR assumption is as plausible as possible.
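The workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production implementation: the data are simulated, the imputation model is a simple linear regression of `bp` on `age`, and parameter uncertainty is approximated by bootstrapping the complete cases (one of several possible approaches). The helper `pool_rubin` is a hypothetical name. It applies Rubin's rules, taking the pooled estimate as the average of the M estimates and the total variance as the average within-imputation variance plus (1 + 1/M) times the between-imputation variance.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine M estimates and their squared standard errors with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()         # pooled point estimate
    within = variances.mean()        # average within-imputation variance
    between = estimates.var(ddof=1)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total

rng = np.random.default_rng(1)
n, m = 2_000, 20

# Hypothetical MAR data: older patients are more likely to have missing bp.
age = rng.normal(60, 10, n)
bp = 120 + 0.5 * (age - 60) + rng.normal(0, 10, n)
observed = rng.random(n) >= 1 / (1 + np.exp(-(age - 60) / 5))
bp_obs = np.where(observed, bp, np.nan)  # what the analyst actually sees

estimates, variances = [], []
n_missing = (~observed).sum()
for _ in range(m):
    # Bootstrap the complete cases so each imputation reflects parameter uncertainty.
    idx = rng.choice(np.flatnonzero(observed), size=observed.sum(), replace=True)
    slope, intercept = np.polyfit(age[idx], bp_obs[idx], 1)
    resid_sd = np.std(bp_obs[idx] - (intercept + slope * age[idx]), ddof=2)

    # Draw (rather than just predict) the missing values: prediction plus noise.
    completed = bp_obs.copy()
    completed[~observed] = (
        intercept + slope * age[~observed] + rng.normal(0, resid_sd, n_missing)
    )

    # Analysis model: here simply the mean of bp and its squared standard error.
    estimates.append(completed.mean())
    variances.append(completed.var(ddof=1) / n)

pooled_mean, pooled_var = pool_rubin(estimates, variances)
print(f"pooled mean {pooled_mean:.2f}, pooled SE {np.sqrt(pooled_var):.2f}")
```

Because the between-imputation term is added to the within-imputation variance, the pooled standard error is larger than the one any single completed dataset would report.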

When MNAR is plausible, the honest response is sensitivity analysis rather than a single fixed answer. You posit different assumptions about how missing values differ from observed values (for example, "suppose missing cholesterol values are 10 mg/dL higher on average than the observed distribution") and re-run the analysis under each scenario. If the conclusions hold across a range of plausible MNAR assumptions, confidence in the finding grows. If they flip under plausible MNAR scenarios, the study must acknowledge that the finding is not robust to the missing data mechanism.

The connection to your multivariable regression prerequisite is direct: the imputation model is itself a regression model, and deciding which variables to include and how to specify it requires the same thinking as building any regression model correctly.
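One common way to run such a sensitivity analysis is a delta adjustment: impute as if the data were MAR, then shift the imputed values by an assumed offset and repeat over a range of offsets. The sketch below uses simulated cholesterol values and hypothetical names (`chol`, `mean_under_delta`, `deltas`). For brevity it imputes by resampling observed values and reports only the point estimate, so it is an illustration of the idea rather than a full multiple-imputation analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical cholesterol data (mg/dL) with roughly 30% of values missing.
chol = rng.normal(200, 30, n)
observed = rng.random(n) >= 0.3
chol_obs = np.where(observed, chol, np.nan)

def mean_under_delta(delta, n_imputations=20):
    """Impute from the observed values, shift imputed values by delta, average."""
    obs_values = chol_obs[observed]
    n_missing = (~observed).sum()
    means = []
    for _ in range(n_imputations):
        completed = chol_obs.copy()
        # Draw from the observed distribution, then apply the assumed MNAR shift.
        completed[~observed] = rng.choice(obs_values, size=n_missing) + delta
        means.append(completed.mean())
    return float(np.mean(means))

# Assumed offsets (mg/dL) between missing and observed values; 0 is the MAR-like case.
deltas = [0, 5, 10, 20]
results = {d: mean_under_delta(d) for d in deltas}

for d, est in results.items():
    print(f"delta = {d:>2} mg/dL -> estimated mean {est:.1f}")
```

The estimated mean moves by roughly delta times the fraction missing, which makes it easy to see how large the departure from MAR would have to be before a conclusion changes.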
