Missing Data: Mechanisms and Analytical Solutions

College · Depth 75 in the knowledge graph
missing-data attrition imputation

Core Idea

Missing data can be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). The missingness mechanism determines whether listwise deletion is valid or whether imputation, inverse-probability weighting, or selection models are needed.
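One common way to state the three mechanisms formally is sketched below. The notation is an assumption introduced here, not part of the original text: R is the indicator of which values are missing, and Y_obs and Y_mis are the observed and missing parts of the data.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ still depends on } Y_{\text{mis}} \text{ after conditioning on } Y_{\text{obs}}
\end{align*}
```

Read top to bottom, each line relaxes the one before it: MCAR is a special case of MAR, and MNAR is whatever remains when MAR fails.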

How It's Best Learned

Examine patterns of missing data. Use listwise deletion as a baseline, then try multiple imputation or IPW to see if conclusions change.
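A minimal sketch of what "examine patterns of missing data" can look like in practice, using only the Python standard library. The records, the field names (`age_group`, `income`), and the helper `missing_rate_by` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy survey records; None marks a missing value.
records = [
    {"age_group": "under_50", "income": 42.0},
    {"age_group": "under_50", "income": 38.5},
    {"age_group": "under_50", "income": None},
    {"age_group": "under_50", "income": 45.1},
    {"age_group": "50_plus", "income": None},
    {"age_group": "50_plus", "income": 61.0},
    {"age_group": "50_plus", "income": None},
    {"age_group": "50_plus", "income": None},
]


def missing_rate_by(records, group_key, target):
    """Share of records with a missing `target`, split by `group_key`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_missing, n_total]
    for rec in records:
        counts[rec[group_key]][0] += rec[target] is None
        counts[rec[group_key]][1] += 1
    return {group: miss / total for group, (miss, total) in counts.items()}


rates = missing_rate_by(records, "age_group", "income")
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of income values missing")
```

If the missing rate differs sharply across groups of an observed variable, as it does in this toy data, MCAR is implausible and the complete-case baseline deserves a sensitivity check.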

Explainer

Missing data is not just an inconvenience — it is a selection problem. When observations drop out of your dataset, the remaining sample may no longer be representative of the population you care about. Whether this matters depends entirely on *why* the data are missing, which is what the three standard mechanisms capture. Think of the missingness mechanism as a treatment assignment rule: what determined whether each observation's data was observed or not?

MCAR (Missing Completely at Random) means the probability of being missing is unrelated to both observed and unobserved variables. Imagine a lab assistant randomly drops 5% of blood sample vials: there is no systematic pattern to which samples are lost. Under MCAR, the complete cases are a random subsample of the full sample, so listwise deletion (dropping incomplete cases) gives unbiased estimates; you lose efficiency but not validity.

MAR (Missing at Random) is more common and more nuanced: missingness depends on observed variables but, conditional on those variables, is unrelated to the unobserved values. For example, older survey respondents are less likely to report income, but conditional on age, the missing income values are not systematically different from the reported ones. Under MAR, listwise deletion is generally biased because the complete cases over-represent the groups that respond more often (here, younger respondents), so unconditional summaries such as mean income are distorted. Methods that use the observed covariates can recover valid estimates: multiple imputation models the missing values given the observed data, and inverse-probability weighting models the probability of being observed.
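The contrast between MCAR and MAR can be checked with a small simulation built on the income-by-age example. This is a sketch under stated assumptions: the population (two age groups with mean incomes of 60 and 40), the 30% MCAR loss rate, and the 50% versus 10% MAR loss rates are all invented for illustration.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: `old` is always observed, income depends on it.
rows = []
for _ in range(50_000):
    old = random.random() < 0.5
    income = random.gauss(60.0 if old else 40.0, 10.0)
    rows.append((old, income))

true_mean = statistics.fmean(inc for _, inc in rows)

# MCAR: every income value has the same 30% chance of being lost.
mcar_obs = [(old, inc) for old, inc in rows if random.random() > 0.30]

# MAR: older respondents withhold income more often (50% vs 10%),
# but within an age group the loss is unrelated to income itself.
mar_obs = [(old, inc) for old, inc in rows
           if random.random() > (0.50 if old else 0.10)]

mcar_mean = statistics.fmean(inc for _, inc in mcar_obs)
mar_mean = statistics.fmean(inc for _, inc in mar_obs)

# Conditional on age, the MAR sample is still representative.
true_old_mean = statistics.fmean(inc for old, inc in rows if old)
mar_old_mean = statistics.fmean(inc for old, inc in mar_obs if old)

print(f"full-sample mean:            {true_mean:.2f}")
print(f"complete cases under MCAR:   {mcar_mean:.2f}")
print(f"complete cases under MAR:    {mar_mean:.2f}")
print(f"older group, full sample:    {true_old_mean:.2f}")
print(f"older group, MAR complete:   {mar_old_mean:.2f}")
```

In this setup the MCAR complete-case mean tracks the full-sample mean, the MAR complete-case mean falls below it because older, higher-income respondents are under-represented, and the within-age-group mean remains close to its full-sample value.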

MNAR (Missing Not at Random) is the hardest case: the probability of being missing depends on the missing value itself. High-income respondents systematically refuse to report income; severely depressed patients drop out of clinical trials. No standard statistical adjustment can fix MNAR without external assumptions, because you cannot distinguish "the data is missing" from "the data has a particular value" using only what you observe. You must either obtain the missing data through follow-up or build a selection model that jointly models the outcome and the missingness process with identifying assumptions.
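A short simulation can illustrate why conditioning on observed covariates does not rescue an MNAR sample. Everything here is an assumption made for illustration: the two age groups, their income distributions, and the logistic rule `p_missing` that makes higher incomes more likely to be withheld.

```python
import math
import random
import statistics

random.seed(2)


def p_missing(income):
    # Assumed MNAR rule: the higher the income, the likelier it is withheld.
    return 1.0 / (1.0 + math.exp(-(income - 50.0) / 10.0))


# Hypothetical population: (old, income, observed flag).
rows = []
for _ in range(50_000):
    old = random.random() < 0.5
    income = random.gauss(60.0 if old else 40.0, 10.0)
    observed = random.random() > p_missing(income)
    rows.append((old, income, observed))


def mean_income(group, observed_only):
    """Mean income in one age group, optionally using observed cases only."""
    return statistics.fmean(
        inc for old, inc, obs in rows
        if old == group and (obs or not observed_only)
    )


true_old = mean_income(True, observed_only=False)
seen_old = mean_income(True, observed_only=True)
true_young = mean_income(False, observed_only=False)
seen_young = mean_income(False, observed_only=True)

print(f"older group:   full {true_old:.2f}, observed only {seen_old:.2f}")
print(f"younger group: full {true_young:.2f}, observed only {seen_young:.2f}")
```

Even within each age group the observed incomes understate the full-sample mean, and nothing in the observed data reveals by how much. That gap is what follow-up data collection or a selection model with explicit identifying assumptions has to supply.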

Your OLS assumptions prerequisite is relevant here because missingness interacts directly with the sample selection requirement. OLS on a complete-case subsample is valid only if being a complete case is unrelated to the regression error once the regressors are held fixed. That holds under MCAR, and under MAR when missingness depends only on regressors already in the model; otherwise the complete cases are a selected sample and the coefficients can be biased.

The practical workflow is: first describe patterns of missingness (which variables predict whether an observation is missing?), then test sensitivity by comparing complete-case results to results from inverse-probability weighting (which reweights observed cases by the inverse of their estimated probability of being observed) or multiple imputation (which fills in missing values several times from a model and pools the results, preserving uncertainty about the imputed values). If the conclusions change materially across methods, the missing data mechanism is doing real work and the choice of approach must be justified and reported.
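The sensitivity check described above can be sketched with the standard library alone. The data are simulated and MAR by construction (missingness depends only on age), and the imputation step is deliberately simplified: it draws missing incomes from a normal distribution fitted within each age group and averages the point estimates, without the parameter draws or variance pooling that a full multiple-imputation procedure would include. All names and numbers are illustrative assumptions.

```python
import random
import statistics

random.seed(1)

# Hypothetical data: (old, reported income or None, true income).
# The true income is kept only so the simulation can be checked.
data = []
for _ in range(50_000):
    old = random.random() < 0.5
    income = random.gauss(60.0 if old else 40.0, 10.0)
    missing = random.random() < (0.50 if old else 0.10)
    data.append((old, None if missing else income, income))

true_mean = statistics.fmean(full for _, _, full in data)

# 1. Complete-case baseline: drop every record with a missing income.
observed = [(old, inc) for old, inc, _ in data if inc is not None]
cc_mean = statistics.fmean(inc for _, inc in observed)


# 2. IPW: estimate P(observed | age group), weight cases by its inverse.
def response_rate(group):
    flags = [inc is not None for old, inc, _ in data if old == group]
    return sum(flags) / len(flags)


p_obs = {True: response_rate(True), False: response_rate(False)}
weights = [1.0 / p_obs[old] for old, _ in observed]
ipw_mean = (sum(w * inc for w, (_, inc) in zip(weights, observed))
            / sum(weights))

# 3. Simplified multiple imputation: fill each gap with a random draw
#    from a normal fitted within the age group, repeat M times, average.
group_stats = {}
for group in (True, False):
    vals = [inc for old, inc in observed if old == group]
    group_stats[group] = (statistics.fmean(vals), statistics.stdev(vals))

M = 5
estimates = []
for _ in range(M):
    completed = [inc if inc is not None else random.gauss(*group_stats[old])
                 for old, inc, _ in data]
    estimates.append(statistics.fmean(completed))
mi_mean = statistics.fmean(estimates)

print(f"true mean (simulation only): {true_mean:.2f}")
print(f"complete-case mean:          {cc_mean:.2f}")
print(f"IPW mean:                    {ipw_mean:.2f}")
print(f"imputation-based mean:       {mi_mean:.2f}")
```

In this setup the complete-case estimate disagrees with the other two, which is the signal that the missingness mechanism matters. With real data the true mean is unknown, so the comparison across methods is the diagnostic.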

Practice Questions (5)

Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Integer Order of Operations → Variable Expressions → Combining Like Terms → One-Step Equations → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Normal Distribution → Classical OLS Assumptions (Gauss-Markov) → Missing Data: Mechanisms and Analytical Solutions

Longest path: 76 steps · 394 total prerequisite topics

Prerequisites (1)

Classical OLS Assumptions (Gauss-Markov)

Leads To (0)

No topics depend on this one yet.