A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Missing Data: Mechanisms and Analytical Solutions

College · Depth 109 in the knowledge graph

571 prerequisites beneath it

Core Idea

Missing data can be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). The missingness mechanism determines whether listwise deletion is valid or whether imputation, inverse-probability weighting, or selection models are needed.
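
The three mechanisms can be stated compactly in the standard notation from the missing-data literature. The symbols are introduced here and are not part of the original text: R is the indicator of which values are observed, and Y_obs and Y_mis are the observed and missing parts of the data.

```latex
\begin{align*}
\text{MCAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}  \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{MNAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ depends on } Y_{\text{mis}} \text{ even given } Y_{\text{obs}}
\end{align*}
```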

How It's Best Learned

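Examine patterns of missing data: which variables have gaps, and which observed variables predict those gaps. Use listwise deletion as a baseline, then try multiple imputation or inverse-probability weighting (IPW) to see whether the conclusions change.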
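
A first look at missingness patterns might be sketched as follows. The records, column names, and the `missing_pattern` helper are hypothetical and exist only for illustration; the sketch uses the Python standard library and marks missing values with `None`.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "educ": 16},
    {"age": 61, "income": None, "educ": 12},
    {"age": 45, "income": 61000, "educ": None},
    {"age": 70, "income": None, "educ": 12},
    {"age": 29, "income": 48000, "educ": 14},
    {"age": 66, "income": None, "educ": None},
]
columns = ["age", "income", "educ"]


def missing_pattern(row):
    """Return a string such as 'O M O': O = observed, M = missing, in column order."""
    return " ".join("M" if row[c] is None else "O" for c in columns)


# Share of missing values per variable.
missing_share = {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}

# How often each joint pattern of missingness occurs.
pattern_counts = Counter(missing_pattern(r) for r in rows)

# Does an observed variable predict missingness? Compare mean age by income status.
age_missing = [r["age"] for r in rows if r["income"] is None]
age_observed = [r["age"] for r in rows if r["income"] is not None]
mean_age_missing = sum(age_missing) / len(age_missing)
mean_age_observed = sum(age_observed) / len(age_observed)

print(missing_share)
print(pattern_counts.most_common())
print(mean_age_missing, mean_age_observed)
```

A clear gap between the two mean ages, as in this toy data, suggests that missingness is related to age and so is unlikely to be MCAR. It cannot by itself separate MAR from MNAR.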

Explainer

Missing data is not just an inconvenience — it is a selection problem. When observations drop out of your dataset, the remaining sample may no longer be representative of the population you care about. Whether this matters depends entirely on *why* the data are missing, which is what the three standard mechanisms capture. Think of the missingness mechanism as a treatment assignment rule: what determined whether each observation's data was observed or not?

MCAR (Missing Completely at Random) means the probability of being missing is unrelated to both observed and unobserved variables. Imagine a lab assistant randomly drops 5% of blood sample vials: there is no systematic pattern to which samples are lost. Under MCAR, the complete cases are a random subsample of the full sample, so listwise deletion (dropping incomplete cases) gives unbiased estimates; you lose efficiency but not validity.

MAR (Missing at Random) is more common and more nuanced: missingness depends on observed variables but, conditional on those variables, is unrelated to the unobserved values. For example, older survey respondents are less likely to report income, but conditional on age, the missing income values are not systematically different from the reported ones. Under MAR, listwise deletion is generally biased for quantities such as the overall mean, because the complete cases over-represent the groups that are more likely to respond (here, younger respondents). Methods that use the observed covariates can recover valid estimates: multiple imputation models the missing values given what is observed, and inverse-probability weighting models the missingness process.
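
The contrast between MCAR and MAR can be checked with a small simulation of the age and income example. This is a minimal sketch under invented assumptions: the income equation, the 30% MCAR rate, and the age-dependent missingness rule are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(0)
n = 20000

# Hypothetical population: income rises with age (arbitrary units).
age = [random.uniform(20, 80) for _ in range(n)]
income = [30 + 0.5 * a + random.gauss(0, 10) for a in age]
true_mean = statistics.fmean(income)

# MCAR: every income value is missing with the same 30% probability.
mcar_observed = [random.random() > 0.30 for _ in range(n)]


# MAR: the probability of missing income depends only on (observed) age.
def p_missing(a):
    return 0.1 + 0.7 * (a - 20) / 60  # 10% at age 20, 80% at age 80


mar_observed = [random.random() > p_missing(a) for a in age]


def complete_case_mean(observed):
    """Mean income after listwise deletion."""
    return statistics.fmean([y for y, o in zip(income, observed) if o])


mcar_bias = complete_case_mean(mcar_observed) - true_mean
mar_bias = complete_case_mean(mar_observed) - true_mean

# Within a narrow age band, reported and full-sample incomes agree under MAR.
band = [(y, o) for a, y, o in zip(age, income, mar_observed) if 45 <= a < 50]
band_gap = (statistics.fmean([y for y, o in band if o])
            - statistics.fmean([y for y, _ in band]))

print(f"complete-case bias under MCAR: {mcar_bias:+.2f}")
print(f"complete-case bias under MAR:  {mar_bias:+.2f}")
print(f"gap within ages 45-50 (MAR):   {band_gap:+.2f}")
```

In this setup the MCAR bias should be close to zero, the MAR bias should be clearly negative because older, higher-income respondents are under-represented, and the within-band gap should again be close to zero.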

MNAR (Missing Not at Random) is the hardest case: the probability of being missing depends on the missing value itself, even after conditioning on the observed variables. High-income respondents systematically refuse to report income; severely depressed patients drop out of clinical trials. No standard statistical adjustment can fix MNAR without external assumptions, because the observed data alone cannot distinguish "the value is missing" from "the value is unusually high or low". You must either obtain the missing data through follow-up or build a selection model that jointly models the outcome and the missingness process under explicit identifying assumptions.
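
Why conditioning on observed variables does not rescue MNAR can be illustrated with a simulation in which refusal depends on income itself. This is a sketch under invented assumptions: the income equation and the logistic refusal rule in `p_missing` are hypothetical.

```python
import math
import random
import statistics

random.seed(1)
n = 20000

# Hypothetical population: income rises with age (arbitrary units).
age = [random.uniform(20, 80) for _ in range(n)]
income = [30 + 0.5 * a + random.gauss(0, 10) for a in age]


# MNAR: the higher the income, the more likely the respondent refuses to report it.
def p_missing(y):
    return 1 / (1 + math.exp(-(y - 55) / 10))


observed = [random.random() > p_missing(y) for y in income]


def band_means(lo, hi):
    """Full-sample and reported mean income for ages in [lo, hi)."""
    full = [y for a, y in zip(age, income) if lo <= a < hi]
    seen = [y for a, y, o in zip(age, income, observed) if o and lo <= a < hi]
    return statistics.fmean(full), statistics.fmean(seen)


full_mean, seen_mean = band_means(45, 50)
gap = seen_mean - full_mean

print(f"ages 45-50, everyone:       {full_mean:.2f}")
print(f"ages 45-50, reporters only: {seen_mean:.2f}")
print(f"gap that age cannot explain: {gap:+.2f}")
```

Even among respondents of nearly the same age, reporters have lower incomes than the full group, so no adjustment based on age alone can remove the bias. In real data the first of the two means is never observed, which is why MNAR cannot be diagnosed from the data at hand.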

Your OLS assumptions prerequisite is relevant here because missingness interacts directly with the random sampling assumption. OLS on a complete-case subsample is valid only if selection into that subsample is unrelated to the error term given the regressors. That holds under MCAR, and under MAR when the variables that drive missingness are included as regressors in the model. The practical workflow is: first describe patterns of missingness (what variables predict whether an observation is missing?), then test sensitivity by comparing complete-case results to results from inverse-probability weighting (which re-weights each observed case by the inverse of its estimated probability of being observed) or multiple imputation (which fills in missing values several times from a model, preserving uncertainty). If the conclusions change materially across methods, the missing data mechanism is doing real work and the choice of approach must be justified and reported.
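
The sensitivity check described above can be sketched on simulated MAR data. Everything here is an illustrative assumption rather than a recommended implementation: the data-generating process is invented, response probabilities are estimated from simple 5-year age bins instead of a logistic model, and the multiple imputation step is a bootstrap variant built on a hand-rolled linear fit (`fit_line` is a hypothetical helper). In practice a dedicated statistics package would normally be used.

```python
import random
import statistics

random.seed(2)
n = 5000

# Hypothetical MAR data: income depends on age; older people report less often.
age = [random.uniform(20, 80) for _ in range(n)]
income = [30 + 0.5 * a + random.gauss(0, 10) for a in age]
observed = [random.random() > 0.1 + 0.7 * (a - 20) / 60 for a in age]
true_mean = statistics.fmean(income)  # known only because this is a simulation

# Baseline: listwise deletion.
cc = [(a, y) for a, y, o in zip(age, income, observed) if o]
cc_mean = statistics.fmean([y for _, y in cc])


# Inverse-probability weighting: estimate P(observed | age) per 5-year age bin.
def age_bin(a):
    return min(int((a - 20) // 5), 11)


counts = [[0, 0] for _ in range(12)]  # [number observed, number total] per bin
for a, o in zip(age, observed):
    counts[age_bin(a)][0] += o
    counts[age_bin(a)][1] += 1
p_obs = [seen / total for seen, total in counts]

weights = [1 / p_obs[age_bin(a)] for a, _ in cc]
ipw_mean = sum(w * y for w, (_, y) in zip(weights, cc)) / sum(weights)


# Multiple imputation, bootstrap variant.
def fit_line(pairs):
    """Least-squares intercept, slope, and residual SD of y on x."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    resid = [y - intercept - slope * x for x, y in pairs]
    return intercept, slope, statistics.stdev(resid)


m = 20  # number of imputed datasets
estimates, variances = [], []
for _ in range(m):
    # Refit on a bootstrap resample so parameter uncertainty carries through.
    b0, b1, sd = fit_line(random.choices(cc, k=len(cc)))
    completed = [y if o else b0 + b1 * a + random.gauss(0, sd)
                 for a, y, o in zip(age, income, observed)]
    estimates.append(statistics.fmean(completed))
    variances.append(statistics.variance(completed) / n)

# Rubin's rules: pool point estimates, combine within and between variance.
mi_mean = statistics.fmean(estimates)
within = statistics.fmean(variances)
between = statistics.variance(estimates)
mi_se = (within + (1 + 1 / m) * between) ** 0.5

print(f"true mean {true_mean:.2f} | complete-case {cc_mean:.2f}")
print(f"IPW {ipw_mean:.2f} | MI {mi_mean:.2f} (SE {mi_se:.2f})")
```

Here the complete-case mean should sit noticeably below the truth while IPW and multiple imputation both land near it, which is the kind of divergence across methods that signals the missingness mechanism matters. Both corrections rely on age being the only driver of missingness; under MNAR neither would be expected to work.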

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Independence of Events → Sampling Distributions → Standard Error of Estimators → Hypothesis Testing: Framework and Logic → P-values and Statistical Significance → Effect Size and Practical Significance → Hypothesis Testing: Framework and Logic → Z-Tests and T-Tests for Means → One-Sample Z-Test for Means → One-Sample and Two-Sample T-Tests → Inference in Linear Regression → Prediction Intervals in Regression → Linear Regression Basics → Residuals and Goodness of Fit (R²) → Simple (Bivariate) OLS Regression → Classical OLS Assumptions (Gauss-Markov) → Missing Data: Mechanisms and Analytical Solutions

Longest path: 110 steps · 571 total prerequisite topics

Prerequisites (1)

Classical OLS Assumptions (Gauss-Markov) (soft)

Leads To (0)

No topics depend on this one yet.