Panel Data: Structure and Advantages

College · Depth 84 in the knowledge graph
Unlocks 33 downstream topics
panel-data longitudinal repeated-measures unobserved-heterogeneity

Core Idea

Panel data (longitudinal data) tracks the same units (individuals, firms, countries) over multiple time periods, producing observations indexed by both unit i and time t. This two-dimensional structure allows researchers to control for time-invariant unobserved characteristics (individual fixed effects) that would cause omitted variable bias in cross-sectional regressions. The key decomposition is y_it = α_i + x_it'β + u_it, where α_i captures all stable unit-specific factors and u_it is the remaining time-varying error. Panels can be balanced (every unit observed in every period) or unbalanced (some unit-period observations missing). The Hausman test helps decide between fixed and random effects specifications.
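As a small illustration of the balanced versus unbalanced distinction, the sketch below checks whether every unit appears in every period. The worker names, years, and wages are invented for the example, and `is_balanced` is a hypothetical helper rather than a function from any library.

```python
from collections import defaultdict

# Hypothetical long-format panel: one (unit, period, wage) row per observation.
rows = [
    ("worker_a", 2019, 20.0), ("worker_a", 2020, 21.5), ("worker_a", 2021, 22.0),
    ("worker_b", 2019, 18.0), ("worker_b", 2021, 19.0),  # 2020 is missing
    ("worker_c", 2019, 25.0), ("worker_c", 2020, 25.5), ("worker_c", 2021, 27.0),
]

def is_balanced(rows):
    """Return True when every unit is observed in every period present in the data."""
    periods_by_unit = defaultdict(set)
    for unit, period, _ in rows:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    return all(periods == all_periods for periods in periods_by_unit.values())

# The full data are unbalanced because worker_b has no 2020 observation.
balanced = is_balanced(rows)
# Dropping worker_b leaves a balanced panel.
balanced_subset = is_balanced([r for r in rows if r[0] != "worker_b"])
print(balanced, balanced_subset)
```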

How It's Best Learned

Contrast the cross-sectional and panel estimates of the effect of union membership on wages. The panel estimate, which controls for worker fixed effects, is typically much smaller, suggesting that part of the cross-sectional premium reflects high-ability workers disproportionately selecting into unions rather than a causal effect of membership.

Explainer

Cross-sectional regression has a fundamental weakness that you encountered when studying endogeneity: if the units you observe differ in some stable, unobserved way that also correlates with your treatment variable, your estimates are biased. Imagine estimating the wage premium for union membership. Union workers may systematically be higher-ability workers who would have earned more regardless. A cross-sectional regression comparing union and non-union workers cannot separate the union premium from selection: workers with better outside options may be more likely to join and also to negotiate higher wages. Panel data offers a different strategy: instead of comparing different people, compare the *same person to themselves* over time.
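The selection problem can be seen in a small simulation. This is a sketch on invented data, not an estimate from any real survey: the true premium of 0.10, the ability coefficient of 0.5, and the selection rule are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_premium = 0.10  # assumed true effect of union membership on log wages

ability = rng.normal(size=n)  # unobserved by the researcher
# Positive selection: higher-ability workers are more likely to be union members.
union = (ability + rng.normal(size=n) > 0).astype(float)
log_wage = 2.0 + true_premium * union + 0.5 * ability + rng.normal(scale=0.3, size=n)

# Cross-sectional OLS of log wage on a constant and union status, omitting ability.
X = np.column_stack([np.ones(n), union])
coef, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
ols_premium = coef[1]

print(f"true premium: {true_premium:.2f}, cross-sectional OLS: {ols_premium:.2f}")
```

Because ability is left out of the regression and is positively correlated with union status, the estimated coefficient absorbs part of the ability effect and overstates the assumed premium.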

The model y_it = α_i + x_it'β + u_it formalizes this. The individual fixed effect α_i absorbs everything stable about person i — ability, family background, temperament, personality — regardless of whether you can measure any of it. Because α_i is constant over time, it cancels out when you look at changes within the same person. If you observe a worker before and after joining a union, their unobserved ability shows up identically in both observations and drops out of the comparison. What remains is the within-person variation in x_it (union status changed) and the within-person variation in y_it (wages changed), isolating the effect of the treatment from the stable confounders.
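The cancellation of α_i can be sketched with a simulated two-period panel. The data-generating process is an assumption made for the example: nobody is unionized in period 1, higher-α workers are more likely to join in period 2, and the true premium is set to 0.10.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_premium = 0.10  # assumed true effect of union membership

alpha = rng.normal(size=n)  # stable, unobserved worker effect (e.g. ability)
union_1 = np.zeros(n)       # period 1: no one is a member
union_2 = (alpha + rng.normal(size=n) > 0).astype(float)  # period 2: selection on alpha
wage_1 = alpha + true_premium * union_1 + rng.normal(scale=0.3, size=n)
wage_2 = alpha + true_premium * union_2 + rng.normal(scale=0.3, size=n)

# Period-2 cross-section: compares different people, so alpha contaminates the gap.
cs_premium = wage_2[union_2 == 1].mean() - wage_2[union_2 == 0].mean()

# First difference: wage_2 - wage_1 = beta * (union_2 - union_1) + (u_2 - u_1).
# alpha appears in both periods and drops out of the difference.
d_wage = wage_2 - wage_1
d_union = union_2 - union_1
fd_premium = (d_union @ d_wage) / (d_union @ d_union)  # OLS through the origin

print(f"cross-section: {cs_premium:.2f}, first difference: {fd_premium:.2f}")
```

The regression through the origin is adequate here only because the simulation has no common time trend; with real data one would usually include a period effect as well.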

The two-dimensional structure (units i and time periods t) gives panel data its power through the decomposition of variation. Total variation in the data has two components: between variation (differences across units, like comparing different people) and within variation (differences within the same unit over time, like one person's changes). Fixed effects estimation uses only the within variation, making it immune to bias from time-invariant omitted variables — the α_i terms are eliminated. This is why the within estimator can be understood as applying OLS to the demeaned data: subtract each unit's time-average from every observation, and the fixed effects disappear.
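The demeaning step and the between/within split can be written out directly. The following is a minimal sketch on a simulated balanced panel with one regressor; the slope of 1.5 and the correlation between x and α are assumptions chosen to make the bias visible.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods = 500, 6
beta = 1.5  # assumed true slope

alpha = rng.normal(size=(n_units, 1))                     # unit effects
x = 0.8 * alpha + rng.normal(size=(n_units, n_periods))   # x is correlated with alpha
y = alpha + beta * x + rng.normal(scale=0.5, size=(n_units, n_periods))

# Within transformation: subtract each unit's time-average from its observations.
x_bar = x.mean(axis=1, keepdims=True)
y_bar = y.mean(axis=1, keepdims=True)
x_within = x - x_bar
y_within = y - y_bar

# In a balanced panel, total variation in x splits exactly into between + within.
total_ss = ((x - x.mean()) ** 2).sum()
between_ss = n_periods * ((x_bar - x.mean()) ** 2).sum()
within_ss = (x_within ** 2).sum()

# Within (fixed effects) estimator: OLS on the demeaned data.
beta_within = (x_within * y_within).sum() / within_ss

# Pooled OLS uses both kinds of variation and picks up the bias from alpha.
xc, yc = x - x.mean(), y - y.mean()
beta_pooled = (xc * yc).sum() / (xc ** 2).sum()

print(f"within: {beta_within:.2f}, pooled: {beta_pooled:.2f}")
```

The exact sum-of-squares identity relies on the panel being balanced, so that the grand mean equals the average of the unit means.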

The Hausman test helps navigate a fundamental choice: should α_i be treated as fixed parameters to be estimated (a fixed effects model), or as random draws from a population distribution that are uncorrelated with x_it (a random effects model)? Random effects is more efficient because it uses both within and between variation, but it requires the strong assumption that the individual effects are uncorrelated with the regressors. If that assumption fails (the usual case when you're worried about omitted variable bias), random effects is inconsistent while fixed effects remains consistent, so fixed effects is the appropriate choice. The Hausman test checks whether the two sets of estimates differ significantly; a significant difference indicates that the random effects assumption is violated.

Finally, a key misconception is that fixed effects removes all omitted variable bias. It removes only time-invariant confounders; time-*varying* omitted variables still cause bias. A promotion that precedes both union joining and wage growth would still confound your estimate even with panel data.
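The Hausman comparison can be sketched as a quadratic form in the difference between the two coefficient vectors. The function name `hausman_statistic` is hypothetical, and the coefficient and variance values below are made up purely to show the arithmetic; in practice they would come from fitted fixed effects and random effects models, restricted to the coefficients both models estimate.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, cov_fe, cov_re):
    """Hausman statistic H = d' (V_FE - V_RE)^{-1} d, with d = b_FE - b_RE.

    Under the null that the unit effects are uncorrelated with the regressors,
    H is asymptotically chi-squared with len(d) degrees of freedom.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    stat = float(d @ np.linalg.solve(v_diff, d))
    return stat, d.size

# Made-up numbers for a single coefficient, for illustration only.
b_fe, cov_fe = [0.08], [[0.0009]]   # fixed effects: consistent but less precise
b_re, cov_re = [0.15], [[0.0004]]   # random effects: more precise under the null
stat, dof = hausman_statistic(b_fe, b_re, cov_fe, cov_re)

CHI2_CRIT_1DF_5PCT = 3.841  # 5% critical value of chi-squared with 1 degree of freedom
reject_random_effects = stat > CHI2_CRIT_1DF_5PCT

print(f"H = {stat:.2f} on {dof} df, reject RE assumption: {reject_random_effects}")
```

In finite samples the variance difference is not guaranteed to be positive definite, in which case this simple form of the statistic is not reliable.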
