Questions: Panel Data: Structure, Notation, and Advantages
5 questions to test your understanding
Question 1 (Multiple Choice)
A researcher studies whether a job training program raises wages. Workers who choose to participate may be more motivated than those who don't. She has annual wage data for the same 500 workers over 8 years before and after the program. Why is this panel data structure particularly valuable for her research question?
A. 500 workers × 8 years = 4,000 observations, giving more statistical power than a single survey
B. The panel allows comparing each worker to themselves over time, controlling for time-invariant characteristics like innate motivation that would otherwise confound the estimate
C. Panel data eliminates all sources of omitted variable bias, including time-varying confounders
D. The repeated observations allow the researcher to take the average wage for each worker, reducing measurement error
Answer: B
The key advantage is the within-unit comparison. Workers who choose training may be more motivated, and motivation is an unobservable characteristic that also raises wages independently of training. In cross-sectional data, this confounds the estimate. With panel data, you can compare each worker's wage before and after training. Time-invariant characteristics like innate motivation are constant for each worker and cancel out in this within-person difference. This is the logic of fixed-effects estimation: it controls for all time-invariant heterogeneity, observed or not. The other options miss this point: more observations (A) or averaging (D) can reduce noise but not selection bias, and C overstates what panel data can achieve.
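To make the within-person logic concrete, here is a small simulation sketch. Everything in it is an illustrative assumption rather than part of the question: the data-generating process, the hypothetical helper names `simulate_panel`, `ols_slope`, and `within_transform`, and the numbers (a true training effect of 2, and motivation that raises wages by 3 per standard deviation). Motivation is fixed for each worker and makes training more likely, so pooled OLS on all 4,000 rows mixes the training effect with motivation, while subtracting each worker's own mean (the fixed-effects transform) removes it.

```python
import random

def simulate_panel(n_workers=500, n_years=8, true_effect=2.0, seed=0):
    """Hypothetical DGP: wage = 10 + 3*motivation + true_effect*trained + noise.
    Motivation is fixed per worker, raises wages directly, and makes training more likely."""
    rng = random.Random(seed)
    rows = []  # (worker, year, trained, wage)
    for i in range(n_workers):
        motivation = rng.gauss(0, 1)
        takes_training = rng.random() < (0.8 if motivation > 0 else 0.2)
        start = rng.randint(2, n_years - 2)  # training starts mid-panel
        for t in range(n_years):
            trained = 1 if takes_training and t >= start else 0
            wage = 10 + 3 * motivation + true_effect * trained + rng.gauss(0, 1)
            rows.append((i, t, trained, wage))
    return rows

def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def within_transform(rows):
    """Subtract each worker's own mean from trained and wage (the fixed-effects transform)."""
    totals = {}  # worker -> [count, sum_trained, sum_wage]
    for i, _, d, w in rows:
        s = totals.setdefault(i, [0, 0.0, 0.0])
        s[0] += 1
        s[1] += d
        s[2] += w
    xs, ys = [], []
    for i, _, d, w in rows:
        n, sd, sw = totals[i]
        xs.append(d - sd / n)
        ys.append(w - sw / n)
    return xs, ys

rows = simulate_panel()
pooled = ols_slope([r[2] for r in rows], [r[3] for r in rows])
fe = ols_slope(*within_transform(rows))
print(f"pooled OLS: {pooled:.2f}, fixed effects: {fe:.2f}")
```

In this setup the pooled slope lands well above 2 while the within estimate stays close to 2. The simulated wages have no common growth over time; with real year-to-year trends you would also include year effects.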
Question 2 (Multiple Choice)
An unbalanced panel dataset differs from a balanced panel in that:
A. The unbalanced panel has more cross-sectional units than time periods
B. Some units are not observed in every time period: there are gaps in the (i, t) grid
C. The time intervals between observations are unequal in length
D. The unbalanced panel cannot be used with fixed-effects estimators
Answer: B
A balanced panel has every unit (i) observed in every time period (t), giving exactly N × T observations. An unbalanced panel has missing entries: firms that exit, survey respondents who drop out (attrition), or countries that gain independence mid-sample. The balanced/unbalanced distinction matters for estimation because some software routines and some theoretical results assume balance. Unbalanced panels can still be used with fixed effects, but care is needed about which observations identify the within-unit variation: a unit observed only once contributes nothing to the within estimate, and attrition related to the outcome (for example, low-wage workers dropping out of a survey) can bias results. Unequal spacing between periods (C) is a separate issue from balance.
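One way to check balance is to list the (unit, period) pairs actually observed and compare their count with N × T. The sketch below is illustrative only; the function names `panel_summary` and `demeaned` and the firm labels are hypothetical. It also shows why a unit observed once adds nothing to a within-unit estimate: after subtracting its own mean, its single value becomes zero.

```python
def panel_summary(obs):
    """obs: list of (unit, period) pairs that were actually observed."""
    units = sorted({i for i, _ in obs})
    periods = sorted({t for _, t in obs})
    observed = set(obs)
    # Gaps in the (i, t) grid, relative to every unit and period seen anywhere.
    missing = [(i, t) for i in units for t in periods if (i, t) not in observed]
    return {
        "N": len(units),
        "T": len(periods),
        "n_obs": len(observed),
        "balanced": not missing,
        "missing": missing,
    }

def demeaned(values_by_unit):
    """Subtract each unit's own mean (the fixed-effects transform)."""
    out = {}
    for unit, vals in values_by_unit.items():
        m = sum(vals) / len(vals)
        out[unit] = [v - m for v in vals]
    return out

# Hypothetical firms: B exits after 2020, C enters in 2021.
obs = [("A", 2019), ("A", 2020), ("A", 2021),
       ("B", 2019), ("B", 2020),
       ("C", 2021)]
summary = panel_summary(obs)
print(summary["n_obs"], "of", summary["N"] * summary["T"], "cells observed")  # 6 of 9 cells observed
print("missing:", summary["missing"])  # missing: [('B', 2021), ('C', 2019), ('C', 2020)]
print(demeaned({"A": [1.0, 2.0, 3.0], "C": [5.0]}))  # {'A': [-1.0, 0.0, 1.0], 'C': [0.0]}
```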
Question 3 (True / False)
The key advantage of within-unit variation in panel data is that it controls for unobserved characteristics that are constant over time for each unit — characteristics that would corrupt a cross-sectional comparison.
T. True
F. False
Answer: True
True. Fixed-effects estimation exploits within-unit variation (how each unit changes over time) rather than between-unit variation (how units differ from each other). By doing so, it implicitly holds constant any characteristic that does not change within a unit over the observation window: industry type for firms, country geography, individual innate ability. These time-invariant unobservables are a prime source of confounding in cross-sectional studies, and panel data's within structure neutralizes them without requiring them to be measured.
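The within/between split can be made precise. For any variable, the total sum of squares around the grand mean equals the within part (deviations from each unit's own mean) plus the between part (each unit's mean around the grand mean, weighted by that unit's number of periods), and fixed effects uses only the within part. The sketch below uses a hypothetical `variance_decomposition` helper and made-up numbers to show that a time-invariant trait such as innate ability has zero within variation. That is why unit fixed effects absorb it, and also why its own coefficient cannot be estimated in a fixed-effects regression.

```python
def variance_decomposition(panel):
    """panel: dict unit -> list of observed values of one variable.
    Returns (total_ss, within_ss, between_ss), with total = within + between."""
    all_vals = [v for vals in panel.values() for v in vals]
    grand = sum(all_vals) / len(all_vals)
    within = between = 0.0
    for vals in panel.values():
        m = sum(vals) / len(vals)
        within += sum((v - m) ** 2 for v in vals)  # changes inside the unit
        between += len(vals) * (m - grand) ** 2    # differences across units
    total = sum((v - grand) ** 2 for v in all_vals)
    return total, within, between

# Hypothetical workers observed for 3 years each.
ability = {"w1": [1.0, 1.0, 1.0], "w2": [3.0, 3.0, 3.0]}  # never changes
hours = {"w1": [1.0, 2.0, 3.0], "w2": [3.0, 4.0, 5.0]}    # changes over time
print(variance_decomposition(ability))  # (6.0, 0.0, 6.0)
print(variance_decomposition(hours))    # (10.0, 4.0, 6.0)
```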
Question 4 (True / False)
Collecting panel data on the same individuals over multiple years automatically eliminates most sources of omitted variable bias from a regression.
T. True
F. False
Answer: False
False. Panel data with fixed effects eliminates bias from time-invariant omitted variables — factors that are constant for each unit over the observation window. It does not address time-varying omitted variables: factors that change over time within units and are correlated with the treatment. For example, if workers who receive job training also change their effort level simultaneously, the within-estimator still conflates training and effort effects. Panel data is powerful but not a complete solution to endogeneity.
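The effort example can be simulated. In this sketch the data-generating process, the names `simulate` and `fe_slope`, and all coefficients are assumptions for illustration. High-ability workers are more likely to train, but ability is fixed, so the within estimator removes it. Effort, by contrast, jumps when training starts. With no effort jump the fixed-effects slope should sit near the true training effect of 2; with a jump of 1 unit of effort worth 1.5 in wages, it should land near 2 + 1.5 = 3.5, because each worker's wage change now mixes both effects.

```python
import random

def simulate(effort_jump, true_effect=2.0, effort_effect=1.5,
             n_workers=500, n_years=8, seed=1):
    """Hypothetical panel: rows of (worker, trained, wage)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_workers):
        ability = rng.gauss(0, 1)  # time-invariant: removed by fixed effects
        takes = rng.random() < (0.8 if ability > 0 else 0.2)
        start = rng.randint(2, n_years - 2)
        for t in range(n_years):
            trained = 1 if takes and t >= start else 0
            # Time-varying confounder: effort rises when training starts.
            effort = effort_jump * trained + rng.gauss(0, 0.5)
            wage = (10 + 3 * ability + true_effect * trained
                    + effort_effect * effort + rng.gauss(0, 1))
            rows.append((i, trained, wage))
    return rows

def fe_slope(rows):
    """Within (fixed-effects) slope of wage on trained."""
    by_worker = {}
    for i, d, w in rows:
        by_worker.setdefault(i, []).append((d, w))
    sxy = sxx = 0.0
    for obs in by_worker.values():
        md = sum(d for d, _ in obs) / len(obs)
        mw = sum(w for _, w in obs) / len(obs)
        for d, w in obs:
            sxy += (d - md) * (w - mw)
            sxx += (d - md) ** 2
    return sxy / sxx

print(f"no effort change: {fe_slope(simulate(0.0)):.2f}")
print(f"effort jumps:     {fe_slope(simulate(1.0)):.2f}")
```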
Question 5 (Short Answer)
A researcher uses a single cross-sectional survey to compare wages of workers who received job training versus those who did not. Why might this comparison be misleading, and how would panel data help?
Model answer: In cross-sectional data, workers who select into training may systematically differ from those who don't in ways the researcher can't observe — for example, higher motivation, stronger work ethic, or better social networks. These unobserved characteristics independently raise wages, making the trained group look better off even if training had no effect. Panel data helps by tracking the same workers before and after training. By comparing each worker's wage change rather than comparing trained and untrained workers at a single point, within-unit fixed-effects estimation holds constant all time-invariant characteristics. Only the within-person variation — what changed for each individual — identifies the training effect.
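This is the classic selection-into-treatment problem. Workers are not randomly assigned to training; those who opt in may already earn more because of traits such as motivation. Cross-sectional comparison confounds the treatment effect with these pre-existing differences. Panel data's within-unit differencing is an observational approximation to random assignment: a randomized experiment balances pre-existing differences across groups by design, while fixed effects removes the time-invariant ones by comparing each person with themselves. The approximation holds only if selection depends on fixed traits; if trainees are already on a steeper wage trajectory, that time-varying difference remains (see Question 4).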
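With only two periods (before and after), the within-worker logic reduces to comparing average wage changes. The sketch below is hypothetical: the data-generating process, the name `two_period_comparison`, and all numbers are assumptions. It contrasts the post-training cross-sectional gap with the difference in average wage changes between trained and untrained workers. Subtracting the untrained workers' change also removes wage growth common to everyone; in a balanced two-period panel where nobody is trained in the first period, this difference in mean changes equals the fixed-effects estimate with a period effect.

```python
import random
from statistics import mean

def two_period_comparison(n_workers=2000, true_effect=2.0, growth=1.0, seed=2):
    """Returns (cross_section_gap, difference_in_mean_changes) for a hypothetical DGP."""
    rng = random.Random(seed)
    trained_w, untrained_w = [], []    # post-period wages (single cross-section)
    trained_dw, untrained_dw = [], []  # within-worker wage changes
    for _ in range(n_workers):
        motivation = rng.gauss(0, 1)
        takes = rng.random() < (0.8 if motivation > 0 else 0.2)
        base = 10 + 3 * motivation  # fixed worker-level wage component
        w0 = base + rng.gauss(0, 1)  # before: nobody trained yet
        w1 = base + growth + true_effect * int(takes) + rng.gauss(0, 1)  # after
        (trained_w if takes else untrained_w).append(w1)
        (trained_dw if takes else untrained_dw).append(w1 - w0)
    cross_section = mean(trained_w) - mean(untrained_w)
    within = mean(trained_dw) - mean(untrained_dw)
    return cross_section, within

cs, within = two_period_comparison()
print(f"cross-sectional gap: {cs:.2f}, difference in wage changes: {within:.2f}")
```

Here the cross-sectional gap absorbs the motivation difference between the groups, while the difference in changes stays near the assumed effect of 2.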