External validity refers to the extent to which research findings can be generalized beyond the specific participants, settings, and conditions of a particular study. Population validity asks whether findings generalize from the sample to the target population, which depends on how representative the sample is; ecological validity asks whether the study's procedures and settings resemble the conditions under which the phenomenon naturally occurs. Researchers often face a tradeoff: the control that gives laboratory studies high internal validity tends to reduce external validity, while field studies gain realism at the cost of control. Meta-analyses and systematic replication across diverse populations, settings, and times strengthen confidence in the generality of effects.
Compare findings from high-internal-validity laboratory studies with lower-control field replications to see which effects replicate. Examine meta-analyses that test whether effects vary across participant demographics and study contexts.
External validity is about sample size (actually, it's about representativeness and whether findings generalize). A study must have high internal validity to have high external validity (actually, these often trade off).
Your prior work on sampling in psychology and internal validity threats gave you tools for evaluating whether a study's conclusions are trustworthy *within* the study — whether the effect you observed was real and not an artifact of confounds. External validity asks the orthogonal question: even if the effect is real, does it apply beyond this study's particular participants, setting, time, and methods? These are separate concerns, and they frequently pull in opposite directions.
The cleanest illustration of the tradeoff is the laboratory experiment. Random assignment to conditions is your primary tool for establishing internal validity, and in the laboratory it is paired with tight control over stimuli, procedures, and setting. Random assignment itself can be carried out in the field, but the accompanying control usually requires settings that diverge sharply from the real world. A fear-conditioning study conducted on 20-year-old psychology undergraduates in a quiet cubicle with tones and shocks may yield highly interpretable causal estimates (high internal validity) that tell us little about fear learning in children, in naturalistic settings, or under conditions involving socially meaningful threats (low external validity). This is not a criticism of laboratory research; it is an observation that internal and external validity serve different inferential purposes and are rarely maximized simultaneously in a single study.
Population validity is one component of external validity: do findings generalize from the study sample to the intended target population? The canonical problem is the psychology research participant pool: decades of research rested heavily on WEIRD samples (Western, Educated, Industrialized, Rich, Democratic), and subsequent cross-cultural replications revealed that many foundational findings — on conformity, moral reasoning, perception, even basic cognitive biases — vary substantially across populations. A sample's size matters less than its representativeness: a well-designed survey of 500 carefully stratified participants will generalize better than a convenience sample of 5,000 undergraduates, if the target population is the general adult public.
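The point about size versus representativeness can be made concrete with a small simulation. The sketch below is illustrative only: the two strata, their population shares, and their mean outcomes are made-up assumptions, not data from any study. It compares a convenience sample of 5,000 drawn entirely from one stratum with a proportionally stratified sample of 500.

```python
import random
import statistics


def simulate(seed=0):
    """Compare a large convenience sample with a small stratified one.

    All numbers below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Assumed target population: two strata with different mean outcomes.
    strata = {
        "undergraduates": {"share": 0.10, "mean": 0.80},
        "other_adults": {"share": 0.90, "mean": 0.20},
    }
    # True population mean is the share-weighted average of stratum means.
    true_mean = sum(s["share"] * s["mean"] for s in strata.values())

    def draw(stratum, n):
        # Individual outcomes vary around the stratum mean (SD = 1).
        return [rng.gauss(strata[stratum]["mean"], 1.0) for _ in range(n)]

    # Convenience sample: large, but drawn from one stratum only.
    convenience = draw("undergraduates", 5000)

    # Stratified sample: small, but each stratum appears in proportion
    # to its share of the target population.
    stratified = []
    for name, s in strata.items():
        stratified += draw(name, round(500 * s["share"]))

    return true_mean, statistics.mean(convenience), statistics.mean(stratified)


true_mean, convenience_est, stratified_est = simulate()
print(f"true population mean:       {true_mean:.3f}")
print(f"convenience sample (n=5000): {convenience_est:.3f}")
print(f"stratified sample (n=500):   {stratified_est:.3f}")
```

Under these assumptions, the larger sample estimates the undergraduate mean very precisely, but that is the wrong quantity when the target is the general adult public. Adding more undergraduates shrinks the sampling error without touching the bias.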
A second component is ecological validity: whether the study's procedures and settings resemble the conditions under which the phenomenon naturally occurs. Laboratory measures of aggression (shocking a confederate, blasting noise at another player) are distant proxies for real-world aggressive behavior, which occurs in specific relational, emotional, and contextual conditions. High ecological validity does not require naturalness for its own sake; it requires that the study's operationalizations capture the relevant features of the phenomenon as it occurs in its natural habitat. Studies high in ecological validity often sacrifice experimental control, so no single design delivers both. This is why the field relies on systematic replication, meaning testing the same question repeatedly with different populations, settings, and methods, as the strongest evidence for generalizability.
Meta-analysis is the statistical engine of external validity reasoning. When dozens of studies using different samples, methods, and settings produce similar effect size estimates, confidence in generalizability grows. When effect sizes are highly heterogeneous across studies, moderator analysis asks which features of studies (population characteristics, methodological choices, setting types) explain the variation — yielding a refined, conditional claim: "this effect holds when X is present and weakens when Y is present." This conditional generalization is more scientifically honest than claiming universal applicability, and more practically useful than abandoning the effect because it is not perfectly consistent. External validity is not a binary judgment but a map of where findings travel well and where they do not.
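The heterogeneity and moderator logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the six studies, their effect sizes, and their sampling variances are invented, and the moderator analysis is a simple subgroup comparison with inverse-variance (fixed-effect) pooling, not a full random-effects meta-regression. It computes a pooled effect, Cochran's Q, and the I² statistic overall and then within each setting.

```python
import math


def pool(effects, variances):
    """Inverse-variance pooled effect with Cochran's Q and I^2 (in percent)."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Q: weighted squared deviations of study effects from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation beyond what sampling error alone predicts.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": math.sqrt(1.0 / total_w),
            "Q": q, "df": df, "I2": i2}


# Hypothetical studies: (setting, standardized effect, sampling variance).
studies = [
    ("lab", 0.52, 0.010),
    ("lab", 0.48, 0.012),
    ("lab", 0.55, 0.008),
    ("field", 0.18, 0.010),
    ("field", 0.22, 0.015),
    ("field", 0.15, 0.009),
]


def summarize(rows):
    return pool([r[1] for r in rows], [r[2] for r in rows])


overall = summarize(studies)
# Subgroup (moderator) analysis: pool separately within each setting.
by_setting = {
    setting: summarize([r for r in studies if r[0] == setting])
    for setting in ("lab", "field")
}

print(f"overall: d={overall['pooled']:.2f}, I2={overall['I2']:.0f}%")
for setting, result in by_setting.items():
    print(f"{setting}: d={result['pooled']:.2f}, I2={result['I2']:.0f}%")
```

With these invented numbers, heterogeneity is high when all studies are pooled and largely disappears within each setting, which is the pattern that supports a conditional claim such as "the effect is larger in laboratory settings than in the field." Real moderator analyses typically use random-effects models and formal tests of between-group differences.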