External validity is the degree to which findings generalize beyond the specific participants, settings, and times studied. Laboratory experiments with convenience samples and artificial tasks often have lower external validity than naturalistic or community-based studies. Balancing internal and external validity requires strategic trade-offs: tight experimental control strengthens causal inference but may reduce applicability to real-world contexts.
Your prerequisite on internal validity established that a study has internal validity when its design supports a causal inference — when we can attribute the observed outcome to the manipulated variable rather than to confounds. But internal validity only answers "did X cause Y in this study?" External validity asks the harder follow-up: "so what?" — meaning, does the causal relationship found here hold in other places, with other people, at other times? A study can be perfectly internally valid and almost completely non-generalizable, and the history of psychology is full of cautionary examples.
There are three main threats to external validity. Population validity concerns whether findings generalize from the sample studied to other people. Psychology's most-criticized sampling problem is the WEIRD sample: participants from Western, Educated, Industrialized, Rich, and Democratic societies, often undergraduate students at research universities. Findings from such samples have repeatedly failed to replicate in other populations: susceptibility to the Müller-Lyer illusion varies across cultures; conformity effects differ between individualist and collectivist contexts; even basic memory and perception phenomena show cross-cultural differences. Ecological validity concerns whether the laboratory setting captures the phenomenon as it operates in the real world. A memory study using lists of unrelated words is internally clean but may tell us little about how people remember personally meaningful events. Temporal validity concerns whether findings hold across time: social norms, technology, and cultural contexts change, and phenomena studied in one era may not replicate in another.
The central tension in research design is that the moves that maximize internal validity often threaten external validity, and vice versa. Random assignment to conditions, strict experimental control, standardized stimuli, and laboratory settings all increase confidence in causal inference but introduce artificiality. Naturalistic observation and field research capture behavior in its real context but sacrifice control. This is not a problem with a clean solution; it is a design trade-off that researchers navigate based on the question being asked. If you want to know whether a drug *can* work, a tightly controlled randomized trial is appropriate. If you want to know whether it *does* work as prescribed in real clinical practice, you need effectiveness research in natural settings.
Replication is the scientific community's primary tool for establishing external validity over time. A single study, no matter how well designed, supports only a narrow generalization claim. A finding that holds across multiple labs, diverse participant populations, varied operationalizations of the key construct, and different cultural contexts is far more likely to reflect a genuine phenomenon. The reproducibility crisis in psychology, in which many classic findings failed direct replication in large-sample attempts, renewed attention to external validity as distinct from internal validity. It also reminded the field that p < .05 in one well-controlled study is only the beginning of the evidential story.
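A small simulation can make the logic of multi-site replication concrete. The sketch below is hypothetical throughout: the site names, the "true" effect sizes, and the sample size are invented for illustration and are not drawn from any real study. It assumes a simple two-group design with normally distributed outcomes, and shows how an effect that is real in the original lab's population can shrink or vanish when the same procedure is run elsewhere.

```python
import random
import statistics

# Hypothetical true effects (in standard-deviation units) for each site.
# These values are assumptions made up for this illustration.
SITE_EFFECTS = {
    "original_lab": 0.5,  # effect is real in the original population
    "site_b": 0.3,        # weaker in a second population
    "site_c": 0.0,        # absent in a third
    "site_d": 0.0,        # absent in a fourth
}


def simulate_site(true_effect, n_per_group, rng):
    """Return the observed mean difference (treatment minus control) at one site."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    return statistics.mean(treatment) - statistics.mean(control)


def run_replications(site_effects, n_per_group=2000, seed=0):
    """Run the same randomized two-group study once at every site."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return {
        site: simulate_site(effect, n_per_group, rng)
        for site, effect in site_effects.items()
    }


estimates = run_replications(SITE_EFFECTS)
for site, estimate in estimates.items():
    print(f"{site}: observed effect = {estimate:+.2f}")
```

Each simulated site uses random assignment, so every individual estimate is internally valid. Only the comparison across sites reveals that the original result does not generalize, which is the sense in which replication speaks to external rather than internal validity.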