Internal validity refers to the degree to which a study can demonstrate a true causal relationship between an independent and a dependent variable, free from confounding influences. Threats to internal validity include history, maturation, testing, instrumentation, regression to the mean, and selection bias. Understanding these specific threats enables researchers to design controls that rule out plausible alternative explanations for observed effects. Strong internal validity is essential for causal claims, though it may require trade-offs with ecological validity.
Study classic examples where internal validity is compromised (e.g., the Hawthorne effect, practice effects from pre-testing). Analyze published experiments to identify which validity threats were addressed and which remain.
A common misconception is that internal validity means the study is well designed overall; it specifically means that the study's causal conclusions are justified. Another is that a study with perfect internal validity automatically has high external validity; in fact, gains in control often reduce generalizability.
From your study of experimental research design, you know that the logic of experimentation is to manipulate one variable while holding everything else constant, then attribute any resulting change in the outcome to the manipulation. Internal validity is the formal name for the degree to which that inference is justified — whether the observed change in the dependent variable was truly caused by the independent variable and nothing else. Every threat to internal validity is a specific alternative explanation: a plausible reason why the outcome might have changed even if the manipulation had no effect.
The most important threats to learn, and the ones you will encounter in published research, are: history (an external event occurred during the study that could explain the outcome — a news story breaks while you're measuring attitudes, or a school fire drill interrupts your experiment); maturation (participants naturally change over time regardless of your intervention — children get older, people get tired, a condition resolves spontaneously); testing effects (taking the pretest sensitizes participants to the topic or teaches them the answers, so gains on the posttest reflect learning from the test itself rather than the intervention); and instrumentation (the measurement procedure changes between assessments — observers recalibrate their rating standards, a scale loses calibration, or the same rater becomes more lenient over time).
Two more threats require particular attention because they are less intuitive. Regression to the mean occurs because participants selected for extreme scores (the most depressed patients, the lowest-performing students) are selected in part for the measurement error that pushed their scores to that extreme. On retest, that error is unlikely to repeat in the same direction, so their scores move toward the population mean regardless of any intervention. If you enroll only the highest scorers on a pretest and see lower scores afterward, regression may explain the drop entirely. Selection bias occurs when the groups being compared differ systematically before the manipulation begins: in a pretest-posttest design without random assignment, the treatment group may have been more motivated to begin with.
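Regression to the mean is easier to believe once you have seen it appear with no intervention at all. The following is a minimal simulation sketch, not a model of any real study: the function name, the score scale (mean 50), and the equal spread of true scores and measurement error are all assumptions chosen for illustration. Each simulated participant has a stable true score and is measured twice with independent error; nothing happens between the two measurements.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, true_sd=10.0, error_sd=10.0,
                                top_fraction=0.1, seed=0):
    """Return (pretest mean, posttest mean) for the top scorers on the pretest.

    Hypothetical setup: no treatment is applied, so any change between the
    two means is produced by selection on an error-laden score.
    """
    rng = random.Random(seed)

    # Stable true scores; the population mean of 50 is an arbitrary assumption.
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]

    # Two measurements of the same true score with independent error.
    pretest = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    posttest = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Enroll only the highest scorers on the pretest.
    ranked = sorted(range(n), key=lambda i: pretest[i], reverse=True)
    selected = ranked[:int(n * top_fraction)]

    pre_mean = statistics.mean([pretest[i] for i in selected])
    post_mean = statistics.mean([posttest[i] for i in selected])
    return pre_mean, post_mean


if __name__ == "__main__":
    pre_mean, post_mean = simulate_regression_to_mean()
    print(f"selected group, pretest mean:  {pre_mean:.1f}")
    print(f"selected group, posttest mean: {post_mean:.1f}")
```

Under these assumptions the selected group's posttest mean falls below its pretest mean while staying above the population mean, even though no one was treated. Shrinking error_sd relative to true_sd makes the measure more reliable and the drop smaller, which is why regression is most plausible as an explanation when the selection measure is noisy.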
Controlled experiments address these threats primarily through random assignment, which makes the groups equivalent in expectation on all known and unknown individual differences at baseline; any remaining differences are due to chance and tend to shrink as the sample grows. But random assignment does not eliminate every threat: history, testing effects, and instrumentation can still operate, particularly when they affect one condition more than another. The remaining threats have their own design solutions: control groups to absorb history and maturation effects, Solomon four-group designs to separate testing effects from treatment effects, and inter-rater reliability checks and standardized protocols to address instrumentation. The important skill is not memorizing the list of threats but diagnosing which ones are plausible in a specific study and evaluating whether the design actually rules them out.
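To see why random assignment protects against selection bias, it can help to compare baseline group differences under two assignment rules. The sketch below is illustrative only: the function name, the standardized "motivation" variable, and the self-selection rule (more motivated people are more likely to join the treatment group) are assumptions, not a description of any particular study.

```python
import random
import statistics


def baseline_gap(n=2000, self_selected=False, seed=1):
    """Return the treatment-minus-control difference in mean baseline motivation.

    Hypothetical setup: motivation is a standardized trait measured before
    any manipulation, so a nonzero gap reflects how the groups were formed.
    """
    rng = random.Random(seed)
    motivation = [rng.gauss(0.0, 1.0) for _ in range(n)]

    if self_selected:
        # Assumed volunteering rule: motivation plus noise decides who joins.
        in_treatment = [m + rng.gauss(0.0, 1.0) > 0 for m in motivation]
    else:
        # Random assignment: half the labels are "treatment", then shuffled.
        in_treatment = [True] * (n // 2) + [False] * (n - n // 2)
        rng.shuffle(in_treatment)

    treated = [m for m, t in zip(motivation, in_treatment) if t]
    control = [m for m, t in zip(motivation, in_treatment) if not t]
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    print(f"random assignment gap: {baseline_gap(self_selected=False):+.2f}")
    print(f"self-selection gap:    {baseline_gap(self_selected=True):+.2f}")
```

With random assignment the baseline gap should sit close to zero, differing only by chance, whereas the self-selected groups start out clearly apart before any treatment is delivered. Any posttest difference in the self-selected case is therefore ambiguous between the treatment and the pre-existing gap.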