Validity is the degree to which a measure or study captures what it is intended to capture. Construct validity asks whether an instrument measures the theoretical construct of interest (assessed via convergent and discriminant validity). Internal validity concerns whether causal inferences within a study are justified. External validity (generalizability) concerns whether findings apply beyond the study context. These types of validity can conflict: maximizing experimental control to strengthen internal validity can reduce ecological validity, the aspect of external validity concerned with how well findings carry over to real-world settings.
Evaluate a published psychological measure for construct validity by examining what it correlates with (convergent) and what it doesn't (discriminant). Then discuss whether lab findings would replicate in field settings.
You have already learned about reliability — whether a measure produces consistent results. Validity is the next and deeper question: is the measure actually capturing what we think it is? A bathroom scale is reliable if it gives the same reading each time, but if it's miscalibrated by five kilograms, it's reliable but not valid. In psychology, the gap between "what we measured" and "what we meant to measure" is often the central methodological problem.
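The bathroom-scale example can be put into numbers. The sketch below uses invented readings and an assumed true weight (neither comes from a real scale): the spread of repeated readings stands in for reliability, and the average error stands in for the gap between what was measured and what was meant to be measured.

```python
from statistics import mean, pstdev

# Hypothetical values, for illustration only.
true_weight_kg = 70.0
readings_kg = [75.1, 74.9, 75.0, 75.0, 75.1, 74.9]  # repeated weighings

# Small spread across repeated readings: the scale is consistent (reliable).
spread_kg = pstdev(readings_kg)

# Large average error: the scale is consistently wrong (not valid).
bias_kg = mean(readings_kg) - true_weight_kg

print(f"spread of readings: {spread_kg:.2f} kg")
print(f"average error:      {bias_kg:.2f} kg")
```

A tight spread combined with a large average error is the numerical signature of a measure that is reliable but not valid.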
Construct validity is typically what researchers mean when they ask whether an instrument is valid. A psychological construct, such as "working memory capacity" or "self-esteem", is a theoretical entity we can't observe directly. We operationalize it as a set of test items or tasks, then ask whether those items track the construct faithfully. Convergent validity provides positive evidence: the instrument should correlate substantially with other accepted measures of the same construct. Discriminant validity provides negative evidence: it should not correlate strongly with measures of theoretically distinct constructs. Neither pattern is sufficient alone; together they support the inference that the instrument is measuring the construct of interest.
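A minimal sketch of how the convergent and discriminant patterns might be checked, assuming scores from the same participants on three instruments. All scores, variable names, and the choice of a vocabulary test as the unrelated measure are invented for illustration; real validation work uses far larger samples and usually several comparison measures.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented scores for eight hypothetical participants.
new_self_esteem = [12, 18, 25, 30, 22, 15, 28, 20]          # instrument under evaluation
established_self_esteem = [14, 17, 27, 29, 21, 13, 30, 19]  # accepted measure, same construct
vocabulary_test = [35, 33, 34, 38, 36, 37, 33, 34]          # theoretically distinct construct

convergent_r = pearson_r(new_self_esteem, established_self_esteem)
discriminant_r = pearson_r(new_self_esteem, vocabulary_test)

print(f"convergent r   = {convergent_r:.2f}")    # expected to be high
print(f"discriminant r = {discriminant_r:.2f}")  # expected to be near zero
```

In this toy data the new instrument tracks the established measure closely and is nearly unrelated to vocabulary, which is the pattern that supports construct validity. A high correlation with the vocabulary test would instead suggest the items are picking up something other than self-esteem.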
Internal validity asks a different question: within a study, do the results support the causal claim? If participants who received the intervention improved more than controls, is that difference attributable to the independent variable, or could it be explained by confounds (systematic differences between groups other than the intervention)? Strong internal validity comes from random assignment, control groups, and careful experimental design. Reliability, which you covered earlier, matters here too: an unreliable measure adds noise that makes a real causal effect harder to detect.
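The cost of an unreliable measure can be made concrete with the attenuation formula from classical test theory, which is brought in here as an illustration and is not part of the discussion above. Under its assumptions (random, uncorrelated measurement error), the expected observed correlation is the true correlation multiplied by the square root of the product of the two measures' reliabilities. The function name and all numbers below are hypothetical.

```python
from math import sqrt

def attenuated_r(true_r, reliability_x, reliability_y):
    """Expected observed correlation under classical test theory.

    Assumes measurement errors are random and uncorrelated.
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return true_r * sqrt(reliability_x * reliability_y)

# Hypothetical experiment: group assignment is recorded without error
# (reliability 1.0); only the outcome measure varies in reliability.
true_effect_r = 0.40  # assumed true correlation between condition and outcome
for outcome_reliability in (0.9, 0.7, 0.5):
    observed = attenuated_r(true_effect_r, 1.0, outcome_reliability)
    print(f"reliability={outcome_reliability:.1f} -> expected observed r={observed:.2f}")
```

As the outcome measure's reliability falls, the same underlying effect shows up as a weaker observed relationship, so a study needs more participants to detect it.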
External validity, or generalizability, is often sacrificed when researchers optimize for internal validity. A tightly controlled lab paradigm that eliminates every confound may reveal a real causal effect that nonetheless does not appear in natural settings, where many other factors vary at once. This tradeoff is not a flaw; it reflects the fact that different study designs answer different questions. A lab experiment establishes that an effect can happen; a field study tests whether it does happen in the wild. The two together are more powerful than either alone.
The critical insight about validity, and one that surprises many students, is that it is not a fixed property of a test but a property of a test used with a specific population for a specific purpose. A depression scale validated on U.S. adults is not automatically valid for adolescents or for adults in other cultures. Measurement invariance, meaning that the factor structure, factor loadings, and item intercepts hold across groups, must be empirically demonstrated. This is why "the test was published and validated" does not settle questions about appropriate use; it only begins them.