Construct validity asks: Does the measure assess the intended construct? Evidence comes from content validity, convergent validity (correlates with measures of the same or related constructs), discriminant validity (correlates weakly or not at all with measures of unrelated constructs), and factor structure. Criterion validity asks: Does the measure predict relevant outcomes? Both are integral to score interpretation and use.
Review validation studies for a psychological measure, extracting evidence of construct and criterion validity. Compare a measure with high internal consistency but low validity to understand that reliability ≠ validity. Practice evaluating whether a measure is valid for a new use.
Validity is often summarized as "does the test measure what it claims to measure?" but this framing obscures something important: validity is not a property of a test in isolation. It is a property of the interpretations and uses made from test scores. A depression measure might have strong validity evidence in clinical adult populations but weak validity evidence when used with adolescents or in non-Western cultural contexts. From your study of reliability, you know that a measure can be highly consistent without measuring anything meaningful: a bathroom scale that consistently reads 10 pounds too heavy is reliable but systematically biased, so its readings agree with one another while none of them is correct.
Construct validity is the umbrella concept. It asks: does the pattern of relationships this measure forms with other variables make sense given our theoretical understanding of the construct? Evidence accumulates through multiple lines. Content validity evaluates whether the items cover the theoretical domain adequately; a math anxiety scale that only asks about algebra anxiety has poor content coverage if the construct is meant to encompass all mathematical domains. Convergent validity asks whether the measure correlates with other measures of the same or similar constructs; a new depression scale should correlate strongly with established instruments such as the Beck Depression Inventory (BDI) and the Patient Health Questionnaire-9 (PHQ-9). Discriminant validity (sometimes called divergent validity) asks the opposite: the measure should *not* correlate strongly with measures of theoretically unrelated constructs, and it should remain distinguishable from related but distinct ones. A depression scale with a .80 correlation with an anxiety scale raises questions about whether the two constructs are actually distinct, or whether the scale separates them at all.
Criterion validity is a separate but related question: does the measure predict relevant real-world outcomes? Concurrent validity examines the correlation with a gold-standard criterion measured at the same time: does a new brief cognitive screening tool correlate with a full neuropsychological battery administered at the same time? Predictive validity examines whether the measure predicts future outcomes: does a pre-employment personality scale predict actual job performance one year later? The distinction matters practically: a measure can have strong construct validity but weak predictive validity if the construct itself is only weakly related to the outcome you care about.
The unifying framework from contemporary psychometrics is that validity evidence is cumulative and argument-based. No single study "validates" a measure; rather, validation is an ongoing process of assembling a coherent validity argument — a chain of claims from test scores to interpretations to uses, with evidence supporting each link. When validity evidence is missing for a specific use case (a new population, a new purpose, a new context), the burden falls on the test user to either generate that evidence or acknowledge the inferential gap. This is why the phrase "this test is valid" is technically imprecise — the proper phrasing is always "the interpretation of these scores as measuring X in this population for this purpose has strong/weak validity evidence."
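One way to picture an argument-based validity case is as a chain of claims, each with the evidence gathered for a particular population and purpose. The sketch below is a hypothetical data structure for that bookkeeping, not a standard tool: `Evidence`, `Claim`, and `inferential_gaps` are invented names, the evidence entries are illustrative placeholders rather than real study results, and populations and purposes are matched by exact string equality as a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    description: str
    population: str
    purpose: str


@dataclass
class Claim:
    """One link in the chain from test scores to interpretations to uses."""
    statement: str
    evidence: list[Evidence] = field(default_factory=list)


def inferential_gaps(argument, population, purpose):
    """Return claims with no evidence gathered in this population for this purpose."""
    return [
        claim.statement
        for claim in argument
        if not any(e.population == population and e.purpose == purpose
                   for e in claim.evidence)
    ]


# Illustrative argument for a depression measure studied only in clinical adults.
adult = ("clinical adults", "screening")
argument = [
    Claim("Item responses are scored consistently",
          [Evidence("internal consistency study", *adult)]),
    Claim("Scores reflect depression severity",
          [Evidence("convergent correlations with established scales", *adult)]),
    Claim("Cutoff scores support screening decisions",
          [Evidence("agreement with clinical diagnostic interviews", *adult)]),
]

adult_gaps = inferential_gaps(argument, "clinical adults", "screening")
adolescent_gaps = inferential_gaps(argument, "adolescents", "screening")
print(adult_gaps)
print(adolescent_gaps)
```

For the original population and purpose no gaps are reported, but moving to adolescents flags every link in the chain. That is the situation in which the test user must either gather new evidence or acknowledge the inferential gap.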