Questions: Consequential Validity and the Social Consequences of Testing
5 questions to test your understanding
Question 1 Multiple Choice
An employment test has strong criterion validity — it accurately predicts job performance. However, it produces a 3:1 pass rate difference between demographic groups, and rejected minority candidates perform on the job as well as accepted majority candidates. Under a consequential validity framework, what does this pattern suggest?
A. The test is valid because its technical predictive properties are sound
B. The adverse impact is evidence against the validity of the test for its intended use: the construct being predicted is not what actually drives the differential pass rates
C. Fairness concerns are separate from validity and should be addressed through HR policy rather than test evaluation
D. The predictive validity coefficient should be recalculated separately for each demographic group
Answer: B
Under Messick's framework, validity is not purely a technical property of the instrument; it also depends on whether the test's use produces defensible consequences. Here, rejected minority candidates perform as well on the actual job as accepted majority candidates, which means the test is screening them out for reasons unrelated to job performance. That is evidence that the score interpretation is not justified for this use. Option A makes the classic pre-Messick mistake: treating technical accuracy as sufficient for validity and harmful consequences as a separate ethical matter.
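As a rough illustration of the adverse-impact arithmetic in Question 1, the sketch below computes each group's pass rate and the ratio between them. The applicant counts are hypothetical, picked only to reproduce the 3:1 gap, and the function names are illustrative. The 0.8 threshold is the four-fifths rule of thumb from US employment-selection guidelines, which the question itself does not mention; it is used here only as a screening flag.

```python
def selection_rate(passed: int, applicants: int) -> float:
    """Fraction of applicants in a group who passed the test."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return passed / applicants


def impact_ratio(rate_focal: float, rate_reference: float) -> float:
    """Pass rate of the lower-passing group divided by that of the higher-passing group."""
    return rate_focal / rate_reference


# Hypothetical counts chosen to reproduce the 3:1 pass-rate gap in the question.
majority_rate = selection_rate(passed=60, applicants=100)  # 0.60
minority_rate = selection_rate(passed=20, applicants=100)  # 0.20
ratio = impact_ratio(minority_rate, majority_rate)         # roughly 0.33

# Assumption: the four-fifths rule of thumb flags ratios below 0.8 for review.
flagged = ratio < 0.8
print(f"impact ratio = {ratio:.2f}, flagged = {flagged}")
# prints: impact ratio = 0.33, flagged = True
```

A flag like this only signals that a closer look is needed. The validity evidence in the question comes from the follow-up comparison: whether rejected candidates perform on the job as well as accepted ones.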
Question 2 Multiple Choice
What did Messick's conception of validity most fundamentally change about how test quality is evaluated?
A. It replaced reliability as the primary criterion, shifting emphasis from consistency to accuracy
B. It included the social consequences of test use as integral to the validity argument, not as a separate ethical concern
C. It required demonstration of multiple forms of criterion validity before a test could be deployed in high-stakes settings
D. It limited the permissible uses of standardized tests in educational and employment contexts
Answer: B
Messick's key contribution was collapsing the boundary between 'technical validity' and 'ethical use.' Before Messick, a test could be declared valid based on internal properties (reliability, factor structure, criterion correlations), with consequences treated as a policy matter outside measurement science. Messick argued that validity is about whether a score interpretation is *justified for a particular use*: if the use produces systematic harm or inequitable outcomes, that is evidence the interpretation is unjustified, i.e., not valid. This made fairness and consequence analysis part of validity evidence, not an optional ethical overlay.
Question 3 True / False
A technically reliable and accurate test can have poor consequential validity if its use systematically restricts the opportunities of particular groups in ways that are not justified by the construct being measured.
T. True
F. False
Answer: True
This is the core claim of consequential validity. Technical accuracy (the test measures what it claims to measure) is necessary but not sufficient. If a use produces labeling effects, curriculum narrowing, disproportionate special education placements, or adverse impact on demographic groups — and these effects are not justified by the construct — they constitute evidence against the validity of that interpretation and use. Messick's framework specifically requires that validity arguments address whether outcomes are defensible, not just whether the instrument is accurate.
Question 4 True / False
Consequential validity is primarily a criterion validity concern — it asks whether a test accurately predicts a specific criterion outcome like job performance or academic achievement.
T. True
F. False
Answer: False
Criterion validity concerns a technical question: do test scores correlate with an external criterion? Consequential validity asks a different question: what are the effects of using this test in this context on individuals, institutions, and society? These effects include labeling, curriculum distortion, resource allocation consequences, and differential impact on demographic groups. A test can have strong criterion validity and poor consequential validity simultaneously, for example if it accurately predicts performance but systematically screens out groups in ways that perpetuate inequity.
Question 5 Short Answer
Why does Messick argue that the social consequences of testing are part of validity rather than a separate ethical concern?
Model answer: Because validity is not a property of the test instrument itself but of a specific score interpretation for a specific use. If an interpretation is used to make decisions about people, and those decisions produce systematic harm or inequitable outcomes that cannot be justified by the construct being measured, then the interpretation is not fully supported — that is, not fully valid. Treating consequences as external to validity (as the traditional view did) allowed psychometricians to declare a test 'valid' while ignoring whether it actually served its intended purpose without harm. Consequential validity integrates these concerns so that the burden of justification falls on the test's use, not just its technical construction.
The practical implication is that test developers and users share responsibility for monitoring and documenting consequences, not just technical properties. An adverse impact pattern, labeling effect, or curriculum-narrowing consequence requires response within the validity framework — not just a policy adjustment — because it constitutes evidence that the intended interpretation is not justified.