Questions: Measurement Validity: Construct and Criterion Evidence
5 questions to test your understanding
Question 1 Multiple Choice
A researcher develops a new anxiety scale with Cronbach's α = 0.94, indicating very high internal consistency. They conclude the scale must be highly valid. What is wrong with this reasoning?
A. Cronbach's α of 0.94 is too high; values above 0.90 indicate item redundancy, not validity
B. Reliability and validity are distinct properties: a measure can be highly consistent while systematically measuring the wrong construct or measuring it only in specific populations
C. Internal consistency does provide evidence of validity, so the conclusion is correct
D. Validity requires test-retest reliability, not internal consistency
Answer: B
Reliability and validity are distinct. A scale can have high internal consistency (its items correlate with each other) while measuring something other than the intended construct, or while measuring the right construct only in the population it was developed in. The classic example: a bathroom scale that always reads 10 lbs too high is perfectly consistent but systematically wrong. Adequate reliability is necessary but not sufficient for validity; convergent, discriminant, and criterion evidence are also required.
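The gap between internal consistency and validity can be illustrated with simulated data. The sketch below is only an illustration under stated assumptions: the variable names, sample size, and noise level are hypothetical, and the ten items are deliberately driven by an unintended trait (labeled `reading_difficulty`) that is independent of the intended one (`anxiety`).

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


rng = np.random.default_rng(0)
n_respondents, n_items = 2000, 10

# Hypothetical latent traits: the scale is meant to measure anxiety, but
# every item actually tracks an unrelated trait (assumption for illustration).
anxiety = rng.normal(size=n_respondents)
reading_difficulty = rng.normal(size=n_respondents)

# Each item = unintended trait + independent item-level noise.
items = reading_difficulty[:, None] + 0.6 * rng.normal(size=(n_respondents, n_items))

alpha = cronbach_alpha(items)
validity_r = float(np.corrcoef(items.sum(axis=1), anxiety)[0, 1])

print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Correlation of total score with intended construct: {validity_r:.2f}")
```

Under these assumptions the items hang together very tightly, yet the total score is essentially uncorrelated with the construct the scale claims to measure.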
Question 2 Multiple Choice
A depression scale with strong validity evidence in adult U.S. clinical samples is to be administered to adolescents in East Africa. Which statement best reflects the validity concern?
A. The scale is valid because its psychometric properties were rigorously established in the original context
B. Validity is inherent to the test items, not the population, so the context change is irrelevant
C. Validity evidence from one population and context does not automatically transfer; new evidence must be gathered for the new use or the inferential gap must be acknowledged
D. The scale should be completely redeveloped from scratch for any new cultural context
Answer: C
This is the central practical implication of the principle that validity is use-specific. Validity evidence is not a permanent property of the test; it is evidence for specific interpretations and uses of scores in specific populations. Cultural context affects item interpretation, construct meaning, and criterion relationships. Option D overstates the requirement: cross-cultural adaptation and validation studies are possible without full redevelopment. But using the test without any additional validation is an inferential leap the evidence doesn't support.
Question 3 True / False
A measure can be highly reliable — producing consistent scores across administrations — while having poor validity for its intended purpose.
True
False
Answer: True
Reliability is a necessary but not sufficient condition for validity. A measure can consistently assess something real, just not the thing it's supposed to measure. A test that reliably measures vocabulary knowledge will yield consistent scores when administered as an 'intelligence test', yet it can still have poor construct validity for intelligence. Reliability sets an upper bound on validity (an unreliable measure cannot be valid), but high reliability doesn't guarantee high validity.
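The upper bound mentioned above has a standard form in classical test theory: the observed correlation between a measure and a criterion cannot exceed the square root of the product of their reliabilities. The helper below is a minimal sketch of that bound; the function name `max_validity` and the example reliabilities are hypothetical, and the result holds only under classical test theory assumptions (uncorrelated measurement errors).

```python
import math


def max_validity(reliability_x: float, reliability_y: float = 1.0) -> float:
    """Ceiling on the observed validity coefficient r_xy.

    Classical test theory: r_xy <= sqrt(r_xx * r_yy), where r_xx and r_yy
    are the reliabilities of the measure and the criterion.
    """
    for r in (reliability_x, reliability_y):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie in [0, 1]")
    return math.sqrt(reliability_x * reliability_y)


# A measure with reliability 0.49 against a perfectly reliable criterion.
print(round(max_validity(0.49), 2))

# Both the measure (0.81) and the criterion (0.64) contain measurement error.
print(round(max_validity(0.81, 0.64), 2))
```

The bound is a ceiling only. A measure with reliability near 1.0 can still have a validity coefficient near zero.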
Question 4 True / False
A single study showing that a new personality measure correlates r = 0.75 with an established gold-standard measure is sufficient to establish the new measure's validity.
True
False
Answer: False
Validity is cumulative and argument-based — it is assembled through multiple lines of evidence over time, not established in a single study. A high convergent validity coefficient is one piece of evidence, but you also need discriminant validity (the measure doesn't correlate too strongly with unrelated constructs), content coverage, criterion validity (it predicts relevant real-world outcomes), and evidence that the validation generalizes to the populations and uses intended. No single coefficient 'validates' a measure.
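Two of these evidence lines, convergent and discriminant, can be sketched with simulated scores. Everything here is a hypothetical illustration: the noise levels, sample size, and variable names are assumptions, and a real validation would use observed data and would also need the content, criterion, and generalization evidence listed above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical latent personality trait shared by both measures.
trait = rng.normal(size=n)

# Established and new measures: same trait plus independent measurement error.
established = trait + 0.5 * rng.normal(size=n)
new_measure = trait + 0.5 * rng.normal(size=n)

# Scores on a theoretically unrelated construct.
unrelated = rng.normal(size=n)


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])


convergent_r = pearson_r(new_measure, established)   # should be high
discriminant_r = pearson_r(new_measure, unrelated)   # should be near zero

print(f"convergent r:   {convergent_r:.2f}")
print(f"discriminant r: {discriminant_r:.2f}")
```

A high convergent correlation alongside a near-zero discriminant correlation is the expected pattern, but it remains one part of a validity argument, not a verdict.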
Question 5 Short Answer
Why is the statement 'this test is valid' technically imprecise, and how should validity claims be framed instead?
Model answer: Validity is not a fixed property of the test; it is a property of the interpretations and uses made of test scores in specific contexts. The same test can have strong validity evidence for one purpose (screening clinical adults for major depression) and weak or absent evidence for another (assessing adolescent depression across cultural contexts). A precise validity claim specifies the score interpretation, the construct being measured, the population, and the purpose. For example: 'The interpretation of PHQ-9 scores as indicating depressive symptom severity in adult primary care patients in the U.S. is supported by strong validity evidence across multiple studies and criterion outcomes.'
This framing matters practically: when a test is used for high-stakes decisions (clinical diagnosis, employment, educational placement) in a population it was not validated for, the validity evidence does not transfer automatically. The user bears responsibility for ensuring adequate validity evidence exists for their specific use case.