Questions: Reliability and Validity: Foundational Relationship
5 questions to test your understanding
Question 1 Multiple Choice
A researcher develops an 'executive function' test with excellent test-retest reliability (r = 0.95). Validation studies show it correlates r = 0.90 with processing speed but only r = 0.30 with established executive function tasks. What does this demonstrate?
A. The test is both reliable and valid: high reliability proves it is measuring consistently
B. The test is reliable but not valid: it consistently measures processing speed, not executive function
C. The high reliability sets a ceiling on validity, mathematically explaining the low validity coefficient
D. Validity cannot be assessed without knowing the reliability of the criterion measures
Answer: B
This is the classic demonstration that reliability is not sufficient for validity. The test is highly consistent (test-retest r = 0.95) but consistently measures the wrong thing: processing speed, not executive function. A miscalibrated scale is analogous, giving the same wrong reading every time. Option C misapplies the ceiling concept: a reliability of 0.95 permits validity as high as √0.95 ≈ 0.97, so the ceiling is not what limits the observed r = 0.30. The test is simply measuring a different construct.
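The pattern in Question 1 can be reproduced with a small simulation. This is only a sketch under stated assumptions: the latent correlation of 0.3 between processing speed and executive function, the error standard deviation of 0.2, the sample size, and all function and variable names are hypothetical choices for illustration, and the criterion is treated as error-free.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, seed=0):
    """Simulate a test that tracks processing speed, not executive function."""
    rng = random.Random(seed)
    # Latent processing speed for each simulated person.
    speed = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: executive function overlaps only weakly with speed (r = 0.3).
    ef = [0.3 * s + math.sqrt(1 - 0.3 ** 2) * rng.gauss(0, 1) for s in speed]
    # Two administrations of the test: speed plus a small independent error.
    time1 = [s + rng.gauss(0, 0.2) for s in speed]
    time2 = [s + rng.gauss(0, 0.2) for s in speed]
    return pearson(time1, time2), pearson(time1, speed), pearson(time1, ef)


retest, with_speed, with_ef = simulate()
print(f"test-retest r        = {retest:.2f}")
print(f"r with speed         = {with_speed:.2f}")
print(f"r with exec function = {with_ef:.2f}")
```

Under these assumptions the first two correlations should come out high and the third low, which is the reliable-but-not-valid profile described above. Exact values depend on the seed and the assumed parameters.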
Question 2 Multiple Choice
A cognitive ability test has a reliability coefficient of r_xx = 0.64. What is the theoretical maximum validity coefficient it could possibly achieve against any external criterion?
A. 0.64, since validity cannot exceed reliability
B. 0.80, the square root of the reliability coefficient
C. 1.00, since validity is conceptually independent of reliability
D. 0.41, the square of the reliability coefficient
Answer: B
The attenuation formula sets the validity ceiling at √(r_xx · r_yy). With a perfectly reliable criterion (r_yy = 1.0), the ceiling is √r_xx = √0.64 = 0.80. Unreliable test scores are too noisy to correlate strongly with anything. Option A is wrong: the ceiling is √r_xx, not r_xx itself, and because √r_xx > r_xx whenever 0 < r_xx < 1, a validity coefficient can numerically exceed the reliability coefficient. Option C reflects the common misconception that reliability and validity are independent. Option D squares the reliability (0.64² ≈ 0.41) instead of taking its square root.
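The ceiling arithmetic can be checked with a few lines of code. The function name `max_validity` and the second example's criterion reliability of 0.81 are illustrative assumptions, not values from the question.

```python
import math


def max_validity(r_xx: float, r_yy: float = 1.0) -> float:
    """Upper bound on an observed validity coefficient: sqrt(r_xx * r_yy)."""
    if not (0.0 <= r_xx <= 1.0 and 0.0 <= r_yy <= 1.0):
        raise ValueError("reliabilities must lie in [0, 1]")
    return math.sqrt(r_xx * r_yy)


# Question 2: test reliability 0.64, perfectly reliable criterion.
print(round(max_validity(0.64), 2))        # 0.8

# Hypothetical case: the criterion is also imperfect (r_yy = 0.81).
print(round(max_validity(0.64, 0.81), 2))  # 0.72
```

The second call shows that an unreliable criterion lowers the ceiling further, which is why the question specifies the theoretical maximum against any criterion.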
Question 3 True / False
A test with near-zero test-retest reliability cannot be a valid measure of any stable psychological construct.
True
False
Answer: True
If a test is unreliable, its scores are dominated by random measurement error. Such scores cannot systematically reflect any stable construct — including the intended one. A test-retest correlation near zero means the same person's score changes substantially from one measurement to the next, which cannot reflect stable variation in the underlying attribute. The reliability coefficient places a mathematical ceiling on validity: √(near zero) ≈ near zero. Reliability is the floor, not the goal, but without it, validity is impossible.
Question 4 True / False
Achieving a very high internal consistency coefficient (e.g., Cronbach's α = 0.95) is sufficient evidence that a test is measuring the intended psychological construct.
True
False
Answer: False
High alpha means the items are strongly intercorrelated, so they appear to tap a common source of variance consistently. But that common source might not be the intended construct. A collection of highly intercorrelated questions about fatigue, sleep, and appetite will yield high alpha while potentially measuring the somatic side effects of a medical illness rather than depression itself. Internal consistency is one form of reliability, and reliability is necessary but not sufficient for validity. Validity requires external evidence: correlations with theoretically related measures, predictions of relevant outcomes, and the full validity argument.
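To make the point concrete, the standard formula for Cronbach's alpha, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), can be sketched as follows. The item names and the six respondents' scores are invented purely for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent, summing across items.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical ratings (1-5) from six respondents on three somatic items.
fatigue = [1, 2, 3, 4, 5, 4]
sleep = [2, 2, 3, 5, 5, 4]
appetite = [1, 3, 3, 4, 4, 5]

print(round(cronbach_alpha([fatigue, sleep, appetite]), 2))  # 0.94
```

Nothing in the calculation refers to depression or to any other construct: alpha is computed from the item scores alone, so a high value cannot by itself show what the items measure.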
Question 5 Short Answer
Explain in your own words why reliability is a necessary condition for validity but not a sufficient one. Use a concrete analogy or example to illustrate the asymmetry.
Model answer: Reliability is necessary because unreliable scores (dominated by random error) cannot systematically reflect any construct — their ceiling on validity is near zero. But reliability is not sufficient because a test can consistently measure the wrong thing: e.g., a test of 'math ability' that reliably measures reading speed is useless for assessing math. High reliability tells you the test is measuring something stably; validity tells you whether that something is what you intended.
The classic analogy is a miscalibrated scale that reads 5 lbs too heavy on every weighing: highly reliable (same wrong answer every time) but not valid for knowing your true weight. In psychology, the head circumference example from phrenology illustrates this starkly — head size can be measured with excellent reliability but has essentially no validity as a measure of intelligence. Reliability is the floor; once achieved, validation work begins by examining whether scores relate to other measures in theoretically predicted ways.