Questions: Measurement Reliability: Types and Estimation
5 questions to test your understanding
Question 1 Multiple Choice
A researcher develops a new anxiety scale and finds that Cronbach's alpha = 0.62. The scale correlates r = 0.45 with a clinician-rated anxiety measure. What is the most accurate interpretation?
A. The scale is valid because r = 0.45 is a respectable correlation
B. The observed correlation is likely attenuated by measurement error; the true relationship could be considerably stronger
C. The scale must be invalid since alpha is below 0.70
D. Alpha and the validity correlation are unrelated: reliability and validity measure different things independently
Answer: B
Low reliability attenuates observed correlations toward zero. The attenuation formula implies that the maximum possible correlation with a perfectly reliable criterion is √(alpha) ≈ 0.79 for alpha = 0.62. The observed r = 0.45 therefore likely understates the true relationship: measurement noise in the scale dilutes the signal, and any unreliability in the clinician ratings dilutes it further. This illustrates why reliability sets a ceiling on validity: an unreliable scale cannot reveal the true strength of a relationship even if the construct itself is theoretically sound.
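A minimal Python sketch of the arithmetic behind this explanation, assuming classical test theory and treating alpha as the scale's reliability. The helper names `max_observed_r` and `disattenuate` are hypothetical, and the default `ryy = 1.0` stands for an idealized, perfectly reliable criterion. A clinician rating is not perfectly reliable, so this default is an assumption made only to isolate the scale's contribution.

```python
import math

def max_observed_r(rxx: float, ryy: float = 1.0) -> float:
    """Ceiling on an observed correlation: sqrt(rxx * ryy).

    ryy defaults to 1.0, i.e. a hypothetical perfectly reliable criterion.
    """
    return math.sqrt(rxx * ryy)

def disattenuate(r_obs: float, rxx: float, ryy: float = 1.0) -> float:
    """Correction for attenuation: observed r divided by sqrt(rxx * ryy)."""
    return r_obs / math.sqrt(rxx * ryy)

alpha = 0.62   # reliability of the new anxiety scale (Question 1)
r_obs = 0.45   # observed correlation with the clinician rating (Question 1)

print(round(max_observed_r(alpha), 3))        # 0.787
print(round(disattenuate(r_obs, alpha), 3))   # 0.572
```

Under these assumptions, correcting for the scale's unreliability alone moves the estimate from 0.45 to roughly 0.57. Passing a clinician-rating reliability below 1 as `ryy` would raise it further.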
Question 2 Multiple Choice
A researcher administers the same depression scale to participants twice, three weeks apart, and correlates the two sets of scores. What source of measurement error is this procedure designed to assess?
A. Error from sampling items: whether different items would produce the same scores
B. Error from rater subjectivity: whether different observers score the same behavior consistently
C. Error from temporal inconsistency: whether scores are stable over time in the absence of real change
D. Error from social desirability: whether participants answer honestly
Answer: C
Test-retest reliability specifically targets temporal instability as a source of measurement error. If the construct is stable (that is, depression levels have not genuinely changed over the three weeks), any score difference reflects error: fluctuations in attention, transient mood or fatigue on the test day, and so on. Memory of prior responses works in the opposite direction; it can make the two administrations artificially similar and inflate the test-retest correlation. Test-retest reliability is distinct from internal consistency (alpha), which asks whether items within a single administration agree with each other, and from inter-rater reliability, which targets observer disagreement. Each type of reliability isolates a different error source.
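The logic of test-retest reliability can be simulated under classical test theory: each person has a stable true score, and each administration adds independent, occasion-specific error. This sketch assumes no real change and no memory carryover between sessions; the function name `simulate_test_retest`, the sample size, and the error SD are illustrative choices, not values from any study. Under those assumptions the expected test-retest correlation is var_T / (var_T + var_E), here 1 / (1 + 0.25) = 0.8.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_test_retest(n=20_000, true_sd=1.0, error_sd=0.5, seed=1):
    """Stable true scores plus independent error at each of two administrations.

    Hypothetical setup: no real change between sessions, no carryover effects.
    """
    rng = random.Random(seed)
    true = [rng.gauss(0.0, true_sd) for _ in range(n)]
    time1 = [t + rng.gauss(0.0, error_sd) for t in true]   # first administration
    time2 = [t + rng.gauss(0.0, error_sd) for t in true]   # second administration
    return pearson(time1, time2)

r_tt = simulate_test_retest()
expected = 1.0 / (1.0 + 0.5 ** 2)   # var_T / (var_T + var_E)
print(f"simulated test-retest r = {r_tt:.3f}, expected = {expected:.3f}")
```

Raising `error_sd` (more occasion-specific noise) lowers the simulated correlation, which is exactly the temporal error this procedure is meant to detect.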
Question 3 True / False
Reliability sets a ceiling on validity: a measure cannot correlate more strongly with external criteria than its own reliability coefficient allows.
T. True
F. False
Answer: True
The correction for attenuation makes this mathematically explicit. The maximum possible observed correlation between two measures equals √(r_xx × r_yy), where r_xx and r_yy are the reliabilities of the two measures. If a scale has alpha = 0.70, it can correlate at most √0.70 ≈ 0.84 with a perfectly reliable criterion. Measurement error in the scores systematically dilutes observed correlations toward zero. This is why reliability is a prerequisite for validity: you must first establish that your measure is consistent before asking whether it measures what it should.
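The ceiling can be checked by simulation under classical test theory assumptions: give two measures one shared true score (so their true correlation is exactly 1), add independent error sized to hit chosen reliabilities, and compare the observed correlation with √(r_xx × r_yy). The helper names and the reliability pairs below are hypothetical choices for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def error_sd_for(reliability, true_var=1.0):
    """Error SD giving the target reliability, since
    reliability = true_var / (true_var + error_var)."""
    return math.sqrt(true_var * (1.0 / reliability - 1.0))

def observed_r_with_perfect_true_r(rxx, ryy, n=20_000, seed=7):
    """Both measures share one true score, so their true-score correlation is 1."""
    rng = random.Random(seed)
    sd_x, sd_y = error_sd_for(rxx), error_sd_for(ryy)
    xs, ys = [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)                 # shared true score
        xs.append(t + rng.gauss(0.0, sd_x))     # measure X with its own error
        ys.append(t + rng.gauss(0.0, sd_y))     # measure Y with its own error
    return pearson(xs, ys)

for rxx, ryy in [(0.70, 1.0), (0.70, 0.90), (0.50, 0.50)]:
    simulated = observed_r_with_perfect_true_r(rxx, ryy)
    ceiling = math.sqrt(rxx * ryy)
    print(f"rxx={rxx:.2f} ryy={ryy:.2f} simulated r={simulated:.3f} ceiling={ceiling:.3f}")
```

Even with a perfect underlying relationship, the observed correlation settles near the ceiling rather than near 1.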
Question 4 True / False
A scale with high internal consistency (Cronbach's alpha = 0.92) is measuring a single, unified psychological construct.
T. True
F. False
Answer: False
High alpha reflects high average inter-item correlations, which can occur even when items tap multiple related but distinct factors, a condition called multidimensionality. Alpha also rises with the number of items, so a long scale can reach a high alpha even when its items are only moderately intercorrelated. Alpha is a measure of consistency, not of unidimensionality. For example, a 20-item scale might have two clusters of 10 items, each measuring a different but correlated facet; the overall alpha could be high while the scale is actually bidimensional. Establishing unidimensionality requires factor analysis or other structural methods, not just inspection of alpha.
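To see how a bidimensional scale can still earn a high alpha, the sketch below simulates the 20-item example: two correlated factors with 10 items each. The loading (0.7), factor correlation (0.5), and sample size are assumptions picked for illustration. With these values the population alpha works out to about 0.92, even though items correlate about 0.49 within a cluster and only about 0.245 across clusters.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: list of k item columns, each a list of n scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k, n = len(items), len(items[0])
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(sample_var(col) for col in items) / sample_var(totals))

def simulate_two_factor(n=5000, per_factor=10, loading=0.7, factor_r=0.5, seed=3):
    """Hypothetical design: the first per_factor items load on factor 1, the rest on factor 2."""
    rng = random.Random(seed)
    err_sd = math.sqrt(1 - loading ** 2)   # keeps each item's variance at 1
    items = [[] for _ in range(2 * per_factor)]
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        f1 = z1
        f2 = factor_r * z1 + math.sqrt(1 - factor_r ** 2) * z2   # corr(f1, f2) = factor_r
        for j, col in enumerate(items):
            factor = f1 if j < per_factor else f2
            col.append(loading * factor + rng.gauss(0.0, err_sd))
    return items

K = 10
items = simulate_two_factor(per_factor=K)
alpha = cronbach_alpha(items)
within = [pearson(items[i], items[j])
          for i in range(2 * K) for j in range(i + 1, 2 * K) if (i < K) == (j < K)]
between = [pearson(items[i], items[j]) for i in range(K) for j in range(K, 2 * K)]

print(f"alpha = {alpha:.3f}")
print(f"mean r within a cluster = {sum(within) / len(within):.3f}")
print(f"mean r across clusters  = {sum(between) / len(between):.3f}")
```

The gap between the two average correlations hints at structure that alpha alone hides; a factor analysis, not this informal comparison, is what would formally establish the number of dimensions.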
Question 5 Short Answer
Why does unreliable measurement systematically undermine scientific conclusions about whether a construct predicts outcomes, rather than simply making estimates less precise?
Model answer: Unreliable measurement introduces random error that attenuates observed correlations toward zero. It does not just add noise around the true value; it biases estimates of relationships downward. The correction for attenuation formula shows that the true correlation equals the observed correlation divided by the square root of the product of the two measures' reliabilities. A researcher using unreliable measures will therefore routinely conclude that constructs are less related than they truly are, leading to false negatives and underestimated effect sizes. Unreliability does not produce random over- and underestimates that average out across studies; it systematically suppresses observed relationships.
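The asymmetry is crucial: random measurement error does average out for means (the sample mean of observed scores is an unbiased estimate of the mean true score), but it does NOT average out for correlations or for regression slopes on an error-laden predictor. Purely random error leaves the covariance between two measures unchanged in expectation, but it inflates each measure's variance. Because a correlation divides the covariance by the observed standard deviations, and a slope divides it by the predictor's observed variance, both shrink toward zero. (Error confined to the outcome variable leaves the slope unbiased, although it still lowers the correlation.) The practical implication is that any field using unreliable measures will systematically underestimate the predictive validity of its constructs, potentially dismissing theoretically sound variables as empirically weak when the real problem is the measurement tool.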
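A short simulation, under classical test theory assumptions, makes the covariance argument concrete. The true slope of 0.5 and the unit error variances are arbitrary illustrative choices. With error variance 1 added to a predictor whose true variance is 1, the predictor's reliability is 0.5, so the expected slope on the noisy predictor is 0.5 × 0.5 = 0.25.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def corr(xs, ys):
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

def slope(xs, ys):
    """OLS slope of ys regressed on xs: cov(x, y) / var(x)."""
    return cov(xs, ys) / cov(xs, xs)

rng = random.Random(11)
n = 50_000
true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
true_y = [0.5 * x + rng.gauss(0.0, 1.0) for x in true_x]   # assumed true slope 0.5
noisy_x = [x + rng.gauss(0.0, 1.0) for x in true_x]        # purely random error, variance 1
noisy_y = [y + rng.gauss(0.0, 1.0) for y in true_y]

# Covariance is essentially unchanged by the error; the variance of x roughly doubles.
print("cov   true/noisy:", round(cov(true_x, true_y), 3), round(cov(noisy_x, noisy_y), 3))
print("var_x true/noisy:", round(cov(true_x, true_x), 3), round(cov(noisy_x, noisy_x), 3))
# Correlation shrinks because the observed SDs in its denominator grew.
print("corr  true/noisy:", round(corr(true_x, true_y), 3), round(corr(noisy_x, noisy_y), 3))
# Error in the predictor attenuates the slope (expected 0.25); error only in y does not (expected 0.5).
print("slope, noisy x:", round(slope(noisy_x, true_y), 3))
print("slope, noisy y:", round(slope(true_x, noisy_y), 3))
```

The numerator of each statistic survives the noise; it is the inflated denominator that drags correlations and predictor slopes toward zero.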