Questions: True Score Theory and Measurement Error
5 questions to test your understanding
Question 1 Multiple Choice
A student scores 85 on a standardized test with a reliability of .84 and a standard deviation of 15 (giving a standard error of measurement of 6). A psychologist interprets this result. Which interpretation is most consistent with classical test theory?
A. The student's true ability is exactly 85, with the reliability coefficient confirming the score's accuracy
B. The student would score 85 on every retest, since 84% of the variance is reliable
C. The observed score of 85 is an estimate of the true score, with uncertainty of roughly ±6 points; it is best interpreted as a range
D. The student's score is above average; the error term is irrelevant since the test is reliable enough
Answer: C
Classical test theory holds that X = T + E: the observed score is the true score plus random error. No single observed score can be equated with the true score. The SEM of 6 (15 × √(1 − .84) = 15 × .4) means that if the same person were tested many times, their scores would form a distribution with a standard deviation of about 6 points around their true score. The correct interpretation is therefore a range rather than a point estimate: 85 ± 6, or roughly 79 to 91, for about 68% confidence. High reliability reduces error but does not eliminate it; in practice the error term is never zero.
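The arithmetic behind this interpretation can be sketched in a few lines of Python. This is a minimal illustration using the values from the question; the function names `standard_error_of_measurement` and `score_band` are invented for this example, and the band is simply the observed score plus or minus a chosen number of SEMs.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), the classical test theory formula."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, n_sems: float = 1.0) -> tuple[float, float]:
    """Band of +/- n_sems standard errors around the observed score."""
    return (observed - n_sems * sem, observed + n_sems * sem)


# Values from the question: SD = 15, reliability = .84, observed score = 85.
sem = standard_error_of_measurement(sd=15, reliability=0.84)
low, high = score_band(observed=85, sem=sem)

print(f"SEM = {sem:.1f}")                # SEM = 6.0
print(f"Band: {low:.0f} to {high:.0f}")  # Band: 79 to 91
```

Passing `n_sems=2` would widen the band to roughly 95% coverage, assuming the errors are approximately normally distributed.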
Question 2 Multiple Choice
If measurement error in a test is truly random and uncorrelated with true scores, what does this imply about the average error term across many administrations of the test to the same person?
A. The average error will equal the reliability coefficient
B. The average error will systematically inflate observed scores toward the population mean
C. The average error will approach zero, because random errors cancel out across repeated measurements
D. The average error will equal the standard deviation of the observed scores
Answer: C
Random error, by definition, has an expected value of zero. Positive errors (lucky guesses, momentary focus) and negative errors (distractions, fatigue) balance out on average. This is why averaging many measurements gives a better estimate of the true score: the average error shrinks toward zero while the true score component stays the same. It is also why reliability can be improved by adding more items, since a longer test averages out more error.
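A small simulation can make the cancellation visible. The sketch below assumes normally distributed errors, a hypothetical true score of 85, and an SEM of 6; none of these are required by classical test theory, which only needs the errors to have an expected value of zero. The helper names are invented for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_SCORE = 85.0  # hypothetical true score, chosen for illustration
SEM = 6.0          # assumed spread of the random error


def observed_score() -> float:
    """One administration: true score plus zero-mean random error (X = T + E)."""
    return TRUE_SCORE + random.gauss(0.0, SEM)


def mean_error(n_administrations: int) -> float:
    """Average deviation from the true score over repeated administrations."""
    scores = [observed_score() for _ in range(n_administrations)]
    return statistics.fmean(scores) - TRUE_SCORE


for n in (1, 10, 100, 10_000):
    print(f"n = {n:>6}: mean error = {mean_error(n):+.3f}")
```

The printed mean error should generally move closer to zero as `n` grows, although any single run can fluctuate, especially for small `n`.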
Question 3 True / False
A person's 'true score' in classical test theory refers to the actual, hidden ability level that the test is trying to uncover — a fixed, real quantity the person possesses.
T. True
F. False
Answer: False
The true score is a statistical construct, not a metaphysical reality. It is defined as the expected value — the mathematical average — of a person's observed scores across hypothetical infinite replications under identical conditions. It is what their scores would converge to with unlimited measurement, not a 'real' ability stored somewhere in their brain. This distinction matters because it frames reliability and error as properties of the measurement process, not of the person's 'actual' ability.
Question 4 True / False
Increasing the reliability of a test reduces the standard error of measurement, meaning individual scores become more precise estimates of the true score.
T. True
F. False
Answer: True
The formula SEM = SD × √(1 − r) makes this relationship explicit. As reliability (r) increases toward 1.0, the term √(1 − r) decreases toward zero, and the SEM decreases with it. A perfectly reliable test (r = 1) would have SEM = 0, meaning every observed score exactly equals the true score. In practice, as reliability increases from .80 to .90, the SEM falls from about 0.45 SD to about 0.32 SD, a decrease of roughly 29% for the same SD, which substantially tightens the confidence interval around each observed score.
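The size of that reduction can be checked numerically. The sketch below applies SEM = SD × √(1 − r) at both reliabilities; the SD of 15 is an assumption for illustration only, since the percentage change does not depend on it.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


SD = 15.0  # assumed for illustration; it cancels out of the ratio below

sem_80 = sem(SD, 0.80)
sem_90 = sem(SD, 0.90)
reduction = 1.0 - sem_90 / sem_80  # proportional drop in SEM

print(f"SEM at r = .80: {sem_80:.2f}")  # 6.71
print(f"SEM at r = .90: {sem_90:.2f}")  # 4.74
print(f"Reduction: {reduction:.1%}")    # 29.3%
```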
Question 5 Short Answer
Why is it incorrect to interpret an observed test score as a precise point estimate of ability, and what does the standard error of measurement tell us instead?
Model answer: An observed score always contains random error (X = T + E), so it is one imprecise draw from the distribution of scores the person could obtain. The true score is the center of that distribution, but any single observation may deviate from it. The SEM quantifies how spread out that distribution is: it tells you how much a person's observed scores would vary across retests due to error alone, and it allows you to construct a confidence interval (e.g., observed score ± 1 SEM for ~68% confidence). The score should be reported as a range, not a single number.
This has direct clinical and educational consequences. Reporting an IQ of 112 as a single number implies a precision the test cannot deliver. Best practice is to report it as a range (e.g., 107–117) and to make decisions only when scores are clearly above or below a cutoff, not near the boundary where error could flip the classification. The SEM is also the basis for evaluating whether a change in score from one test administration to the next reflects genuine change or just measurement fluctuation.
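The cutoff advice can be expressed as a simple rule: compare the whole band with the cutoff rather than the point score. The sketch below is one possible way to do that, not an established procedure; the function name `classify_against_cutoff`, the three labels, and the cutoff values are all hypothetical, and the half-width of 5 points is read off the 107–117 example above.

```python
def classify_against_cutoff(observed: float, half_width: float, cutoff: float) -> str:
    """Compare a score band with a cutoff instead of comparing the point score."""
    low, high = observed - half_width, observed + half_width
    if low > cutoff:
        return "above cutoff"       # the entire band clears the cutoff
    if high < cutoff:
        return "below cutoff"       # the entire band falls short of the cutoff
    return "too close to call"      # the cutoff lies inside the band


# IQ 112 reported as 107-117 (half-width of 5 points); cutoffs are made up.
print(classify_against_cutoff(112, 5, cutoff=100))  # above cutoff
print(classify_against_cutoff(112, 5, cutoff=115))  # too close to call
print(classify_against_cutoff(112, 5, cutoff=130))  # below cutoff
```

A score flagged as too close to call is a candidate for retesting or for gathering additional evidence before a decision is made.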