Questions: Standard Error of Measurement and Confidence Intervals
5 questions to test your understanding
Question 1 Multiple Choice
A psychologist reports: 'This test has a reliability of 0.92 — one of the best on the market — so I'm confident the score of 68 precisely reflects this client's ability.' What important consideration is being overlooked?
A. A reliability of 0.92 is not actually high enough for individual-level clinical decisions
B. Even with high reliability, the SEM (which depends on both reliability AND the population standard deviation) defines a confidence interval around the score; the point estimate of 68 is still uncertain
C. The test should have been compared to a criterion measure before interpretation
D. Reliability coefficients above 0.90 can be trusted for individual scores without further qualification
Answer: B
High reliability shrinks the SEM but does not eliminate it. SEM = SD × √(1 − r_xx), so a test with r_xx = 0.92 and, for example, SD = 15 has SEM = 15 × √0.08 ≈ 4.2. The 95% CI is 68 ± 1.96 × 4.2, or roughly 68 ± 8, i.e., 60 to 76. This is a 16-point range, wide enough to matter in many clinical decisions. The error is treating high reliability as equivalent to high precision at the individual level, when in fact the SD of the population is an equally important factor.
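The arithmetic in this explanation can be reproduced with a minimal Python sketch. The function names `sem` and `confidence_interval` are illustrative choices, and SD = 15 is the value assumed in the explanation rather than one stated in the question.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(score: float, sem_value: float, z: float = 1.96):
    """Symmetric interval around an observed score; z = 1.96 gives about 95%."""
    half_width = z * sem_value
    return score - half_width, score + half_width


# Values from Question 1; SD = 15 is the explanation's assumption.
sem_value = sem(sd=15, reliability=0.92)
low, high = confidence_interval(68, sem_value)
print(f"SEM = {sem_value:.2f}, 95% CI = {low:.1f} to {high:.1f}")
# prints: SEM = 4.24, 95% CI = 59.7 to 76.3
```

Changing `z` to 1.0 gives the narrower 68% band of ± 1 SEM.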
Question 2 Multiple Choice
Test A has reliability r_xx = 0.90 and population SD = 15. Test B has reliability r_xx = 0.90 and population SD = 5. How do their SEMs compare, and what does this mean practically?
A. They have identical SEMs because they have identical reliability coefficients
B. Test A has a larger SEM (≈ 4.7) than Test B (≈ 1.6); scores on Test A have wider confidence intervals even though both tests are equally reliable
C. Test B has a larger SEM because its narrower score distribution makes individual scores less stable
D. SEM cannot be compared across tests with different SDs
Answer: B
SEM = SD × √(1 − r_xx). For Test A: 15 × √0.10 ≈ 4.74. For Test B: 5 × √0.10 ≈ 1.58. Same reliability, very different precision in raw score units. This is why SEM is the relevant metric for score interpretation: it is expressed in the metric of the test itself, and it determines how wide the confidence interval around any particular score will be. Reliability alone tells you the proportion of observed-score variance attributable to true scores, but not how large the measurement error is in the units that matter for the decision.
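As a rough illustration of the comparison above, the sketch below computes both SEMs and also the ratio SEM/SD. The ratio is an addition for illustration only: with equal reliability it is the same for both tests, which is why the difference shows up in raw score units rather than in the reliability coefficient.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)


reliability = 0.90
population_sd = {"Test A": 15.0, "Test B": 5.0}

results = {}
for name, sd in population_sd.items():
    s = sem(sd, reliability)
    # Store the raw-unit SEM and the SEM as a fraction of the SD.
    results[name] = (s, s / sd)
    print(f"{name}: SD = {sd:g}, SEM = {s:.2f}, SEM/SD = {s / sd:.3f}")
```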
Question 3 True / False
A student scores 72 on a test with SEM = 5. A student who scores 76 on the same test cannot be reliably distinguished from the first student on the basis of these scores alone.
True
False
Answer: True
The 95% CI for the first student is approximately 72 ± 9.8 (62 to 82); for the second, approximately 76 ± 9.8 (66 to 86). These intervals overlap substantially. The 4-point gap between the scores is well within the range of measurement error and cannot be treated as a meaningful difference. This is the core practical lesson of SEM: an apparent score difference that lies within the confidence interval cannot be distinguished from measurement noise, so it should not be read as a real difference in ability or whatever construct the test measures.
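A more direct check than comparing overlapping intervals is the standard error of the difference between two scores. The sketch below is not part of the original explanation; it assumes both scores come from the same test, share the same SEM, and have independent measurement errors, in which case the standard error of the difference is SEM × √2. The name `critical_difference` is a hypothetical helper.

```python
import math


def critical_difference(sem_value: float, z: float = 1.96) -> float:
    """Smallest gap between two scores that exceeds measurement error at
    roughly the 95% level, assuming equal SEMs and independent errors."""
    se_diff = math.sqrt(2.0) * sem_value  # sqrt(SEM^2 + SEM^2)
    return z * se_diff


observed_gap = 76 - 72
threshold = critical_difference(sem_value=5.0)
distinguishable = observed_gap > threshold
print(f"gap = {observed_gap}, threshold = {threshold:.1f}, "
      f"distinguishable = {distinguishable}")
# prints: gap = 4, threshold = 13.9, distinguishable = False
```

Under these assumptions the two students would need to differ by about 14 points before the gap could be separated from measurement error.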
Question 4 True / False
A large SEM indicates that the test is unreliable.
True
False
Answer: False
This is the key misconception identified in this topic. SEM = SD × √(1 − r_xx), so SEM depends on both reliability (r_xx) and the spread of scores in the population (SD). A test administered to a highly heterogeneous population with SD = 30 would have SEM = 30 × √0.10 ≈ 9.5 even with reliability = 0.90: a large SEM that comes from a large SD, not from low reliability. Conversely, a test with genuinely low reliability administered to a narrow-ability group might show a small SEM simply because SD is small. You cannot infer reliability from SEM alone.
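To see why SEM alone does not pin down reliability, the formula can be rearranged to r_xx = 1 − (SEM / SD)². The sketch below applies that rearrangement to the same SEM of 9.5 under three population SDs. The SD values of 15 and 10 are hypothetical, chosen only for contrast with the SD = 30 example.

```python
def implied_reliability(sem_value: float, sd: float) -> float:
    """Invert SEM = SD * sqrt(1 - r_xx) to recover r_xx."""
    return 1.0 - (sem_value / sd) ** 2


sem_value = 9.5
implied = {}
for sd in (30.0, 15.0, 10.0):  # 15 and 10 are illustrative values
    implied[sd] = implied_reliability(sem_value, sd)
    print(f"SEM = {sem_value}, SD = {sd:g} -> r_xx = {implied[sd]:.2f}")
```

The same SEM corresponds to a reliability of about 0.90 when SD = 30, about 0.60 when SD = 15, and about 0.10 when SD = 10.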
Question 5 Short Answer
Why should high-stakes cutoff decisions (such as classifying a student for special education based on IQ below 70) always be reported with confidence intervals rather than as point scores alone?
Model answer: Because a point score is a single draw from a distribution of possible scores, and the width of that distribution (determined by the SEM) is substantial even for highly reliable tests. A student who scores 72 with SEM = 4 has a 95% confidence interval of approximately 64–80 — a range that spans both sides of the IQ = 70 threshold. Treating 72 as a precise, accurate measurement and making an irreversible classification decision on that basis ignores the inherent imprecision of the measurement. Reporting the interval makes the uncertainty visible to decision-makers and reduces the probability of misclassification due to measurement error.
This is especially critical at classification cutoffs because the consequences of false positives and false negatives are asymmetric and large. The SEM does not change the test score, but it changes the appropriate level of confidence in any decision made on the basis of that score. Best practice is to report both the point estimate and the interval, to acknowledge that a score within one or two SEMs of the cutoff is statistically indistinguishable from it, and to use multiple sources of evidence rather than a single score when the stakes are high.
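The cutoff reasoning in the model answer can be sketched as a small check. The helper `cutoff_report` is a hypothetical name; it reports the 95% interval, whether that interval contains the cutoff, and how many SEMs the observed score lies from the cutoff.

```python
def cutoff_report(score: float, sem_value: float, cutoff: float,
                  z: float = 1.96) -> dict:
    """Summarize how an observed score relates to a classification cutoff."""
    low = score - z * sem_value
    high = score + z * sem_value
    return {
        "interval": (low, high),
        "spans_cutoff": low <= cutoff <= high,
        # Distance from the cutoff in SEM units.
        "sems_from_cutoff": abs(score - cutoff) / sem_value,
    }


# Values from the model answer: score 72, SEM 4, cutoff IQ = 70.
report = cutoff_report(score=72, sem_value=4, cutoff=70)
low, high = report["interval"]
print(f"95% CI = {low:.1f} to {high:.1f}, "
      f"spans cutoff = {report['spans_cutoff']}, "
      f"distance = {report['sems_from_cutoff']:.1f} SEM")
# prints: 95% CI = 64.2 to 79.8, spans cutoff = True, distance = 0.5 SEM
```

A score only half an SEM above the threshold, as here, is a case where a single score should not settle the classification on its own.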