Questions: Confidence Intervals and Score Reporting Uncertainty
5 questions to test your understanding
Question 1 Multiple Choice
A student scores 68 on a licensing exam with a cut score of 70. The SEM is 4. A supervisor says the student 'clearly did not meet the standard.' What is the critical flaw in this reasoning?
A. The supervisor should use a different cut score
B. The student's 95% confidence interval overlaps the cut score, meaning proficiency cannot be ruled out
C. The SEM only applies to scores above the mean
D. The student should be retested until they produce a consistent score
Answer: B
With an SEM of 4, a 95% CI around 68 extends roughly ±7.8 points (1.96 × 4), from about 60 to 76. The cut score of 70 falls well within this interval, so the student's true score could plausibly lie above or below the cut. The point estimate (68) is not a precise truth; it is an estimate surrounded by meaningful uncertainty. Decisions near any threshold deserve extra scrutiny because measurement error makes classification at the margin unreliable.
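The arithmetic in this explanation can be checked with a short Python sketch. It assumes a symmetric normal-theory interval (observed score ± z × SEM); the function name `score_interval` and the variable names are illustrative, not part of any testing library.

```python
from statistics import NormalDist

def score_interval(observed, sem, level=0.95):
    """Return (lower, upper) for a symmetric normal-theory CI around an observed score."""
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sem
    return observed - margin, observed + margin

# Values from Question 1
lower, upper = score_interval(observed=68, sem=4)
cut_score = 70

# If the cut lies inside the interval, neither classification can be ruled out
overlaps_cut = lower <= cut_score <= upper
print(f"95% CI: {lower:.1f} to {upper:.1f}; cut score inside interval: {overlaps_cut}")
```

With the values from the question, the interval runs from about 60.2 to 75.8, so the cut score of 70 sits inside it.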
Question 2 Multiple Choice
Why does IRT-based confidence interval construction generally outperform CTT-based construction at a test's cut score?
A. IRT uses larger sample sizes to estimate the SEM
B. IRT's information function produces narrower intervals where items are most discriminating, providing better precision at the cut
C. IRT assumes no measurement error at the cut score
D. CTT overestimates the true score for test-takers near the cut
Answer: B
CTT's SEM is a single constant applied uniformly across the entire score range, an approximation that ignores where the test is actually precise. IRT's information function varies across the ability scale, peaking where items discriminate best. A test designed around a particular cut score will have high information there, producing a smaller SE and a tighter CI exactly where precision matters most for classification decisions. At the extremes, where few items are targeted, the IRT-based interval is appropriately wider.
Question 3 True / False
In classical test theory, the standard error of measurement is the same for every test-taker regardless of where they score on the ability scale.
T. True
F. False
Answer: True
This is a defining feature, and a known limitation, of CTT. The SEM is a population-level constant derived from the test's reliability and score standard deviation (SEM = SD × sqrt(1 − reliability)); it does not vary by individual ability level. In reality, most tests are more precise near the center of the score distribution (where items are best targeted) and less precise at the extremes. IRT's information function addresses this by producing ability-specific standard errors. Knowing that CTT's constant SEM is an approximation is crucial to understanding when IRT-based intervals should be preferred.
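To make the constant-SEM property concrete, here is a short sketch using the standard CTT formula SEM = SD × sqrt(1 − reliability). The standard deviation and reliability values are hypothetical, picked only so that the result is easy to follow; `ctt_sem` is an illustrative helper name.

```python
import math

def ctt_sem(score_sd, reliability):
    """Classical SEM: a single constant for the whole score scale."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)

# Hypothetical test statistics, chosen only for illustration
sem = ctt_sem(score_sd=10.0, reliability=0.84)

# The same margin is applied to every observed score, low or high
margin = 1.96 * sem
for observed in (40, 68, 95):
    print(f"score {observed}: 95% CI {observed - margin:.1f} to {observed + margin:.1f}")
```

Every test-taker receives an interval of identical width here, regardless of where the score falls, which is exactly the approximation that IRT-based standard errors relax.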
Question 4 True / False
Reporting a confidence interval around a test score signals that the test has low validity.
T. True
F. False
Answer: False
Confidence intervals communicate measurement precision (how much random error surrounds the observed score), not validity (whether the test measures what it claims to measure). Even a highly valid test warrants CIs when its scores carry measurement error, because CIs describe the reliability of the score estimate. Professional standards from the APA and the Standards for Educational and Psychological Testing call for reporting score precision, such as standard errors or CIs, for consequential assessments precisely because all tests have some measurement error, and good practice requires making that uncertainty visible.
Question 5 Short Answer
Why should practitioners treat a test score as an estimate rather than a precise measurement, and what does a confidence interval communicate that a point score alone does not?
Think about your answer, then reveal below.
Model answer: Every observed score contains random measurement error and may differ from the test-taker's true score. A confidence interval shows the range within which the true score likely falls, making uncertainty explicit. A point score implies precision that doesn't exist; the CI conveys the correct interpretation that the score is a best estimate, not an exact value.
This is the foundational insight of the topic. Observed scores are estimators, and like all estimators they carry variance. For a student near a high-stakes cut score, the point estimate provides false precision: it suggests a definitive classification that the measurement cannot actually support. The CI makes the uncertainty at the classification boundary visible: if the interval overlaps the cut, both classifications are statistically plausible, and additional evidence or retesting should inform the decision.