The standard error of measurement (SEM) quantifies the precision of an individual score: SEM = SD × √(1 − r_xx). It sets the width of the confidence interval around an observed score; a 95% CI is approximately ±1.96 × SEM. SEM allows clinicians and educators to communicate uncertainty and avoid over-interpreting small score differences. Communicating ranges rather than point estimates improves score interpretation and reduces misuse.
Calculate the SEM for published tests and construct confidence intervals for individual scores. Graph how the SEM varies with the reliability coefficient to illustrate the precision trade-off.
Once you have a reliability coefficient for a test, the standard error of measurement (SEM) transforms that abstract statistic into something directly interpretable at the level of individual scores. The formula is SEM = SD × √(1 − r_xx), where SD is the standard deviation of scores in a reference population and r_xx is the reliability coefficient. You can see immediately from this formula that SEM has two determinants: how much scores vary across people (SD), and how unreliable the test is (1 − r_xx). A highly reliable test has an SEM that is small relative to its SD; as reliability falls toward zero, the SEM approaches the SD itself. Critically, two tests can have the same reliability coefficient but different SEMs if their population SDs differ, because the SEM is expressed in the metric of the test itself.
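The formula can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name, the SD of 15, and the two "tests" compared at the end are assumptions chosen for the example, not values taken from any published instrument.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Return SEM = SD * sqrt(1 - r_xx), in the same units as the test score."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Illustrative scale only: SD = 15 is an assumed value.
for r_xx in (0.70, 0.80, 0.90, 0.95):
    sem = standard_error_of_measurement(15, r_xx)
    print(f"r_xx = {r_xx:.2f}  SEM = {sem:.2f}")

# Same reliability, different SDs: the SEM scales with each test's own metric.
sem_a = standard_error_of_measurement(15, 0.90)   # hypothetical test A
sem_b = standard_error_of_measurement(100, 0.90)  # hypothetical test B
print(f"Test A SEM = {sem_a:.2f}, Test B SEM = {sem_b:.2f}")
```

With an SD of 15, the SEM shrinks from about 8.2 points at r_xx = 0.70 to about 3.4 points at r_xx = 0.95. The two hypothetical tests share a reliability of 0.90 yet have SEMs of roughly 4.7 and 31.6 points, because each SEM is expressed in its own score units.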
The SEM is interpreted as the standard deviation of measurement error around an individual's true score. Under Classical Test Theory, if you could test the same person infinitely many times under identical conditions with no learning or fatigue effects, their observed scores would form a distribution centered on their true score, with standard deviation equal to the SEM. So if a student scores 85 on a test with SEM = 4, the 95% confidence interval around that score is approximately 85 ± (1.96 × 4), or roughly 77 to 93. Read loosely, the student's true score lies somewhere in that range with 95% confidence; strictly, intervals built this way would contain the true score in about 95% of repeated testings. The observed 85 is just one draw from the distribution of scores the student could have obtained.
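The interval arithmetic in this example can be sketched as follows. The function name and its parameters are hypothetical, and the sketch assumes normally distributed measurement error, which is the assumption behind the ±1.96 multiplier.

```python
from statistics import NormalDist


def score_confidence_interval(observed: float, sem: float,
                              confidence: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) bounds of a symmetric interval around an observed score."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = z * sem
    return observed - margin, observed + margin


low, high = score_confidence_interval(observed=85, sem=4)
print(f"95% CI: {low:.1f} to {high:.1f}")
```

For a score of 85 with SEM = 4 this gives bounds that round to 77 and 93. Passing a different `confidence` value, such as 0.68, narrows the interval to roughly one SEM on either side of the observed score.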
The practical consequences become clear in high-stakes classification decisions. In school settings, two students who score 82 and 86 are often treated as meaningfully different. If the SEM is 5, however, those scores are statistically indistinguishable: the confidence intervals overlap substantially, and the apparent gap lies well within the range of measurement error. The standard error of the difference between two such scores is SEM × √2, or about 7.1 points, so the gap would need to exceed roughly 14 points to be distinguishable at the 95% level. Many consequential decisions, such as placing a student in special education, assigning a clinical diagnosis, or setting a personnel cutoff, depend on a threshold score (e.g., IQ below 70). The SEM quantifies the uncertainty around that cutoff: a student who scores 72 with an SEM of 4 has a 95% interval of 72 ± (1.96 × 4), so the true score could plausibly lie anywhere from about 64 to 80, which spans both sides of the threshold.
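Both checks described here can be sketched in Python. The function names are hypothetical, and the comparison of two scores rests on assumptions worth stating: both scores come from the same test with the same SEM, and their measurement errors are independent and normally distributed, so the standard error of the difference is SEM × √2.

```python
import math
from statistics import NormalDist


def critical_value(confidence: float = 0.95) -> float:
    """Two-sided normal critical value, about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def difference_is_reliable(score_a: float, score_b: float, sem: float,
                           confidence: float = 0.95) -> bool:
    """True if the gap exceeds z * SEM * sqrt(2), the critical difference."""
    critical_difference = critical_value(confidence) * sem * math.sqrt(2.0)
    return abs(score_a - score_b) > critical_difference


def interval_spans_cutoff(observed: float, sem: float, cutoff: float,
                          confidence: float = 0.95) -> bool:
    """True if the confidence interval around the observed score contains the cutoff."""
    margin = critical_value(confidence) * sem
    return observed - margin < cutoff < observed + margin


# Scores of 82 and 86 with SEM = 5: a 4-point gap against a ~13.9-point critical difference.
print(difference_is_reliable(82, 86, sem=5))        # False
# Score of 72 with SEM = 4 and a cutoff of 70: interval is about 64.2 to 79.8.
print(interval_spans_cutoff(72, sem=4, cutoff=70))  # True
```

When `interval_spans_cutoff` returns True, the observed score alone does not settle which side of the threshold the true score falls on, which is the situation the paragraph above describes.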
The practical upshot is a shift in how scores should be communicated and used: not as point estimates ("you scored 115") but as intervals ("your score is most likely between 109 and 121", roughly the 95% interval when the SEM is 3). This framing is more statistically defensible and guards against the common mistake of treating imprecise measurements as precise facts. SEM is the translation layer between the abstract reliability coefficient and the real-world question every score user actually wants answered: how much can I trust this particular number?