A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Test-Retest Reliability and Temporal Stability

Graduate · Depth 99 in the knowledge graph

7 topics build on this

505 prerequisites beneath it

Core Idea

Test-retest reliability assesses score stability over time by administering the same test to the same people at two time points and correlating the results. The method assumes that the construct being measured stays stable over the interval, so it is most appropriate for stable traits (personality, intelligence) rather than for knowledge or skills that improve with practice.

How It's Best Learned

Compare test-retest correlations for different construct types (stable traits vs. abilities) and examine how retest intervals affect stability coefficients. Analyze when other reliability methods are more appropriate.

Common Misconceptions

Misconception: high test-retest reliability guarantees validity. It does not; a test can be stable and still fail to measure the intended construct. A second pitfall is ignoring the retest interval, which substantially affects the obtained correlation and therefore must be documented.

Explainer

Classical test theory, your prerequisite, posits that every observed score is the sum of a true score and random error. Reliability, in that framework, is the proportion of observed-score variance that is true-score variance: signal over signal-plus-noise, a close relative of a signal-to-noise ratio. But noise can enter measurement in several ways, and each reliability method targets a different source. Internal consistency (for example, Cronbach's alpha) asks whether the items measure the same thing on a single occasion. Test-retest reliability asks a different question: does the measurement give the same answer at different points in time? It targets one specific noise source, temporal instability, and it is the appropriate reliability estimate when the construct being measured is supposed to be stable.
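Under the textbook classical-test-theory assumptions (the true score T is unchanged between occasions, errors are uncorrelated with T and with each other, and error variance is equal on both occasions), the correlation between two administrations reduces to the reliability ratio. When T itself drifts, the first assumption fails and the correlation absorbs genuine change as well:

```latex
\begin{aligned}
X_1 &= T + E_1, \qquad X_2 = T + E_2,\\
&\operatorname{Cov}(T, E_j) = 0,\quad \operatorname{Cov}(E_1, E_2) = 0,\quad \operatorname{Var}(E_1) = \operatorname{Var}(E_2) = \sigma_E^2,\\
\operatorname{Cov}(X_1, X_2) &= \operatorname{Var}(T) = \sigma_T^2, \qquad \sigma_{X_1} = \sigma_{X_2} = \sqrt{\sigma_T^2 + \sigma_E^2},\\
\rho_{X_1 X_2} &= \frac{\operatorname{Cov}(X_1, X_2)}{\sigma_{X_1}\,\sigma_{X_2}}
 = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2} = \frac{\sigma_T^2}{\sigma_X^2} = \rho_{XX'}.
\end{aligned}
```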

The method is straightforward: administer the same instrument to the same people twice, separated by a time interval, then compute the correlation between the two sets of scores. This correlation is the stability coefficient. Under classical test theory's assumptions, a coefficient of 0.85 estimates that 85% of observed-score variance is true-score variance, with the remaining 15% reflecting random measurement error, genuine change in the construct, or both. (This is not the same as the share of time-2 variance that is linearly predictable from time-1 scores, which is r² ≈ 0.72.) The interpretation hinges on a theoretical claim: if you believe the construct is a stable trait, low test-retest reliability signals a problem with the measurement; if you believe the construct changes over the interval, low test-retest reliability may reflect real change rather than measurement failure.

This is why construct type determines whether test-retest is the right reliability strategy. Personality traits such as extraversion or neuroticism are theorized to be stable across months and years, so test-retest reliability over a six-month interval is a meaningful criterion for a personality measure. A measure of current anxiety state, by contrast, should fluctuate as circumstances change; a two-week test-retest correlation would reveal genuine temporal change more than measurement error. For skills that improve with practice, such as reading speed or arithmetic fluency, test-retest over any interval conflates reliability with learning, which makes the stability coefficient difficult to interpret. The usual recommendation for learning-sensitive constructs is to use alternate (parallel) forms rather than an identical retest, which at least removes memory for specific items.

The retest interval is the most consequential methodological decision in a test-retest study. Very short intervals (hours or days) inflate reliability estimates through carry-over effects: participants remember their previous responses and anchor to them, producing artificial consistency that does not reflect true stability. Very long intervals (years) deflate estimates because of genuine developmental or environmental change. The "right" interval depends on the construct's theoretical rate of change. For intelligence tests, intervals of a few weeks to several months are common; for clinical state measures, 1–2 weeks is typical; for personality, 6 months to a year is informative. Research reports should always specify the interval, because a correlation of 0.85 over two weeks and the same 0.85 over two years say very different things about temporal stability.

A final subtlety connects test-retest to the broader reliability framework. Test-retest reliability and internal consistency can dissociate substantially, and both can be high while validity remains low. A personality scale might correlate 0.90 with itself over six months (highly stable) while correlating 0.20 with actual behavior in personality-relevant situations (low validity). Stability proves that the test measures something consistently over time; it does not prove that the something is the construct you intend to measure. Reliability is a necessary but not sufficient condition for validity — a lesson that applies with particular force to test-retest, where the seductive appeal of a high stability coefficient can mask a fundamentally mismeasured construct.

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Independence of Events → Sampling Distributions → Standard Error of Estimators → Hypothesis Testing: Framework and Logic → Classical Test Theory Foundations → Reliability and Validity: Foundational Relationship → Test-Retest Reliability and Temporal Stability

Longest path: 100 steps · 505 total prerequisite topics

Prerequisites (3)

Classical Test Theory Foundations (hard) · Correlation Coefficient (soft) · Reliability and Validity: Foundational Relationship (soft)

Leads To (2)

Generalizability Theory and Multi-Faceted Reliability (soft) · Split-Half Reliability and the Spearman-Brown Prophecy Formula (hard)