A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Measurement Reliability: Types and Estimation

College · Depth 114 in the knowledge graph

6 topics build on this

561 prerequisites beneath it

Core Idea

Reliability is consistency of measurement across items (internal consistency), raters (inter-rater), time (test-retest), or equivalent forms (parallel forms). Each type addresses a different source of error. Coefficient alpha, intraclass correlations, and test-retest correlations quantify reliability. Unreliable measurement attenuates relationships and reduces statistical power; reliability sets an upper bound on validity.

How It's Best Learned

Calculate Cronbach's alpha for a published scale. Review reliability coefficients in research papers and interpret their magnitude. Discuss which type of reliability (internal, test-retest, inter-rater) is most important for different measurement contexts.
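As a rough sketch of the first exercise, the Python below computes raw Cronbach's alpha, α = k/(k − 1) · (1 − Σσ²_i / σ²_X), where k is the number of items, σ²_i are the item variances, and σ²_X is the variance of respondents' total scores. It assumes item-level responses are available as a respondents-by-items list of lists. The function name `cronbach_alpha` and the five-respondent data set are hypothetical, made up for illustration rather than taken from any published scale.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Raw Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across respondents.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents answering 3 items on a 1-5 scale.
data = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]
print(f"{cronbach_alpha(data):.3f}")  # 0.956
```

For the toy data the exact value is 43/45 ≈ 0.956. Population variances are used throughout; because every variance in the ratio shares the same denominator, switching to sample variances gives the same alpha.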

Common Misconceptions

- Reliability and validity are the same.
- High internal consistency always indicates unidimensionality.
- Alpha > 0.7 is sufficient for all uses.
- One reliability estimate applies to all samples and times.
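The misconception about unidimensionality can be checked with a small hypothetical example. The sketch below builds a correlation matrix for two clusters of 10 items each, where items correlate 0.5 within a cluster and 0 across clusters (two unrelated dimensions), then computes standardized alpha from the average off-diagonal correlation. The helper names and correlation values are illustrative assumptions, not data from a real scale.

```python
def two_factor_correlations(items_per_factor: int, r_within: float) -> list[list[float]]:
    """Correlation matrix for two item clusters that are uncorrelated with each other."""
    k = 2 * items_per_factor
    corr = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            same_cluster = (i < items_per_factor) == (j < items_per_factor)
            if i == j:
                corr[i][j] = 1.0
            elif same_cluster:
                corr[i][j] = r_within
            # Items from different clusters keep correlation 0.0.
    return corr


def standardized_alpha_from_corr(corr: list[list[float]]) -> float:
    """Standardized alpha: k * r_bar / (1 + (k - 1) * r_bar), r_bar = mean off-diagonal r."""
    k = len(corr)
    off_diagonal = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    r_bar = sum(off_diagonal) / len(off_diagonal)
    return k * r_bar / (1 + (k - 1) * r_bar)


corr = two_factor_correlations(items_per_factor=10, r_within=0.5)
print(f"{standardized_alpha_from_corr(corr):.3f}")  # 0.861
```

Alpha comes out at 180/209 ≈ 0.86, comfortably above the 0.70 rule of thumb, even though the scale measures two uncorrelated things. A high alpha reflects many moderately correlated item pairs, so dimensionality has to be checked separately.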

Explainer

From your work on operational measurement, you know that every construct must be defined in terms of observable indicators — the behaviors, responses, or outcomes that stand in for the underlying theoretical variable. The moment you operationalize, you introduce the possibility of measurement error: the gap between your observed score and the true score you are trying to capture. Reliability is the study of that gap — specifically, how consistent the observed score is across different conditions under which you would expect it to stay the same.

The most important conceptual anchor is Classical Test Theory's decomposition: Observed Score = True Score + Error, or X = T + E. If you administered the same test to the same person twice under identical conditions, the true score would be the same both times, so any difference in observed score is error. Because error is assumed to be random and uncorrelated with the true score, the variances add: σ²_X = σ²_T + σ²_E. Reliability is the proportion of observed-score variance that reflects true-score variance, formally σ²_T / σ²_X. A reliability of 0.80 means 80% of the observed-score variance is true variance and 20% is error. Different types of reliability target different sources of error.

Internal consistency (most often measured by Cronbach's alpha) asks: do the items on this scale all pull in the same direction? It targets error from sampling items: if you replaced half the items with other items measuring the same construct, would the scores stay the same? Raw alpha is computed from the item variances and the variance of total scores; its standardized form depends only on the number of items k and the average inter-item correlation r̄, as α = k·r̄ / (1 + (k − 1)·r̄). Longer scales and higher inter-item correlations therefore both yield higher alpha, and the connection to your knowledge of correlations is direct: standardized alpha is a function of the average pairwise item correlation. The target of α > 0.70 is a rough heuristic; for high-stakes clinical decisions, you want α > 0.90, because at lower reliability an individual's observed score can lie far from their true score.

Test-retest reliability asks about stability over time, targeting error from temporal inconsistency in measurement. Inter-rater reliability, often quantified with an intraclass correlation, asks whether independent judges produce the same score, targeting error from observer subjectivity.
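The internal-consistency claims above reduce to two short formulas, sketched here in Python: standardized alpha as a function of the number of items k and the average inter-item correlation r̄, and the standard error of measurement, SEM = SD · √(1 − reliability), which expresses how far an individual's observed score typically falls from their true score. SEM is standard Classical Test Theory but is not spelled out in the text above; the scale SD of 15 and r̄ = 0.30 are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def standardized_alpha(k: int, r_bar: float) -> float:
    """Standardized alpha from k items with average inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Typical spread of observed scores around a person's true score."""
    return sd * math.sqrt(1 - reliability)


# Same average inter-item correlation, longer scale -> higher alpha.
for k in (5, 10, 20):
    print(f"k={k:2d}  alpha={standardized_alpha(k, 0.30):.3f}")

# Same score SD, higher reliability -> narrower error band for individuals.
for rel in (0.70, 0.90):
    print(f"reliability={rel:.2f}  SEM={standard_error_of_measurement(15, rel):.2f}")
```

With r̄ = 0.30, alpha rises from about 0.68 at 5 items to about 0.90 at 20. On a scale with SD 15, moving from reliability 0.70 to 0.90 shrinks the SEM from about 8.2 to about 4.7 points, which is why high-stakes decisions about individuals call for the stricter threshold.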

The most critical practical implication is that reliability sets a ceiling on validity. If a scale measures with error, its correlation with any external criterion is attenuated: pulled toward zero by the noise in the scores. Spearman's attenuation formula makes this explicit: the observed correlation equals the true-score correlation multiplied by √(r_XX · r_YY), where r_XX and r_YY are the reliabilities of the two measures. Because a true-score correlation cannot exceed 1, the maximum possible observed correlation between two measures is the square root of the product of their reliabilities. A scale with alpha = 0.60 can therefore correlate at most about 0.77 (√0.60) with a perfectly reliable criterion. Before asking "does this measure predict what it should predict?", you must ask "is this measure consistent enough that it could even detect a real relationship?" Unreliable measurement is not just imprecise; it systematically weakens the relationships you observe and, with them, the scientific conclusions you can draw.
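The attenuation arithmetic can be sketched in a few lines of Python. `max_observed_correlation` gives the ceiling √(r_XX · r_YY), and `disattenuated_correlation` applies Spearman's correction, dividing an observed correlation by that same factor to estimate the true-score correlation. The function names are hypothetical, and the reliabilities and observed r in the usage lines are illustrative values, not results from any study.

```python
import math


def max_observed_correlation(rel_x: float, rel_y: float) -> float:
    """Ceiling on corr(X, Y) when the true scores correlate perfectly."""
    return math.sqrt(rel_x * rel_y)


def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: estimated correlation between true scores."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Scale with alpha = 0.60 against a perfectly reliable criterion.
print(f"{max_observed_correlation(0.60, 1.00):.2f}")  # 0.77

# Observed r = 0.30 between measures with reliabilities 0.60 and 0.80.
print(f"{disattenuated_correlation(0.30, 0.60, 0.80):.2f}")  # 0.43
```

The corrected value is only an estimate: because the reliabilities themselves carry sampling error, the correction can occasionally exceed 1 in small samples.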

Longest path: 115 steps · 561 total prerequisite topics

Prerequisites (2)

Variables: Definition, Operationalization, and Measurement (hard)
Correlation Coefficient (soft)

Leads To (2)

Measurement Validity: Construct and Criterion Evidence (soft)
Psychological Test Construction and Psychometric Validation (soft)