Measurement Reliability: Types and Estimation


Core Idea

Reliability is the consistency of measurement across items (internal consistency), raters (inter-rater), time (test-retest), or forms (parallel forms). Each type addresses a different source of error. Coefficient alpha, intraclass correlations, and test-retest correlations quantify reliability. Unreliable measurement attenuates observed relationships and reduces statistical power; reliability therefore sets an upper bound on validity.

How It's Best Learned

Calculate Cronbach's alpha for a published scale. Review reliability coefficients in research papers and interpret their magnitude. Discuss which type of reliability (internal consistency, test-retest, inter-rater) matters most in different measurement contexts.

Explainer

From your work on operational measurement, you know that every construct must be defined in terms of observable indicators — the behaviors, responses, or outcomes that stand in for the underlying theoretical variable. The moment you operationalize, you introduce the possibility of measurement error: the gap between your observed score and the true score you are trying to capture. Reliability is the study of that gap — specifically, how consistent the observed score is across different conditions under which you would expect it to stay the same.

The most important conceptual anchor is Classical Test Theory's decomposition: Observed Score = True Score + Error. If you administer the same test to the same person twice under identical conditions, the true score should be the same both times. Any difference in observed score is error. Reliability is the proportion of variance in observed scores that reflects true score variance — formally, σ²_T / σ²_X. A reliability of 0.80 means 80% of the observed score variance is true variance and 20% is error. Different types of reliability target different sources of error.
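The decomposition above can be checked with a small simulation. This is a minimal sketch, not a standard procedure: the function name, the sample size, and the standard deviations (true-score SD of 2.0, error SD of 1.0) are illustrative assumptions chosen so that the theoretical reliability works out to 0.80.

```python
import random
import statistics


def simulate_reliability(n_people=20000, true_sd=2.0, error_sd=1.0, seed=0):
    """Simulate X = T + E and return (theoretical, empirical) reliability.

    All parameter values are hypothetical; they are not taken from any
    real scale.
    """
    rng = random.Random(seed)

    # True scores vary across people; errors are independent noise.
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Reliability as a ratio of variances: sigma^2_T / sigma^2_X.
    theoretical = true_sd**2 / (true_sd**2 + error_sd**2)
    empirical = statistics.pvariance(true_scores) / statistics.pvariance(observed)
    return theoretical, empirical


theoretical, empirical = simulate_reliability()
print(f"theoretical: {theoretical:.3f}, empirical: {empirical:.3f}")
```

In real data the true scores are never observed, which is why reliability has to be estimated indirectly through the coefficients discussed next. The simulation only illustrates what those coefficients are trying to recover.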

Internal consistency (commonly estimated with Cronbach's alpha) asks: do the items on this scale all pull in the same direction? It targets error from sampling items: if you replaced half the items with other items measuring the same construct, would the scores stay the same? Raw alpha is computed from the item variances and the variance of the total score; in its standardized form it depends only on the average inter-item correlation and the number of items, so longer scales with higher inter-item correlations yield higher alpha. The connection to your knowledge of correlations is direct: standardized alpha is a function of the average pairwise item correlation. The target of α > 0.70 is a rough heuristic; for high-stakes clinical decisions, you want α > 0.90, because the lower the reliability, the further an individual's observed score can fall from their true score. Test-retest reliability asks about stability over time, targeting error from temporal inconsistency in measurement. Inter-rater reliability asks whether two independent judges produce the same score, targeting error from observer subjectivity.
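The alpha computation described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names and the three-item, five-respondent data set are hypothetical, and the raw formula used is the usual variance-based one, α = k/(k−1) × (1 − Σ item variances / total-score variance).

```python
import statistics


def cronbach_alpha(items):
    """Raw Cronbach's alpha.

    `items` is a list of k lists, one per item, each holding one score
    per respondent (same respondent order in every list).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")

    item_variances = [statistics.variance(scores) for scores in items]
    # Total score per respondent: sum across items.
    totals = [sum(person_scores) for person_scores in zip(*items)]
    total_variance = statistics.variance(totals)

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def standardized_alpha(k, mean_r):
    """Standardized alpha from the number of items and mean inter-item r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical responses: 3 items, 5 respondents.
items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
print(f"raw alpha: {cronbach_alpha(items):.3f}")

# Same mean inter-item correlation, different scale lengths.
for k in (5, 10, 20):
    print(k, round(standardized_alpha(k, 0.30), 3))
```

The loop at the end shows the length effect: holding the average inter-item correlation fixed, adding items raises alpha, which is one reason a high alpha on a long scale does not by itself show that the items are strongly related.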

The most critical practical implication is that reliability sets a ceiling on validity. If a scale measures with error, the correlation between that scale and any external criterion is mathematically attenuated — reduced toward zero by the noise in the scores. The correction for attenuation formula makes this explicit: the maximum possible correlation between two measures equals the square root of the product of their reliabilities. A scale with alpha = 0.60 can correlate at most about 0.77 with a perfectly reliable criterion. Before asking "does this measure predict what it should predict?", you must ask "is this measure consistent enough that it could even detect a real relationship?" Unreliable measurement is not just imprecise — it systematically undermines the scientific conclusions you can draw.
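The ceiling described above follows from the attenuation relationship: observed r = true r × √(reliability of X × reliability of Y). A minimal sketch of that arithmetic follows; the function names are hypothetical, and the example values other than the 0.60 reliability from the text are illustrative assumptions.

```python
import math


def _check_reliability(value):
    """Reliabilities must lie in (0, 1] for these formulas to make sense."""
    if not 0.0 < value <= 1.0:
        raise ValueError("reliability must be in (0, 1]")


def max_observable_correlation(rel_x, rel_y=1.0):
    """Upper bound on the observed correlation between two measures."""
    _check_reliability(rel_x)
    _check_reliability(rel_y)
    return math.sqrt(rel_x * rel_y)


def attenuated_correlation(true_r, rel_x, rel_y):
    """Observed correlation expected when the true-score correlation is true_r."""
    return true_r * max_observable_correlation(rel_x, rel_y)


def disattenuated_correlation(observed_r, rel_x, rel_y):
    """Correction for attenuation: estimate the true-score correlation."""
    return observed_r / max_observable_correlation(rel_x, rel_y)


# The example from the text: alpha = 0.60 against a perfectly reliable criterion.
print(round(max_observable_correlation(0.60), 2))

# Illustrative values: a true correlation of 0.50 measured with
# reliabilities of 0.60 and 0.80.
print(round(attenuated_correlation(0.50, 0.60, 0.80), 3))
```

The first call reproduces the ceiling of about 0.77 quoted above. Note that the corrected estimate from `disattenuated_correlation` can exceed 1.0 when the reliability estimates are too low, so it is best treated as a rough estimate, not an observed quantity.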
