Measurement, Validity, and Reliability

College Depth 0 in the knowledge graph I know this Set as goal
Unlocks 15 downstream topics
measurement validity reliability construct operationalization

Core Idea

Distinguishes construct validity, internal validity, external validity, and measurement reliability. Examines sources of measurement error, systematic bias, and how validity threats vary across designs. Introduces strategies for validating constructs and assessing reliability in social measurement.

How It's Best Learned

Evaluate existing instruments for validity evidence, design validity checks into your own research, practice identifying validity threats in design scenarios.

Common Misconceptions

Explainer

Measurement is where abstract concepts meet concrete data — and the gap between them is where most research goes wrong. In the social sciences, you rarely measure what you care about directly. You can't observe "intelligence," "social trust," or "political polarization" the way you can measure temperature or mass. Instead, you observe indicators: survey responses, behavioral counts, test scores, administrative records. The question of whether your indicators actually capture the concept you intend is the question of validity. This is one of the most consequential methodological issues in social science, because a finding that is technically correct but invalid — measuring the wrong thing precisely — is worse than useless.

The validity landscape has several distinct layers, and it's essential to keep them separate. Construct validity asks whether your measurement instrument captures the theoretical concept you intend. If you measure "anxiety" with a scale that actually tracks general negative affect, your construct validity is compromised — you're studying the wrong thing. Internal validity is about causal inference: can you attribute the relationship you found to the cause you claim, or could it be a confound? This is primarily a design question (randomization, control groups, time ordering) rather than a measurement question per se. External validity asks whether findings generalize beyond your sample and context — does what you found in a lab study of U.S. undergraduates tell you anything about adults in general?

Reliability is a necessary but not sufficient condition for validity. A reliable measure produces consistent results across repeated applications — the same respondent gets the same score on different days, or different raters agree on how to score the same observation. High reliability means your measure is precise. But precision and accuracy are different things: a scale that consistently reads 10 pounds too heavy is perfectly reliable and perfectly invalid. The standard forms of reliability — test-retest reliability (consistency over time), inter-rater reliability (consistency across coders), and internal consistency (items within a scale correlating with each other) — each capture a different dimension of measurement consistency.

Validating a construct is a cumulative, evidence-based process, not a single test. Content validity asks whether the items in your instrument cover the full conceptual domain (not just one corner of it). Convergent validity asks whether your measure correlates appropriately with other measures of the same construct. Discriminant validity asks whether it fails to correlate with measures of different constructs — if your anxiety scale correlates just as highly with a depression scale as with other anxiety scales, something is off. Predictive validity asks whether your measure predicts outcomes it theoretically should. Taken together, these forms of evidence build a cumulative case that you are measuring what you claim to measure — which is the bedrock on which everything else in social science research stands.

What did you take from this?

Topics in reflective domains aren't scored by quiz answers. Read, reflect, and mark when you've thought it through.

Quiz me anyway →

Prerequisite Chain

This is a foundational topic with no prerequisites.

Prerequisites (0)

No prerequisites — this is a starting point.

Leads To (5)