A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Score Interpretation and Validity Evidence Design

Research Depth 106 in the knowledge graph

10 topics build on this

558 prerequisites beneath it


Core Idea

Validity is not a test property but a quality of inferences drawn from scores in a specific context. Validity evidence comes from five sources: content, response processes, internal structure, relations to external variables, and consequences. Effective interpretation requires designing validation studies that gather evidence relevant to intended uses and interpretations.

Explainer

From your work on validity evidence frameworks, you know the conceptual pivot codified in the *Standards for Educational and Psychological Testing* (1999/2014): validity is not a fixed property of a test, but a judgment about the appropriateness of specific inferences drawn from test scores in specific contexts for specific purposes. A reading comprehension test may support valid inferences about reading ability yet support no valid inference about suitability for a job that does not require reading. The test did not change; the inference changed. This reframing dissolves the older tripartite distinction (content validity, criterion validity, construct validity) and replaces it with a unified concept: an argument about whether the evidence supports, or fails to support, a given score interpretation.

The five sources of validity evidence define the terrain of that argument. Content evidence asks whether the test items adequately represent the domain of interest; it is established through expert review, content mapping, and alignment studies. Response process evidence asks whether examinees are actually engaging in the processes the test intends to elicit; it is established through think-aloud protocols, eye-tracking, or cognitive interviewing. For example, a math test with verbally dense items may be measuring reading ability as much as mathematical reasoning, and response process data can reveal this. Internal structure evidence asks whether the relationships among items match the hypothesized structure of the construct; it is established through factor analysis and IRT model fit. Evidence from relations to external variables asks whether scores correlate with other measures as theory predicts: convergent evidence is a strong correlation with measures of the same construct, and discriminant evidence is a weak correlation with measures of different constructs. Consequential evidence asks whether the use of test scores produces the intended outcomes and whether unintended consequences arise.
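The convergent and discriminant pattern described above can be sketched in a few lines of Python. This is only an illustration: the eight examinees, all three score lists, and the helper name `pearson_r` are invented for the example, and a real study would need a far larger sample plus confidence intervals around each coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented scores for eight examinees (illustration only, not real data).
new_math_test = [12, 15, 9, 20, 17, 11, 14, 18]
established_math_test = [30, 36, 25, 44, 40, 27, 33, 41]  # same construct
extraversion_scale = [5, 3, 6, 7, 2, 4, 6, 3]             # different construct

# Theory predicts a strong correlation with the same-construct measure
# and a weak one with the unrelated measure.
convergent_r = pearson_r(new_math_test, established_math_test)
discriminant_r = pearson_r(new_math_test, extraversion_scale)

print(f"convergent r   = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

If the pattern were reversed (a weak convergent correlation or a strong discriminant one), that would count as evidence against the intended interpretation rather than for it.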

Designing a validation study means deciding which sources of evidence are most relevant to the intended interpretation and then building a research program to gather them. Not all five sources need equal attention for every test: a straightforward knowledge assessment for a licensure exam may require principally content evidence and criterion evidence (can licensed practitioners actually do the job?), while a novel measure of an abstract psychological construct like "grit" requires heavy investment in internal structure and discriminant evidence. The interpretive argument framework (Kane, 2006) makes this structure explicit: the test developer states the chain of inferences from observed score to ultimate decision, treats each inference as a link, and specifies what evidence would strengthen or break each link.
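One way to make the chain-of-inferences idea concrete is to write the argument down as data. The sketch below is a hypothetical representation, not a standard tool: the `Inference` class, the helper functions, the link names, and the licensure example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Inference:
    """One link in the chain from observed score to decision."""
    name: str
    claim: str
    planned_evidence: List[str] = field(default_factory=list)
    supported: Optional[bool] = None  # None means evidence not yet gathered


def unsupported_links(argument):
    """Names of links that are untested or contradicted by the evidence."""
    return [link.name for link in argument if link.supported is not True]


def argument_holds(argument):
    """The chain holds only if every single link is supported."""
    return all(link.supported is True for link in argument)


# Hypothetical argument for a licensure exam (illustrative content only).
licensure_argument = [
    Inference("scoring", "Responses are scored accurately and consistently.",
              ["scoring key review"], supported=True),
    Inference("generalization", "Scores would be similar on a parallel form.",
              ["reliability study"], supported=True),
    Inference("extrapolation", "Scores reflect competence in actual practice.",
              ["expert content review", "criterion study"]),
    Inference("decision", "Passing candidates are fit to be licensed.",
              ["cut-score study", "consequences review"]),
]

print(unsupported_links(licensure_argument))
print(argument_holds(licensure_argument))
```

Writing the argument this way makes the weakest link visible: strong evidence for the early inferences does not compensate for a later inference that has no support yet.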

A common failure mode in test development is gathering validity evidence *after* widespread deployment, when negative findings are costly to act on. Best practice is to design the validation program before the test is used operationally: pilot data should inform both item refinement and the evidentiary argument simultaneously. If the intended interpretation is that high scorers are more qualified for a clinical position, then criterion-related studies should be designed around that hiring decision from the outset, not added retroactively when someone questions the test's use. Validity is an ongoing process of accumulation, not a one-time certification, and each new population, context, or decision changes the evidentiary requirements.


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Conditional Distributions → Bivariate Normal Distribution → Normal Distribution → Standard Normal Distribution and Z-Scores → Hypothesis Testing Fundamentals → Experimental Research Design → Control and Experimental Groups → Random Assignment → Confounding Variables and Internal Validity → Blinding and Demand Characteristics → Validity in Psychological Measurement → Construct Validity and Convergent-Discriminant Evidence → Modern Validity Frameworks and Integrated Evidence → Score Interpretation and Validity Evidence Design

Longest path: 107 steps · 558 total prerequisite topics

Prerequisites (1)

Modern Validity Frameworks and Integrated Evidence (hard)

Leads To (2)

Consequential Validity and the Social Consequences of Testing (hard)
Norm-Referenced and Criterion-Referenced Score Interpretation (hard)