Score Interpretation and Validity Evidence Design

Core Idea

Validity is not a test property but a quality of inferences drawn from scores in a specific context. Validity evidence comes from five sources: content, response processes, internal structure, relations to external variables, and consequences. Effective interpretation requires designing validation studies that gather evidence relevant to intended uses and interpretations.

Explainer

From your work on validity evidence frameworks, you know the conceptual pivot that the *Standards for Educational and Psychological Testing* (1999/2014) introduced: validity is not a fixed property of a test, but a judgment about the appropriateness of specific inferences drawn from test scores in specific contexts for specific purposes. A test of reading comprehension may yield valid inferences about reading ability while yielding invalid inferences when used to make employment decisions in a job that does not require reading. The test did not change; the inference changed. This reframing dissolves the older tripartite distinction (content validity, criterion validity, construct validity) and replaces it with a unified concept: an argument that evidence supports, or fails to support, a score interpretation.

The five sources of validity evidence define the terrain of that argument. Content evidence asks whether the test items adequately represent the domain of interest, and is established through expert review, content mapping, and alignment studies. Response process evidence asks whether examinees are actually doing what the test intends, and is established through think-aloud protocols, eye-tracking, or cognitive interviewing. A math test may be measuring reading ability instead of mathematical reasoning if the items are verbally dense; response process data can reveal this. Internal structure evidence asks whether the item relationships within the test match the hypothesized structure of the construct, and is established through factor analysis and IRT model fit. Evidence based on relations to external variables asks whether scores correlate with other measures as theory predicts: strong convergent correlations with measures of the same construct, and weak discriminant correlations with measures of different constructs. Consequential evidence asks whether the use of test scores produces the intended outcomes and whether unintended consequences exist.
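The convergent and discriminant pattern described above can be checked with plain correlations. The following is a minimal sketch on simulated data, not a real validation study: the variable names (`new_scale`, `established_scale`, `unrelated_measure`), the sample size, and the noise levels are all assumptions chosen for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n_examinees = 500

# Simulated (not real) data: one latent trait drives two measures of the
# same construct, while a third measure is generated independently of it.
latent = [rng.gauss(0, 1) for _ in range(n_examinees)]
new_scale = [t + rng.gauss(0, 0.5) for t in latent]
established_scale = [t + rng.gauss(0, 0.5) for t in latent]
unrelated_measure = [rng.gauss(0, 1) for _ in range(n_examinees)]

convergent_r = pearson_r(new_scale, established_scale)
discriminant_r = pearson_r(new_scale, unrelated_measure)

print(f"convergent r   = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

Under these assumptions the convergent correlation should come out high and the discriminant correlation near zero. With real data, the thresholds that count as "high" and "low" depend on the theory behind the construct, so they are a judgment the validation argument has to defend rather than a fixed cutoff.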

Designing a validation study means deciding which sources of evidence are most relevant to the intended interpretation and then building a research program to gather them. Not all five sources need equal attention for every test: a straightforward knowledge assessment for a licensure exam may require principally content evidence and criterion-related evidence (can licensed practitioners actually do the job?), while a novel measure of an abstract psychological construct like "grit" requires heavy investment in internal structure and discriminant evidence. The interpretive argument framework (Kane, 2006) makes this structure explicit: the test developer states the chain of inferences from observed score to ultimate decision, treats each inference as a link, and specifies what evidence would strengthen or break each link.
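One way to make the chain of inferences concrete is to write it down as data and ask which links still lack evidence. The sketch below is illustrative only: the class `InferenceLink`, the helper `unsupported_links`, the four link names, and the licensure example are all hypothetical, and the rule that any gathered evidence counts as support is a deliberate simplification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InferenceLink:
    """One link in the chain from observed score to decision."""
    name: str
    claim: str
    planned_evidence: List[str]
    gathered_evidence: List[str] = field(default_factory=list)

    def is_supported(self) -> bool:
        # Crude rule for illustration: any gathered evidence counts as support.
        return bool(self.gathered_evidence)


def unsupported_links(chain: List[InferenceLink]) -> List[str]:
    """Names of links that have no evidence yet, in chain order."""
    return [link.name for link in chain if not link.is_supported()]


# Hypothetical interpretive argument for a licensure exam.
licensure_argument = [
    InferenceLink(
        name="scoring",
        claim="Item responses are scored accurately and consistently.",
        planned_evidence=["scoring key review"],
        gathered_evidence=["scoring key review"],
    ),
    InferenceLink(
        name="generalization",
        claim="The observed score reflects performance across the item domain.",
        planned_evidence=["content mapping", "reliability estimate"],
        gathered_evidence=["content mapping"],
    ),
    InferenceLink(
        name="extrapolation",
        claim="Domain performance reflects competence in actual practice.",
        planned_evidence=["criterion study with practitioner ratings"],
    ),
    InferenceLink(
        name="decision",
        claim="Passing candidates are qualified to be licensed.",
        planned_evidence=["standard-setting study", "consequences review"],
    ),
]

print(unsupported_links(licensure_argument))  # ['extrapolation', 'decision']
```

Listing planned evidence separately from gathered evidence keeps the gap visible before the test goes operational. A real argument would also weigh the quality of each piece of evidence, which this sketch does not attempt.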

The most common failure mode in test development is gathering validity evidence *after* widespread deployment, when negative findings are costly to act on. Best practice is to design the validation program before the test is used operationally: pilot data should inform both item refinement and the evidentiary argument simultaneously. If the intended interpretation is that high scorers are more qualified for a clinical position, then criterion-related studies should be designed with the hiring outcome in mind — not added retroactively when someone questions the test's use. Validity is an ongoing process of accumulation, not a one-time certification, and each new population, context, or decision changes the evidentiary requirements.
