Questions: Score Interpretation and Validity Evidence Design
5 questions to test your understanding
Question 1 Multiple Choice
A reading comprehension test has been extensively validated for selecting employees in roles requiring heavy reading. A company now wants to use the same test to screen candidates for a physical security role with no reading requirements. Which statement best describes the validity situation?
A. The test remains valid because validity is established once and transfers across uses
B. The test may have been valid for the original purpose, but the new use requires new validity evidence; validity attaches to specific inferences in specific contexts, not to tests themselves
C. The test is now invalid because validity is a fixed property that is destroyed when you change the context
D. Validity only applies to psychological constructs, not to employment screening instruments
Answer: B
The conceptual pivot in modern validity theory is that validity is not a property of a test but of inferences drawn from scores for specific purposes in specific contexts. A test that supports valid inferences about reading ability does not automatically support valid inferences about physical security performance — those are different claims about different relationships. The test did not change; the inference changed. New evidence is required for the new use.
Question 2 Multiple Choice
A researcher develops a test of mathematical reasoning. Response process studies reveal that low-scoring students consistently struggle with the verbal complexity of the word problems, not with the underlying mathematics. What validity threat is this?
A. Content validity threat: the items do not adequately represent the domain of mathematics
B. Consequential validity threat: the test is producing harmful outcomes for students
C. Response process threat: examinees are engaging with a different construct (reading/verbal comprehension) than the test intends to measure (mathematical reasoning)
D. Internal structure threat: the test items do not form a unidimensional factor
Answer: C
Response process evidence asks whether examinees are doing what the test intends. If low scores reflect reading difficulty rather than mathematical reasoning, the scores carry construct-irrelevant variance: the test is not measuring what it claims to measure, and response process data is what reveals the failure. Think-aloud protocols, cognitive interviews, and eye-tracking are common methods for gathering this evidence. The finding doesn't merely suggest the test is 'too hard'; it suggests the scores mean something different from what the test claims.
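The effect described above can be illustrated with a small simulation. This is a minimal sketch under invented assumptions, not data from any real test: math and reading ability are drawn independently, and the `reading_weight` of 0.8, the noise level, and the function names are all hypothetical. It compares a plainly worded test with a verbally complex one built on the same mathematics.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate(n=2000, reading_weight=0.8, seed=0):
    """Simulate scores on a plain and a verbally complex math test.

    Assumption (hypothetical): math and reading ability are independent,
    so any score-reading correlation comes from the item wording.
    """
    rng = random.Random(seed)
    math_ability = [rng.gauss(0, 1) for _ in range(n)]
    reading = [rng.gauss(0, 1) for _ in range(n)]

    # Plain items: score depends on math ability plus measurement noise.
    plain = [m + rng.gauss(0, 0.5) for m in math_ability]
    # Wordy items: reading ability leaks into the score as well.
    wordy = [
        m + reading_weight * r + rng.gauss(0, 0.5)
        for m, r in zip(math_ability, reading)
    ]

    return {
        "plain_vs_reading": pearson(plain, reading),
        "wordy_vs_reading": pearson(wordy, reading),
        "plain_vs_math": pearson(plain, math_ability),
        "wordy_vs_math": pearson(wordy, math_ability),
    }


if __name__ == "__main__":
    for name, r in simulate().items():
        print(f"{name}: {r:.2f}")
```

Under these assumptions the wordy scores track reading ability noticeably while the plain scores barely do, and the wordy scores track math ability less closely. A correlation pattern like this only flags a possible problem; confirming what examinees are actually doing still takes think-aloud or interview evidence.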
Question 3 True / False
Validity is a property of a test itself — a well-constructed test is valid regardless of how its scores are interpreted or for what purpose it is used.
T. True
F. False
Answer: False
This is the key misconception the modern validity framework was designed to correct. Validity is always about a specific inference: 'these scores support the conclusion that...' The same test can yield valid inferences for one purpose (reading comprehension scores predict reading performance) and invalid inferences for another (those scores predict physical security performance). Calling a test 'valid' without specifying the inference and context is incomplete.
Question 4 True / False
Response process evidence for validity can reveal whether examinees are actually engaging with the construct a test intends to measure, rather than solving items through unintended strategies.
T. True
F. False
Answer: True
Response process evidence is gathered through methods like think-aloud protocols, cognitive interviewing, and eye-tracking. It answers: 'When test-takers respond to these items, are they actually engaging in the cognitive or behavioral process we're trying to assess?' If students are skipping steps and guessing based on keyword matching, or using test-taking tricks rather than applying knowledge, the scores may not reflect the construct, even if the content looks right on paper.
Question 5 Short Answer
Why is it a problem to gather validity evidence after a test has already been deployed widely, rather than designing validation studies before operational use?
Model answer: Once a test is widely deployed, negative validity findings become very costly to act on: withdrawing or revising the test requires revisiting decisions already made for large numbers of people (hiring, admission, licensure), and institutional and political pressures make it difficult to respond appropriately. Pre-deployment validation lets problems be caught and corrected when the stakes are low and changes are feasible. The interpretive argument framework (Kane, 2006) supports this by requiring the validation plan to be designed alongside the test, not appended after the fact.
This is why the Standards for Educational and Psychological Testing treat validation as an ongoing process that begins before operational deployment. The goal is for validation evidence to be in place when the test is first used consequentially, not accumulated retroactively in response to criticism. The five sources of validity evidence (test content, response processes, internal structure, relations to other variables, and consequences of testing) are most useful as a design framework for the validation program, not as a post-hoc checklist.
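One way to picture the design-framework idea is a validation plan tied to a specific inference and context, checked before deployment. The sketch below is hypothetical: the `ValidationPlan` class, its fields, and the example study description are invented for illustration. The five source names follow the Standards' categories (test content, response processes, internal structure, relations to other variables, consequences of testing).

```python
from dataclasses import dataclass, field

# The five sources of validity evidence, used here as planning categories.
SOURCES = (
    "test content",
    "response processes",
    "internal structure",
    "relations to other variables",
    "consequences of testing",
)


@dataclass
class ValidationPlan:
    """Hypothetical plan: evidence is planned per inference and context."""

    inference: str
    context: str
    planned: dict = field(default_factory=dict)  # source -> list of studies

    def add_study(self, source, description):
        """Record a planned study under one of the five sources."""
        if source not in SOURCES:
            raise ValueError(f"unknown evidence source: {source!r}")
        self.planned.setdefault(source, []).append(description)

    def unplanned_sources(self):
        """Return the sources that have no study planned yet."""
        return [s for s in SOURCES if not self.planned.get(s)]


if __name__ == "__main__":
    plan = ValidationPlan(
        inference="scores predict performance in reading-heavy roles",
        context="employee selection",
    )
    plan.add_study("response processes", "think-aloud study on sample items")
    print(plan.unplanned_sources())
```

A source showing up as unplanned is a prompt to decide whether that evidence matters for the stated inference, not a rule that every use needs all five. A new use of the same test, such as the physical security screening in Question 1, would start from its own empty plan.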