Questions: Norm-Referenced and Criterion-Referenced Score Interpretation
5 questions to test your understanding
Question 1 Multiple Choice
A certification exam for clinical nurses has a 94% pass rate. A psychometrician argues the test is 'too easy' and recommends adding harder items to increase score spread. What assumption is driving this recommendation — and why might it be wrong?
A. The psychometrician assumes the test is unreliable, and reliability requires substantial variance in scores
B. The psychometrician is applying a norm-referenced logic (where spread is necessary for ranking) to a test whose purpose is criterion-referenced competency assessment, where a high pass rate indicates effective training, not a flawed instrument
C. The psychometrician assumes the cut score should be raised to reduce test-taker confidence
D. The psychometrician assumes the test lacks content validity because hard items are missing
Answer: B
If the exam's purpose is to determine whether nurses meet a competency standard, a 94% pass rate is not in itself evidence of a flawed test: it means 94% of examinees met the required standard. Adding harder items to increase spread would serve a norm-referenced goal (ranking nurses against each other) but would undermine a criterion-referenced goal (verifying competence). The choice between frameworks must be driven by the decision the score is meant to inform. Applying norm-referenced logic to a criterion-referenced instrument is a category error.
Question 2 Multiple Choice
A norm-referenced test developer removes an item because 97% of examinees answer it correctly. A criterion-referenced test developer retains the same item. Who is right, and why?
A. The norm-referenced developer: an item with near-universal correct responses has poor reliability and should always be removed
B. The criterion-referenced developer: the item may map directly onto a critical competency that all trained individuals should master, so its universal correctness is expected and appropriate
C. Both are wrong: item difficulty should be set at 50% correct to maximize information
D. Neither: item retention should be decided by factor analysis, not pass rates
Answer: B
An item that nearly everyone gets right contributes almost no discrimination (it can barely separate higher from lower scorers), so it adds little to a norm-referenced instrument. But for a criterion-referenced instrument, if the item represents a competency that all trained people should have (say, washing hands before an invasive procedure), then 97% correct is the expected and desired outcome. Removing it would leave a gap in competency coverage. The same statistical fact (near-universal correctness) has opposite implications depending on the interpretive framework.
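To make the statistical side of this concrete, here is a minimal Python sketch, assuming dichotomous (0/1) item scores. The response data and the helper name `item_stats` are hypothetical and used only for illustration. It computes an item's difficulty (proportion correct, p) and its variance, p(1 - p), which limits how much the item can contribute to score spread.

```python
def item_stats(responses):
    """Return (difficulty, variance) for one dichotomous item.

    responses: list of 0/1 scores, one per examinee (hypothetical data).
    Difficulty is the proportion correct (p); variance is p * (1 - p).
    """
    p = sum(responses) / len(responses)
    return p, p * (1 - p)


# Hypothetical item answered correctly by 97 of 100 examinees
easy_p, easy_var = item_stats([1] * 97 + [0] * 3)

# Hypothetical item answered correctly by 50 of 100 examinees
mid_p, mid_var = item_stats([1] * 50 + [0] * 50)

print(f"p={easy_p:.2f} variance={easy_var:.4f}")
print(f"p={mid_p:.2f} variance={mid_var:.4f}")
```

The 97%-correct item has far less variance than the 50%-correct item, which is why a norm-referenced developer would drop it. Nothing in this calculation says whether the item covers a critical competency; that is a content judgment the statistic cannot make.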
Question 3 True / False
Criterion-referenced score interpretation is more objective than norm-referenced interpretation because it uses fixed percentage cutoffs rather than relative rankings.
T. True
F. False
Answer: False
This is a common misconception. Setting a criterion (deciding what score constitutes 'competent') requires expert professional judgment, not just counting correct answers. Standard-setting methods (Angoff, Bookmark, etc.) involve panels of experts making subjective judgments about what a minimally competent person should be able to do. The 'objectivity' of a 70% cut score is illusory: someone had to decide that 70%, and not 65% or 75%, represents competence. Both norm-referenced and criterion-referenced approaches require judgment; they just apply it at different points.
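The role of judgment can be seen in a simplified sketch of an Angoff-style calculation. This is an illustration under stated assumptions, not a full standard-setting procedure: each judge estimates, for each item, the probability that a minimally competent candidate answers correctly, and the cut score is the sum of the mean ratings per item. The panel ratings and the function name `angoff_cut_score` are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Simplified Angoff-style cut score.

    ratings[j][i] is judge j's estimated probability that a minimally
    competent candidate answers item i correctly. The cut score is the
    sum, over items, of the mean rating across judges.
    """
    n_items = len(ratings[0])
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    return sum(item_means)


# Hypothetical ratings: 3 judges x 4 items
panel = [
    [0.9, 0.7, 0.6, 0.8],
    [0.8, 0.6, 0.5, 0.9],
    [1.0, 0.8, 0.7, 0.7],
]

# A hypothetical stricter panel rating the same 4 items
strict_panel = [[min(1.0, r + 0.1) for r in judge] for judge in panel]

cut = angoff_cut_score(panel)
strict_cut = angoff_cut_score(strict_panel)
print(round(cut, 2), round(strict_cut, 2))
```

The arithmetic is mechanical, but every input is an expert opinion. A different or stricter panel produces a different cut score from the same items, which is the sense in which the fixed cutoff is not more objective than a ranking.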
Question 4 True / False
A single test can support both norm-referenced and criterion-referenced score interpretations simultaneously if it is designed carefully with both purposes in mind.
T. True
F. False
Answer: True
Both interpretations can be applied to the same test. A licensing exam might report percentile ranks (norm-referenced) for informational purposes while also applying a pass/fail cutoff (criterion-referenced) as the actual decision. However, designing for both purposes creates tension in item selection: norm-referenced design favors items with intermediate difficulty that discriminate between people, while criterion-referenced design requires items that cover competency domains regardless of difficulty. Tests optimized for one purpose are often suboptimal for the other.
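As a minimal sketch of dual reporting, the Python below interprets the same raw score two ways. The norm-group scores, the cut score of 70, and the function names are all hypothetical. Percentile rank is defined here as the percentage of the norm group scoring strictly below the examinee, which is one of several common definitions.

```python
def percentile_rank(score, norm_group):
    """Percent of the norm group scoring strictly below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


def passes(score, cut_score):
    """Criterion-referenced decision: compare the score to a fixed standard."""
    return score >= cut_score


# Hypothetical norm group and cut score
norm_group = [52, 60, 61, 67, 70, 73, 75, 78, 84, 90]
CUT_SCORE = 70

for score in (67, 75):
    print(score, percentile_rank(score, norm_group), passes(score, CUT_SCORE))
```

The percentile rank changes if the norm group changes, while the pass/fail result depends only on the cut score. That is why the two interpretations can coexist on one score report and still answer different questions.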
Question 5 Short Answer
Why does the choice between norm-referenced and criterion-referenced interpretation change how test items are selected and written?
Model answer: Norm-referenced tests need items that spread scores across individuals — so items with intermediate difficulty (near 50% correct) are preferred because they maximize variance and discriminate between higher and lower scorers. An item everyone passes adds nothing. Criterion-referenced tests need items that map onto the competency domain; even if all trained people answer an item correctly, it belongs in the test if it represents a critical skill. The goal shifts from 'does this item separate people?' to 'does this item represent the competency we need to verify?'
The difference in item selection follows directly from the different questions the two frameworks ask. Norm-referenced interpretation asks 'who performs better than whom?', which requires score variability. Criterion-referenced interpretation asks 'has this person acquired this specific competency?', which requires domain coverage. A test designed purely for norm-referenced purposes may systematically exclude items that nearly everyone answers correctly, even though these often cover the core skills that most need to be certified, while including items that discriminate well but are peripheral to competency.
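The two selection rules can be sketched side by side in Python. Everything here is hypothetical: the item bank, the proportion-correct values, the competency flags, and the 0.30 to 0.70 difficulty band, which is an assumed rule of thumb rather than a fixed standard.

```python
# Hypothetical item bank:
# (item_id, proportion_correct, maps_to_critical_competency)
ITEM_BANK = [
    ("hand_hygiene", 0.97, True),
    ("dosage_calc", 0.55, True),
    ("rare_syndrome_trivia", 0.48, False),
    ("patient_id_check", 0.92, True),
]


def select_norm_referenced(items, low=0.30, high=0.70):
    """Keep items of intermediate difficulty (assumed band: low..high)."""
    return [item_id for item_id, p, _ in items if low <= p <= high]


def select_criterion_referenced(items):
    """Keep items that map onto a critical competency, whatever their difficulty."""
    return [item_id for item_id, _, critical in items if critical]


norm_items = select_norm_referenced(ITEM_BANK)
criterion_items = select_criterion_referenced(ITEM_BANK)
print(norm_items)
print(criterion_items)
```

In this toy bank the norm-referenced filter drops the two easy but critical items and keeps a peripheral one, while the criterion-referenced filter does the reverse. Real item selection would also weigh discrimination indices and content blueprints, which this sketch omits.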