Questions: Item Difficulty and Item Discrimination Analysis
5 questions to test your understanding
Question 1 Multiple Choice
After scoring an exam, you find that Item 14 has a point-biserial correlation of -0.22. What does this most likely indicate?
A. The item is too easy: nearly everyone got it right, compressing variance
B. The item is too difficult: very few correct responses inflated the correlation
C. The item may be miskeyed or genuinely ambiguous: high scorers got it wrong more often than low scorers
D. The item is fine: negative correlations are common for true-false items
Answer: C
A negative point-biserial is a red flag: students who scored higher on the test overall were *less* likely to answer this item correctly, which is the opposite of what a good item does. The most common cause is a miskeyed item. The answer key records the wrong option as correct, so students who know the right answer are penalized. It can also signal a genuinely ambiguous question that confuses the strongest students. A negative discrimination almost always warrants immediate review of the key and the item wording before the scores are used.
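To see how miskeying flips the sign, here is a minimal Python sketch. It computes the point-biserial from its group-means form, r_pb = (M1 - M0) / s * sqrt(pq), on a small hypothetical data set: the option each examinee chose on Item 14 and their score on the remaining items. Scoring the same responses once with the correct key ("C") and once with a hypothetical wrong key ("B") shows the sign reversal. The data and the function names are illustrative assumptions, not taken from any real exam.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and total scores.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    totals of examinees who answered correctly and incorrectly, s is the
    population standard deviation of the totals, and p is the proportion
    correct (q = 1 - p). Assumes p is strictly between 0 and 1.
    """
    p = sum(item) / len(item)
    m1 = mean(t for x, t in zip(item, totals) if x == 1)
    m0 = mean(t for x, t in zip(item, totals) if x == 0)
    return (m1 - m0) / pstdev(totals) * sqrt(p * (1 - p))


def rpb_for_key(choices, rest_scores, key):
    """Score one item against `key`, add it to the rest score, return r_pb."""
    item = [1 if c == key else 0 for c in choices]
    totals = [r + x for r, x in zip(rest_scores, item)]
    return point_biserial(item, totals)


# Hypothetical data: option chosen on Item 14, and score on the other items.
choices = ["C", "C", "C", "B", "C", "B", "B", "A"]
rest_scores = [38, 35, 33, 30, 28, 22, 20, 18]

print(round(rpb_for_key(choices, rest_scores, "C"), 2))  # correct key: 0.82
print(round(rpb_for_key(choices, rest_scores, "B"), 2))  # miskeyed: -0.39
```

The responses are identical in both runs; only the key changes. Under the wrong key, the strongest examinees (who chose "C") are marked wrong, so the correct group's mean total falls below the incorrect group's and the correlation turns negative.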
Question 2 Multiple Choice
An item has a p-value of 0.95 on a licensure examination for nurses. A test developer proposes removing it for being 'too easy.' What is the best response?
A. Agree: items near p = 0.50 are always preferable because they maximize variance
B. Agree: a p-value of 0.95 means the item contributes almost no information to score differentiation
C. Disagree: the p-value should be evaluated in context; for a safety-critical competency, near-universal mastery is expected and appropriate
D. Disagree: p-values above 0.90 are outliers caused by measurement error and should be retained
Answer: C
Easy items (high p-values) do have low variance and contribute little to differentiating ability across the full range, which makes them poor choices for norm-referenced tests designed to spread examinees out. But test purpose matters. A licensure exam certifies minimum competency, and certain safety-critical tasks (e.g., recognizing a medication overdose) should be mastered by virtually every competent nurse. A p-value of 0.95 on such an item reflects appropriate domain mastery, not a flawed item. The statistical argument for removing easy items applies most forcefully to norm-referenced instruments such as aptitude tests, not to mastery assessments.
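The variance argument can be made concrete. For an item scored 0/1 with proportion correct p, the item variance is p(1 - p), which peaks at 0.25 when p = 0.50. The short Python sketch below (the function name is illustrative) tabulates it for a few p-values. At p = 0.95 the variance is 0.0475, less than a fifth of the maximum.

```python
def item_variance(p):
    """Variance of a dichotomously scored (0/1) item with proportion correct p."""
    return p * (1 - p)


for p in (0.50, 0.70, 0.90, 0.95):
    print(f"p = {p:.2f}   variance = {item_variance(p):.4f}")
# p = 0.50   variance = 0.2500
# p = 0.70   variance = 0.2100
# p = 0.90   variance = 0.0900
# p = 0.95   variance = 0.0475
```

Low variance is why such an item adds little to a norm-referenced ranking; it says nothing about whether the item belongs on a mastery test.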
Question 3 True / False
In classical test theory, a higher p-value for an item means the item is harder.
T. True
F. False
Answer: False
This is one of the most counterintuitive conventions in classical test theory. The p-value (proportion correct) runs from 0 to 1, and a higher p-value means *more* examinees answered the item correctly, so the item is *easier*, not harder. An item with p = 0.90 is very easy (90% correct); an item with p = 0.20 is very difficult (only 20% correct). The naming adds to the confusion: 'p-value' in statistics usually refers to hypothesis testing, but in item analysis it simply means the proportion passing, even though it is often called the item 'difficulty' index. Students who reason by analogy from statistical p-values often get this backwards.
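Computing p-values takes one pass over a scored response matrix. The minimal Python sketch below assumes a hypothetical matrix with one row per examinee and one column per item, scored 1 for correct and 0 for incorrect; each item's p-value is simply its column mean.

```python
# Hypothetical scored responses: rows are examinees, columns are items.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]


def p_values(scores):
    """Proportion of examinees answering each item correctly (column means)."""
    n = len(scores)
    return [sum(col) / n for col in zip(*scores)]


print(p_values(scores))  # [1.0, 0.6, 0.2, 0.8]
```

The first item (p = 1.0) is the easiest, since every examinee answered it correctly; the third item (p = 0.2) is the hardest.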
Question 4 True / False
An item with near-zero point-biserial discrimination is contributing meaningful information about the underlying construct being measured.
T. True
F. False
Answer: False
Discrimination measures whether an item distinguishes high scorers from low scorers. A point-biserial near zero means the item response is essentially uncorrelated with total score: whether a student answers correctly is unrelated to their overall performance on the test. Such items add noise rather than signal, lengthening the test without improving reliability or validity. Items with near-zero discrimination should be reviewed for construct relevance (does this item actually measure what the rest of the test measures?), clarity (is the wording equally confusing at all ability levels?), and keying accuracy.
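A near-zero point-biserial has a concrete meaning: examinees who answered the item correctly and those who did not have about the same average score on the rest of the test. The Python sketch below illustrates this with hypothetical data. It correlates each item with the rest score (total excluding that item, i.e. the corrected item-total correlation) so the item's own point does not nudge the value upward. The item names and scores are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item, scores):
    """r_pb = (M1 - M0) / s * sqrt(p * q) for a 0/1 item against `scores`.

    Assumes the item has at least one correct and one incorrect response.
    """
    p = sum(item) / len(item)
    m1 = mean(s for x, s in zip(item, scores) if x == 1)
    m0 = mean(s for x, s in zip(item, scores) if x == 0)
    return (m1 - m0) / pstdev(scores) * sqrt(p * (1 - p))


# Hypothetical rest scores (total on all other items), highest to lowest.
rest = [30, 28, 26, 22, 20, 18]

discriminating = [1, 1, 1, 0, 0, 0]  # correct only for the three highest scorers
flat = [1, 0, 0, 1, 1, 0]            # correct and incorrect groups both average 24

print(f"{point_biserial(discriminating, rest):.2f}")  # 0.93
print(f"{point_biserial(flat, rest):.2f}")            # 0.00
```

Because the flat item's correct and incorrect groups have identical mean rest scores, the numerator M1 - M0 is zero and the item carries no information about who is stronger.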
Question 5 Short Answer
Why is a negative item discrimination index a more serious problem than simply low discrimination, and what should a test developer do when encountering it?
Model answer: Negative discrimination means high scorers got the item wrong more often than low scorers, so the item pulls in the opposite direction from the rest of the test. This actively *harms* measurement validity by penalizing the most knowledgeable students. Low discrimination is a passive problem (the item is inert), whereas negative discrimination is an active one. The immediate step is to check the answer key for miskeying, then review the item for ambiguity. The item should be flagged and either rescored or removed before final scores are reported.
Low-discrimination items waste test time but do not systematically distort rankings; negatively discriminating items distort them in the wrong direction. In high-stakes testing (admissions, licensure), leaving a negatively discriminating item in the scored set can change who passes and who fails. A common practice is to audit every item with a point-biserial below a cutoff such as 0.15 (exact thresholds vary by program) and to treat negative values as urgent problems that must be resolved before scores are reported.
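The triage rule above can be sketched as a small Python function. The 0.15 review cutoff comes from the text and is exposed as a parameter because programs set it differently; the function name, the labels, and the item statistics (other than Item 14's -0.22 from Question 1) are hypothetical.

```python
def flag_item(r_pb, review_below=0.15):
    """Triage an item by its point-biserial (the cutoff is a tunable assumption)."""
    if r_pb < 0:
        # Negative discrimination: resolve before any scores are reported.
        return "urgent: check key and wording"
    if r_pb < review_below:
        return "review: low discrimination"
    return "ok"


# Hypothetical item statistics; Item 14's value is taken from Question 1.
item_stats = {"Item 7": 0.34, "Item 12": 0.08, "Item 14": -0.22}
for name, r in item_stats.items():
    print(f"{name}: {flag_item(r)}")
# Item 7: ok
# Item 12: review: low discrimination
# Item 14: urgent: check key and wording
```

Checking the negative case first keeps a negatively discriminating item from being filed as merely "low" discrimination.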