Classical test theory assigns a single difficulty index (p-value) to each item. What is the key limitation of this approach compared to the IRT item characteristic curve?
Think about your answer, then reveal below.
Model answer: The classical difficulty index (p-value, the proportion of examinees answering correctly) depends on the sample: an item looks easy in a high-ability group and hard in a low-ability group. The IRT ICC instead expresses difficulty as a location on the ability scale that, when the model fits, does not depend on the ability distribution of the sample tested, making it a more stable and generalizable property of the item.
This is the central advantage of IRT over CTT for item analysis. CTT item statistics are group-dependent: the same item administered to Harvard students versus a general population will have very different p-values, even though the item itself has not changed. The IRT b parameter, estimated under a logistic model, locates difficulty on the latent ability scale; in the common logistic models without a guessing parameter, it is the ability level at which an examinee has a 50% chance of answering correctly. Provided the model fits and the calibrations are placed on a common scale, b stays the same across groups, enabling fair comparisons across test forms and populations.
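To make the contrast concrete, here is a minimal sketch rather than a definitive implementation. It assumes a two-parameter logistic (2PL) ICC and normally distributed ability in each group. The function names (`icc`, `expected_p_value`), the item parameters (a = 1.0, b = 0.5), and the group means (1.5 and -1.0) are illustrative assumptions, not values from the text above. The sketch computes the classical p-value the same item would show in a high-ability and a low-ability group, while b is held fixed.

```python
import math


def icc(theta, a=1.0, b=0.0):
    """2PL item characteristic curve: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_p_value(mean, sd=1.0, a=1.0, b=0.0, n_points=2001, width=6.0):
    """Expected proportion correct (classical p-value) for a group whose
    ability is assumed Normal(mean, sd).

    Approximated as a normal-density-weighted average of the ICC over an
    evenly spaced grid spanning mean +/- width * sd.
    """
    lo = mean - width * sd
    step = 2.0 * width * sd / (n_points - 1)
    weighted_sum = 0.0
    weight_total = 0.0
    for i in range(n_points):
        theta = lo + i * step
        weight = math.exp(-0.5 * ((theta - mean) / sd) ** 2)
        weighted_sum += weight * icc(theta, a, b)
        weight_total += weight
    return weighted_sum / weight_total


# One item, one fixed difficulty on the ability scale (hypothetical value).
b = 0.5

# The same item given to two hypothetical groups with different mean ability.
p_high = expected_p_value(mean=1.5, b=b)
p_low = expected_p_value(mean=-1.0, b=b)

print(f"b (IRT difficulty, same for both groups): {b}")
print(f"classical p-value, high-ability group:    {p_high:.2f}")
print(f"classical p-value, low-ability group:     {p_low:.2f}")
print(f"P(correct) at theta == b:                 {icc(b, b=b):.2f}")
```

The two p-values differ only because the ability distributions differ; the ICC and its b parameter are identical in both calls. In real calibrations, b is estimated from data, so recovered values agree across groups only up to estimation error and after the scales have been linked.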