Questions — Classical and IRT-Based Item Analysis Compared

Question 1 (Multiple Choice)

A test item has a p-value (the proportion of examinees answering it correctly) of 0.85 when administered to a sample of college graduates. The same item has a p-value of 0.52 when administered to a sample of high school students. The best interpretation is:

A. The item was scored incorrectly for one of the groups

B. This demonstrates that the p-value is a sample-dependent statistic, not a fixed property of the item

C. The item discriminates poorly because its difficulty appears to change across groups

D. The high school students received a flawed test administration
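Illustration for Question 1: a minimal sketch of how one item with fixed parameters can produce very different p-values in two groups. It assumes a two-parameter logistic (2PL) IRT model; the item parameters, the normal ability distributions, and the helper names `prob_correct_2pl` and `simulate_p_value` are hypothetical choices made for this example, not taken from any particular study.

```python
import math
import random


def prob_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a 2PL IRT model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_p_value(a: float, b: float, mean_theta: float, sd_theta: float,
                     n: int, rng: random.Random) -> float:
    """Classical p-value (proportion correct) for one item in one simulated group."""
    correct = 0
    for _ in range(n):
        theta = rng.gauss(mean_theta, sd_theta)  # draw one examinee's ability
        if rng.random() < prob_correct_2pl(theta, a, b):
            correct += 1
    return correct / n


# Hypothetical item parameters: identical for both groups.
ITEM_A, ITEM_B = 1.2, 0.0

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
# Hypothetical ability distributions: graduates centered higher than high schoolers.
p_grads = simulate_p_value(ITEM_A, ITEM_B, mean_theta=1.5, sd_theta=1.0, n=5000, rng=rng)
p_hs = simulate_p_value(ITEM_A, ITEM_B, mean_theta=0.0, sd_theta=1.0, n=5000, rng=rng)

print(f"p-value, graduates:   {p_grads:.2f}")
print(f"p-value, high school: {p_hs:.2f}")
```

The item's `a` and `b` never change between the two calls; only the group's ability distribution does, yet the proportion correct differs sharply.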

Question 2 (Multiple Choice)

A testing company needs to build an item bank and compare scores across different test forms administered to different cohorts each year. Which measurement approach is most appropriate?

A. Classical test theory, because p-values and point-biserials are simpler to compute and interpret

B. IRT, because item parameter estimates are theoretically invariant across populations, enabling score equating across different forms and cohorts

C. Classical test theory, because point-biserial correlations capture the same information as IRT discrimination parameters

D. Either approach works equally well for equating scores across test forms

Question 3 (True / False)

A CTT p-value of 0.80 indicates that the item has moderate difficulty, regardless of which population is tested.

True

False

Question 4 (True / False)

IRT item parameter estimates allow test developers to place items from different test forms onto a common scale and compare their properties, even if those forms were administered to different groups.

True

False
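Illustration for Question 4: a minimal sketch of mean/sigma linking, one common method (alongside mean/mean, Haebara, and Stocking-Lord) for placing IRT item parameters from two separately calibrated forms onto a common scale. It assumes the forms share a set of anchor items; the anchor difficulties, the unique new-form item, and the helper names `mean_sigma_constants` and `to_base_scale` are hypothetical. Real calibrations contain estimation error, so the linking constants would not come out this clean.

```python
from statistics import mean, stdev


def mean_sigma_constants(b_new, b_base):
    """Mean/sigma linking constants (A, B) from anchor-item difficulties.

    b_new and b_base hold the same anchor items' difficulty estimates
    from the new-form and base-form calibrations, in the same order.
    """
    A = stdev(b_base) / stdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def to_base_scale(a, b, A, B):
    """Rescale a 2PL item's (a, b) from the new-form scale to the base scale."""
    return a / A, A * b + B


# Hypothetical anchor-item difficulties estimated separately on each form.
b_base_anchor = [-1.0, 0.0, 0.5, 1.5]
b_new_anchor = [-1.2, -0.4, 0.0, 0.8]

A, B = mean_sigma_constants(b_new_anchor, b_base_anchor)
# A hypothetical item that appears only on the new form.
a_star, b_star = to_base_scale(1.0, 0.4, A, B)
print(f"A = {A:.3f}, B = {B:.3f}")
print(f"new-form item on base scale: a = {a_star:.3f}, b = {b_star:.3f}")
```

Once `A` and `B` are estimated from the shared items, every item unique to the new form can be expressed on the base form's scale, which is what makes its properties comparable with items calibrated on a different group.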

Question 5 (Short Answer)

What is the fundamental limitation of classical test theory item statistics, and how does IRT address it?

