Questions — IRT Model Comparison and Fit Evaluation

Question 1 Multiple Choice

A psychometrician is calibrating a 50-item certification exam. A likelihood ratio test shows that the 2PL fits significantly better than the Rasch model (p < .001), but the item discrimination parameters vary only narrowly (range: 0.85–1.15). The test will be used for large-scale adaptive testing across multiple years and examinee populations. The most defensible model choice is:

A. Always the 2PL: a statistically significant fit difference must be respected

B. The 3PL: if the 2PL fits better than Rasch, the 3PL likely fits even better and should be explored

C. The Rasch model: the fit improvement is trivially small, and Rasch's sample-independent calibration property is valuable for adaptive testing and equating across populations

D. Neither: the narrow discrimination range means the items are too similar and should be revised before model selection

Question 2 Multiple Choice

Why are information criteria like AIC and BIC often preferred over the likelihood ratio test alone for comparing IRT models in large psychometric samples?

A. Because AIC and BIC can compare non-nested models, whereas the LRT is restricted to nested model families

B. Because in large samples the LRT almost always rejects the simpler model regardless of practical significance, while AIC and BIC penalize complexity and measure whether added parameters earn their keep

C. Because the LRT requires normality assumptions that are violated in IRT data

D. Because AIC is always lower for more complex models, making it a reliable guide to model selection
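
For reference, the three quantities named in this question can all be computed from the maximized log-likelihoods of two fitted models. The sketch below is a minimal illustration, not output from any particular IRT package: the function names, the log-likelihood values, the sample size, and the parameter counts (one difficulty per item for Rasch, one difficulty plus one discrimination per item for the 2PL) are assumptions made up for the example.

```python
import math


def lrt_statistic(loglik_simple, loglik_complex):
    """Likelihood ratio statistic G^2 = 2 * (lnL_complex - lnL_simple).

    For nested models it is referred to a chi-square distribution with
    degrees of freedom equal to the difference in parameter counts.
    """
    return 2.0 * (loglik_complex - loglik_simple)


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_examinees):
    """Bayesian information criterion: k ln(N) - 2 lnL (lower is better)."""
    return n_params * math.log(n_examinees) - 2.0 * loglik


# Hypothetical inputs (assumed for illustration only).
n_items = 50
n_examinees = 5000
rasch_params = n_items        # assumed: one difficulty per item
twopl_params = 2 * n_items    # assumed: difficulty + discrimination per item
loglik_rasch = -140000.0      # made-up maximized log-likelihood
loglik_2pl = -139900.0        # made-up maximized log-likelihood

g2 = lrt_statistic(loglik_rasch, loglik_2pl)
df = twopl_params - rasch_params

print("G^2:", g2, "on", df, "df")
print("AIC Rasch:", aic(loglik_rasch, rasch_params))
print("AIC 2PL:  ", aic(loglik_2pl, twopl_params))
print("BIC Rasch:", bic(loglik_rasch, rasch_params, n_examinees))
print("BIC 2PL:  ", bic(loglik_2pl, twopl_params, n_examinees))
```

The BIC penalty grows with ln(N) per parameter while the AIC penalty stays fixed at 2 per parameter, so the two criteria need not agree on the same pair of models. Exact parameter counts depend on how the software parameterizes the person distribution.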

Question 3 True / False

A model can show acceptable global fit statistics while individual items within it misfit the model's predictions badly.

T. True

F. False

Question 4 True / False

When a likelihood ratio test shows the 3PL fits significantly better than the Rasch model, the 3PL should generally be selected for the final test.

T. True

F. False

Question 5 Short Answer

What is the unique measurement property of the Rasch model that makes it especially valuable for large-scale or adaptive testing, and under what conditions might this property justify choosing Rasch over a 2PL that fits the data better?

