Questions: IRT Model Comparison and Fit Evaluation

5 questions to test your understanding

Question 1 (Multiple Choice)

A psychometrician calibrates a 50-item certification exam. The 2PL fits significantly better than the Rasch model by a likelihood ratio test (p < .001), but the estimated item discrimination parameters vary only narrowly (range: 0.85–1.15). The test will be used for large-scale adaptive testing across multiple years and examinee populations. The most defensible model choice is:

A. Always the 2PL: a statistically significant fit difference must be respected
B. The 3PL: if the 2PL fits better than Rasch, the 3PL likely fits even better and should be explored
C. The Rasch model: the fit improvement is trivially small, and Rasch's sample-independent calibration property is valuable for adaptive testing and equating across populations
D. Neither: the narrow discrimination range means the items are too similar and should be revised before model selection
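To get a feel for how far a 2PL item with a slope in the 0.85–1.15 range strays from a Rasch item, one can compare the two item response functions directly. This is a minimal sketch under a simplifying assumption: both curves share the same difficulty b, and it ignores the rescaling of the latent metric that an actual Rasch calibration would apply. The function names are hypothetical.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response; a = 1 gives the Rasch (1PL) curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def max_gap_vs_rasch(a: float, b: float = 0.0, lo: float = -3.0,
                     hi: float = 3.0, steps: int = 601) -> float:
    """Largest |P_2PL - P_Rasch| over an evenly spaced theta grid, holding b fixed."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(irf_2pl(t, a, b) - irf_2pl(t, 1.0, b)) for t in grid)


# Slopes at the edges of the reported range, plus a steep item for contrast
for a in (0.85, 1.15, 2.0):
    print(f"a = {a:.2f}: max probability gap vs. Rasch = {max_gap_vs_rasch(a):.3f}")
```

The comparison is deliberately crude: it says nothing about statistical significance, only about the size of the differences in predicted probabilities that the extra slope parameters buy.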
Question 2 (Multiple Choice)

Why are information criteria like AIC and BIC often preferred over the likelihood ratio test alone for comparing IRT models in large psychometric samples?

A. Because AIC and BIC can compare non-nested models, whereas the LRT is restricted to nested model families
B. Because in large samples the LRT almost always rejects the simpler model regardless of practical significance, while AIC and BIC penalize complexity and measure whether added parameters earn their keep
C. Because the LRT requires normality assumptions that are violated in IRT data
D. Because AIC is always lower for more complex models, making it a reliable guide to model selection
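The three quantities named in Question 2 can each be computed from a fitted model's maximized log-likelihood and its number of free parameters. The sketch below assumes those two numbers come from whatever IRT software produced the fits; the `ModelFit` class and the per-examinee log-likelihood gain of 0.01 are hypothetical, chosen only to show how each quantity scales with the number of examinees N. The parameter counts assume one common identification convention (Rasch: 50 difficulties plus a latent variance; 2PL: 50 slopes and 50 difficulties with the latent variance fixed at 1).

```python
import math
from dataclasses import dataclass


@dataclass
class ModelFit:
    """Summary of one fitted IRT model (values would come from estimation software)."""
    name: str
    loglik: float   # maximized (marginal) log-likelihood
    n_params: int   # number of freely estimated parameters


def aic(fit: ModelFit) -> float:
    # AIC = -2 lnL + 2k: the penalty per parameter does not depend on N
    return -2.0 * fit.loglik + 2.0 * fit.n_params


def bic(fit: ModelFit, n_persons: int) -> float:
    # BIC = -2 lnL + k ln(N): the penalty per parameter grows with sample size
    return -2.0 * fit.loglik + fit.n_params * math.log(n_persons)


def lrt_statistic(simple: ModelFit, general: ModelFit) -> tuple[float, int]:
    # G^2 = 2 (lnL_general - lnL_simple), referred to a chi-square with
    # df = difference in parameter counts; valid only if `simple` is nested in `general`
    g2 = 2.0 * (general.loglik - simple.loglik)
    df = general.n_params - simple.n_params
    return g2, df


# Hypothetical scenario: the 2PL gains a fixed 0.01 log-likelihood units per examinee
for n in (500, 5_000, 50_000):
    base = -30.0 * n                          # hypothetical Rasch log-likelihood
    rasch = ModelFit("Rasch", base, 51)
    twopl = ModelFit("2PL", base + 0.01 * n, 100)
    g2, df = lrt_statistic(rasch, twopl)
    print(f"N={n}: G2={g2:.1f} on {df} df, "
          f"dAIC={aic(twopl) - aic(rasch):.1f}, dBIC={bic(twopl, n) - bic(rasch, n):.1f}")
```

With a fixed per-examinee gain, G² grows linearly in N, the AIC penalty for the 49 extra parameters stays at 98, and the BIC penalty grows only with ln N, so the sign of each comparison can change as N increases (negative differences favor the 2PL).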
Question 3 (True / False)

A model can show acceptable global fit statistics while individual items within it misfit the model's predictions badly.

True
False
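Question 3 contrasts global and item-level fit. One common item-level check is the Rasch outfit mean-square: the average squared standardized residual for an item, whose expected value is 1 when the item fits and the parameters are known. The sketch below assumes known person abilities and item difficulties (in practice both are estimated) and simulates hypothetical data in which one item is generated with a much flatter slope than the Rasch model allows. Averaging the item mean-squares, as the first print line does, is only a crude stand-in for a global index, not a standard global fit statistic.

```python
import math
import random


def rasch_p(theta: float, b: float) -> float:
    # Rasch probability of a correct response
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def outfit_mnsq(responses, thetas, b: float) -> float:
    """Outfit mean-square for one item: mean of squared standardized residuals."""
    total = 0.0
    for x, th in zip(responses, thetas):
        p = rasch_p(th, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)


def simulate(n_persons: int = 5000, seed: int = 1):
    rng = random.Random(seed)
    difficulties = [-1.0 + 2.0 * i / 9 for i in range(10)]
    slopes = [1.0] * 9 + [0.3]   # hypothetical: item 9 has a much flatter true slope
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    data = []
    for b, a in zip(difficulties, slopes):
        # Generate responses from a 2PL; only item 9 departs from the Rasch form
        col = [1 if rng.random() < 1.0 / (1.0 + math.exp(-a * (th - b))) else 0
               for th in thetas]
        data.append(col)
    return thetas, difficulties, data


thetas, difficulties, data = simulate()
outfits = [outfit_mnsq(col, thetas, b) for col, b in zip(data, difficulties)]
print("mean item outfit:", round(sum(outfits) / len(outfits), 2))
for i, mnsq in enumerate(outfits):
    print(f"item {i}: outfit {mnsq:.2f}")
```

The misfitting item stands out in its own row even though it is only one of ten values entering the average.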
Question 4 (True / False)

When a likelihood ratio test shows the 3PL fits significantly better than the Rasch model, the 3PL should generally be selected for the final test.

True
False
Question 5 (Short Answer)

What is the unique measurement property of the Rasch model that makes it especially valuable for large-scale or adaptive testing, and under what conditions might this property justify choosing Rasch over a 2PL that fits the data better?
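One property relevant to Question 5 can be checked numerically. Under the Rasch model the number-correct score is a sufficient statistic for θ, so the probability of a particular response pattern, given its raw score, does not depend on θ at all; this is what conditional maximum likelihood calibration exploits. Under a 2PL with unequal slopes the same conditional probability shifts with θ. The sketch below enumerates all patterns for a hypothetical four-item test; the difficulties, slopes, and function names are made up for illustration.

```python
import itertools
import math


def p_correct(theta: float, a: float, b: float) -> float:
    # 2PL response probability; all slopes equal to 1 gives the Rasch model
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_prob(pattern, theta, slopes, diffs) -> float:
    # Probability of a full 0/1 response pattern under local independence
    prob = 1.0
    for x, a, b in zip(pattern, slopes, diffs):
        p = p_correct(theta, a, b)
        prob *= p if x == 1 else 1.0 - p
    return prob


def prob_given_raw_score(pattern, theta, slopes, diffs) -> float:
    """P(pattern | raw score, theta): the pattern's probability divided by the
    summed probability of every pattern with the same number-correct score."""
    r = sum(pattern)
    same_score = [q for q in itertools.product((0, 1), repeat=len(pattern)) if sum(q) == r]
    total = sum(pattern_prob(q, theta, slopes, diffs) for q in same_score)
    return pattern_prob(pattern, theta, slopes, diffs) / total


diffs = [-1.0, 0.0, 0.5, 1.0]          # hypothetical item difficulties
rasch_slopes = [1.0, 1.0, 1.0, 1.0]
twopl_slopes = [0.5, 1.0, 1.2, 2.0]    # hypothetical 2PL discriminations
pattern = (1, 1, 0, 0)

for label, slopes in (("Rasch", rasch_slopes), ("2PL", twopl_slopes)):
    low = prob_given_raw_score(pattern, -1.0, slopes, diffs)
    high = prob_given_raw_score(pattern, 2.0, slopes, diffs)
    print(f"{label}: P(pattern | score) at theta=-1: {low:.4f}, at theta=2: {high:.4f}")
```

Enumerating every pattern costs 2^n evaluations, so this brute-force check is only practical for a handful of items; conditional maximum likelihood software for the Rasch model uses elementary symmetric functions instead.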
