Why can't person-fit analysis alone identify the cause of aberrant response patterns, and what additional evidence would be needed to distinguish among the main explanations?
Think about your answer, then reveal below.
Model answer: Person-fit analysis detects statistical inconsistency: it compares observed responses with what a unidimensional IRT model predicts at the examinee's estimated theta. But a statistically unusual pattern is consistent with multiple causes. Cheating or item preknowledge tends to produce correct answers on hard items alongside errors on easier ones (a reversed Guttman pattern); carelessness produces near-random patterns; item misunderstanding produces localized failures on a specific topic; and genuine multidimensionality produces a pattern in which some clusters of items fit the model while others do not. The lz statistic cannot distinguish among these because it summarizes the whole pattern in a single number. Additional evidence might include timing data (very fast responses suggest guessing or item preknowledge), response process data, item-level analysis to see which items are driving the aberrance, and contextual information such as proctor reports or prior item exposure.
This limitation is a fundamental feature of statistical fit indices: they measure the gap between observations and model predictions, but the model does not encode all possible causes of that gap. This is why person-fit analysis is best understood as a screening tool that identifies examinees whose scores warrant further review, not a diagnostic tool that identifies the reason for aberrance. Test security investigations and score validity challenges typically require converging evidence from multiple sources.
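To make the "single number" point concrete, here is a minimal sketch of the standardized log-likelihood statistic lz under a Rasch model, alongside the per-item breakdown that item-level analysis would inspect. The item difficulties, the theta value, and the function names (`rasch_prob`, `lz_statistic`, `item_contributions`) are illustrative assumptions, not taken from any particular package. In practice theta is estimated from the same responses, which shifts the reference distribution of lz, so treat this as an illustration rather than an operational procedure.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def lz_statistic(responses, theta, difficulties):
    """Standardized log-likelihood person-fit statistic (lz).

    lz = (l0 - E[l0]) / sqrt(Var[l0]), where l0 is the log-likelihood of
    the observed 0/1 responses at the given theta.
    """
    l0 = expected = variance = 0.0
    for u, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (l0 - expected) / math.sqrt(variance)

def item_contributions(responses, theta, difficulties):
    """Per-item observed log-likelihood minus its expectation.

    Strongly negative entries mark the items driving the misfit; this is
    the information that the single lz value collapses away.
    """
    out = []
    for u, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        observed = u * math.log(p) + (1 - u) * math.log(q)
        out.append(observed - (p * math.log(p) + q * math.log(q)))
    return out

# Hypothetical example: 8 items ordered from easy to hard, examinee at theta = 0.
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
theta = 0.0

consistent = [1, 1, 1, 1, 0, 0, 0, 0]  # Guttman-like: easy right, hard wrong
reversed_ = [0, 0, 0, 0, 1, 1, 1, 1]   # reversed: easy wrong, hard right

lz_consistent = lz_statistic(consistent, theta, difficulties)
lz_reversed = lz_statistic(reversed_, theta, difficulties)
reversed_items = item_contributions(reversed_, theta, difficulties)

print(f"lz consistent: {lz_consistent:.2f}")
print(f"lz reversed:   {lz_reversed:.2f}")
print("per-item contributions (reversed):",
      [round(c, 2) for c in reversed_items])
```

A large negative lz flags the reversed pattern, but the number itself does not say whether preknowledge, carelessness, or a second dimension produced it. The per-item contributions show where the misfit sits, which is a starting point for combining the flag with timing data and contextual evidence.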