A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

IRT Model Comparison and Fit Evaluation

Research Depth 112 in the knowledge graph

756 prerequisites beneath it

Core Idea

Comparing IRT models requires examining fit statistics (likelihood ratio tests, AIC, BIC), item-level residuals, and practical utility. Model selection balances parsimony with empirical fit. A simpler model (Rasch) may be preferred even if more complex models (2PL, 3PL) fit better, depending on measurement goals and resources.

Explainer

You have now studied three IRT models: the Rasch (1PL) model, the 2PL, and the 3PL. Each successive model adds one parameter per item to account for more item-level variation: the 2PL adds discrimination (how sharply the item separates lower-ability from higher-ability examinees), and the 3PL adds a pseudo-guessing parameter (the lower asymptote, that is, the probability that a very low-ability examinee gets the item right by chance). The natural question is: which model should you use? The answer requires balancing two competing pressures that should already be familiar from your study of probability and statistical inference: fit and parsimony.
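
As a reminder of how the three models relate, here is a minimal sketch of the item response function in Python. The function name `irf` and the example parameter values are illustrative assumptions, not part of any particular IRT library.

```python
import math

def irf(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: person ability, b: item difficulty,
    a: discrimination, c: pseudo-guessing (lower asymptote).
    Leaving a=1 and c=0 gives the Rasch model; c=0 alone gives the 2PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative values only: an examinee whose ability equals the item difficulty.
theta, b = 0.0, 0.0
p_rasch = irf(theta, b)               # Rasch: a = 1, c = 0
p_2pl = irf(theta, b, a=1.7)          # 2PL: discrimination is free
p_3pl = irf(theta, b, a=1.7, c=0.2)   # 3PL: adds a lower asymptote
```

Because the simpler models are obtained by fixing parameters of the more complex one, the same function covers all three cases.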

The most direct statistical tool for comparing nested IRT models is the likelihood ratio test (LRT). Because the Rasch model is a constrained version of the 2PL (with all discriminations fixed to 1), and the 2PL is a constrained version of the 3PL (with all guessing parameters fixed to 0), these models are nested. The LRT statistic is twice the difference between the maximized log-likelihoods of the two models (complex minus simple). Under the simpler model, this statistic is referred to a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters. If the more complex model fits significantly better, you have evidence that the additional parameters are justified. When you studied the chi-square test, you encountered this same logic: a significant result means the simpler model's constraints are inconsistent with the data. One caution applies to the 2PL-versus-3PL comparison: fixing the guessing parameters to 0 places them on the boundary of the parameter space, so the chi-square reference distribution is only approximate there.
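
The test can be sketched in a few lines of Python. This is an illustration under stated assumptions: the log-likelihoods and parameter counts are made-up numbers, the function names are hypothetical, and the chi-square tail probability is computed with a simple series so the sketch needs only the standard library (a statistics package would normally supply it).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    Very small p-values lose precision and are clamped at 0.
    """
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 100000:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(loglik_simple, loglik_complex, k_simple, k_complex):
    """Return (statistic, degrees of freedom, p-value) for two nested models."""
    stat = 2.0 * (loglik_complex - loglik_simple)
    df = k_complex - k_simple
    return stat, df, chi2_sf(stat, df)

# Hypothetical 20-item test: Rasch with 20 parameters, 2PL with 40.
stat, df, p_value = likelihood_ratio_test(-5210.4, -5195.1, 20, 40)
```

A small p-value would favor the 2PL; a large one means the data give no strong reason to abandon the Rasch constraints.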

However, statistical significance alone is not sufficient for model selection. With large samples, which are common in psychometric applications, even trivially small improvements in fit can be statistically significant. This is where information criteria become essential. The AIC (Akaike Information Criterion) penalizes model complexity as 2k − 2ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The BIC (Bayesian Information Criterion) is k·ln(n) − 2ln(L), where n is the sample size. Because ln(n) exceeds 2 once n is at least 8, the BIC penalizes each parameter more heavily than the AIC in any realistic sample, and the gap widens as n grows, making it more conservative against overfitting in large samples. Lower values are better for both. When a 3PL model has lower AIC than the Rasch model, the gain in fit outweighs the cost of the additional parameters by the AIC's metric; the model comparison is essentially asking whether the extra parameters are "earning their keep."
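
A minimal sketch of both criteria, using the standard definitions AIC = 2k − 2ln(L) and BIC = k·ln(n) − 2ln(L). The log-likelihoods, parameter counts, and sample size below are invented for illustration, and the function and variable names are hypothetical.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * loglik

# Hypothetical fits to the same data: (maximized log-likelihood, parameter count).
n_examinees = 1000
fits = {"Rasch": (-5210.4, 20), "2PL": (-5195.1, 40)}

table = {
    name: (aic(ll, k), bic(ll, k, n_examinees))
    for name, (ll, k) in fits.items()
}
best_by_aic = min(table, key=lambda name: table[name][0])
best_by_bic = min(table, key=lambda name: table[name][1])
```

With these made-up numbers, the 2PL improves the log-likelihood by about 15 points, but its 20 extra parameters cost more than that under either penalty, so both criteria select the Rasch model.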

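Beyond global fit, item-level fit is equally important. A model can fit well overall while specific items misfit badly, meaning their observed response patterns do not follow the model's predicted curves. In Rasch analysis, infit and outfit mean-square statistics flag items where observed responses diverge from the model's expectations. Outfit is an unweighted average of squared standardized residuals, so it reacts to unexpected responses anywhere on the ability range, including those from examinees far from the item's difficulty. Infit weights each residual by its information, so it is dominated by examinees near the item's difficulty level. Both have an expected value of about 1 when the model holds. A model that fits globally but has many misfitting items should not be trusted for scores that depend on those items.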
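
The two mean-square statistics can be sketched as follows for the Rasch model. This is a simplified illustration: it treats the abilities and difficulties as known, whereas in practice they are estimates, and the function name `rasch_item_fit` and the tiny data set are hypothetical.

```python
import math

def rasch_item_fit(responses, thetas, difficulties):
    """Return a list of (infit, outfit) mean squares, one pair per item.

    responses[i][j] is 0 or 1 for person i on item j.
    """
    results = []
    for j, b in enumerate(difficulties):
        sq_resid_sum = 0.0   # sum of squared raw residuals
        variance_sum = 0.0   # sum of model variances P(1 - P)
        z2_sum = 0.0         # sum of squared standardized residuals
        for i, theta in enumerate(thetas):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            var = p * (1.0 - p)
            sq = (responses[i][j] - p) ** 2
            sq_resid_sum += sq
            variance_sum += var
            z2_sum += sq / var
        infit = sq_resid_sum / variance_sum   # information-weighted mean square
        outfit = z2_sum / len(thetas)         # unweighted mean square
        results.append((infit, outfit))
    return results

# One hard item (b = 2). The first examinee has low ability but answers
# correctly, which is a surprising response far from the item's difficulty.
thetas = [-3.0, 2.0, 2.0]
responses = [[1], [1], [0]]
(infit, outfit), = rasch_item_fit(responses, thetas, [2.0])
```

In this example the single surprising response inflates outfit far more than infit, which is the usual reason to inspect both.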

The final and often decisive factor is practical utility. All three models place person ability and item difficulty on the same scale, but the Rasch model has a distinctive property: the raw total score is a sufficient statistic for ability. When its assumptions hold, this permits item calibration that does not depend on the ability distribution of the calibration sample, so items calibrated on one sample can be used to measure a different sample without re-estimation. This property makes Rasch models especially valuable for large-scale testing programs, adaptive testing, and test equating. If the 2PL fits slightly better than Rasch by AIC but item discriminations vary only modestly, a psychometrician might prefer Rasch for its measurement properties rather than the marginal fit gain. Model selection in IRT is not a statistical algorithm. It is a judgment that weighs empirical evidence, theoretical commitments, and the uses to which the test will be put.
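
One way to see why Rasch item calibration can be separated from the sample's ability level: under the Rasch model, once you condition on an examinee's raw score, the probability of a particular response pattern no longer depends on ability. The sketch below checks this numerically by brute-force enumeration. The function names and item difficulties are hypothetical, and the enumeration is only practical for very short tests.

```python
import math
from itertools import product

def pattern_prob(pattern, theta, difficulties):
    """Probability of a full 0/1 response pattern under the Rasch model."""
    prob = 1.0
    for x, b in zip(pattern, difficulties):
        p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
        prob *= p_correct if x else 1.0 - p_correct
    return prob

def conditional_prob(pattern, theta, difficulties):
    """Probability of the pattern given its raw score (number correct)."""
    score = sum(pattern)
    same_score = [
        q for q in product((0, 1), repeat=len(pattern)) if sum(q) == score
    ]
    denom = sum(pattern_prob(q, theta, difficulties) for q in same_score)
    return pattern_prob(pattern, theta, difficulties) / denom

difficulties = [-1.0, 0.0, 1.5]   # illustrative values
pattern = (1, 0, 0)               # only the easiest item answered correctly
low = conditional_prob(pattern, -2.0, difficulties)
high = conditional_prob(pattern, 1.5, difficulties)
```

The two conditional probabilities agree up to floating-point error even though the abilities differ by 3.5 logits. The same check fails once items have unequal discriminations, which is part of what is given up when moving to the 2PL.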

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Fundamental Theorem of Calculus Part 1 → Fundamental Theorem of Calculus Part 2 → U-Substitution → Partial Fraction Decomposition for Integration → Improper Integrals - Convergence → Integral Test → P-Series → Comparison Test → Limit Comparison Test → Series Convergence Test Strategy → Power Series → Radius and Interval of Convergence → Taylor Series → Moment Generating Functions → Characteristic Functions → Convergence in Distribution → Stationary Distributions → Convergence of Markov Chains → Convergence in Probability → Almost Sure Convergence → Relationships Between Modes of Convergence → Weak Law of Large Numbers → Strong Law of Large Numbers → Central Limit Theorem (Rigorous via Characteristic Functions) → Maximum Likelihood Estimation (Theory) → Two-Parameter Logistic IRT Model (2PL) → Three-Parameter Logistic IRT Model (3PL) → IRT Model Comparison and Fit Evaluation

Longest path: 113 steps · 756 total prerequisite topics

Prerequisites (5)

Rasch Model: One-Parameter Item Response Theory (hard)
Two-Parameter Logistic IRT Model (2PL) (hard)
Three-Parameter Logistic IRT Model (3PL) (hard)
Chi-Square Test (soft)
Probability Density Functions (soft)

Leads To (0)

No topics depend on this one yet.