Ability (theta) is estimated from response patterns using maximum likelihood (MLE), expected a posteriori (EAP), or weighted likelihood (WLE) estimation. MLE is asymptotically efficient but has no finite value for perfect or zero scores; EAP uses a prior to stay finite and stable, at the cost of shrinkage toward the prior mean; WLE reduces MLE's bias without a prior. Estimates are on the logit scale and can be transformed for interpretation. Confidence intervals around theta are narrowest at ability levels where the test's items provide the most information.
From IRT assumptions, you know that theta (θ) is a latent variable representing a person's true ability, and that item response probabilities are linked to theta via an item characteristic curve. The ICC tells you: given a person at ability level θ, what is the probability they answer item *i* correctly? But this relationship runs the other direction in practice — you observe a response pattern and need to work backward to estimate where on the theta scale the person sits. That inverse problem is what ability estimation methods solve.
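For concreteness, one common form of the ICC is the two-parameter logistic (2PL) model. The text here does not commit to a particular model, so treat the following as an assumed illustration, with $a_i$ the item's discrimination and $b_i$ its difficulty:

$$
P_i(\theta) = \Pr(u_i = 1 \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}
$$

Here $u_i$ is 1 for a correct response and 0 otherwise. Ability estimation treats $a_i$ and $b_i$ as known and $\theta$ as the unknown.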
The most intuitive method is maximum likelihood estimation (MLE). You have an observed response vector: say, correct on items 1, 3, and 5; incorrect on 2 and 4. Each item has a known ICC. For any candidate theta value, you can compute the joint probability of observing exactly that response pattern by multiplying each item's probability of the observed response (P for a correct answer, 1 − P for an incorrect one). The multiplication is legitimate because of local independence, an IRT assumption you've already covered. The MLE is the theta value that maximizes this joint probability. Geometrically, you're finding the peak of a likelihood curve over theta. This is the same MLE logic you've seen in other estimation contexts: find the parameter value that makes the observed data most probable. The problem is boundary behavior. When a person answers every item correctly, the likelihood keeps rising as theta increases and never reaches a maximum; when a person answers every item incorrectly, it keeps rising as theta decreases. In both cases no finite MLE exists, which is practically inconvenient for scoring.
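The MLE logic can be sketched in a few lines of Python. This is a minimal illustration, not a production scorer: it assumes a 2PL model, the item parameters are made up, and the helper names (`p_correct`, `log_likelihood`, `mle_theta`) are hypothetical. It uses a simple grid search rather than the Newton-type iterations operational software typically relies on.

```python
import math

def p_correct(theta, a, b):
    """Assumed 2PL ICC: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log of the joint probability of the response pattern.

    Summing logs is equivalent to multiplying probabilities across items,
    which local independence permits.
    """
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def mle_theta(responses, items, lo=-6.0, hi=6.0, steps=2401):
    """Grid-search MLE over [lo, hi]; returns None when no finite MLE exists."""
    if sum(responses) in (0, len(responses)):
        return None  # all-correct or all-incorrect: likelihood is monotone in theta
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical item parameters: (discrimination a, difficulty b)
items = [(1.0, -1.0), (1.2, -0.5), (0.8, 0.0), (1.5, 0.5), (1.0, 1.0)]

responses = [1, 0, 1, 0, 1]           # correct on items 1, 3, 5; incorrect on 2, 4
theta_hat = mle_theta(responses, items)
print("MLE:", theta_hat)

perfect = mle_theta([1, 1, 1, 1, 1], items)
print("MLE for a perfect score:", perfect)  # None: no finite maximum
```

Returning `None` for perfect and zero scores is one design choice among several; some scoring programs instead clamp the estimate to a fixed bound, which is a convention rather than a true maximum.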
Expected a posteriori (EAP) estimation addresses this with a Bayesian move: multiply the likelihood by a prior distribution over theta (typically a standard normal reflecting the population) to form a posterior, then take the posterior mean as the estimate. This shrinks estimates toward the prior mean and produces a finite estimate even for perfect or zero scores. The cost is bias: truly extreme examinees get pulled toward the mean, and the pull is stronger on short tests, where the data carry less weight relative to the prior. EAP is computationally convenient (it needs only numerical integration, not an iterative search) and widely used in adaptive testing and educational assessment software, but researchers should recognize that the prior's assumptions are built into every estimate. Weighted likelihood estimation (WLE) takes a third path: it corrects a known first-order bias in raw MLE, which tends to push estimates outward, away from the region of the scale where the test is most informative, and it does so without importing a distributional prior. WLE handles boundary cases better than pure MLE, typically returning finite estimates for perfect and zero scores, and avoids the shrinkage bias of EAP, making it a useful default for operational testing where examinees at the extremes are common.
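To make the contrast concrete, here is a hedged sketch of both estimators for a perfect score, where MLE fails. It assumes a 2PL model with made-up item parameters, and the function names are hypothetical. EAP is computed by simple rectangular quadrature under a standard normal prior. The WLE uses the fact that, for the 1PL and 2PL models, Warm's weighted likelihood is equivalent to maximizing the likelihood multiplied by the square root of the test information; this shortcut is an assumption of the sketch and does not carry over unchanged to the 3PL.

```python
import math

def p_correct(theta, a, b):
    """Assumed 2PL ICC."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def likelihood(theta, responses, items):
    """Joint probability of the response pattern under local independence."""
    value = 1.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        value *= p if u == 1 else 1.0 - p
    return value

def eap_theta(responses, items, lo=-6.0, hi=6.0, n_points=121):
    """Posterior mean of theta under a standard normal prior.

    Equally spaced quadrature; the grid spacing and the normal density's
    constant cancel in the ratio, so both are omitted.
    """
    grid = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    weights = [likelihood(t, responses, items) * math.exp(-0.5 * t * t) for t in grid]
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

def wle_theta(responses, items, lo=-6.0, hi=6.0, steps=2401):
    """WLE for the 2PL via grid search on log L(theta) + 0.5 * log I(theta)."""
    def objective(theta):
        loglik, info = 0.0, 0.0
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            loglik += math.log(p if u == 1 else 1.0 - p)
            info += a * a * p * (1.0 - p)   # 2PL item information
        return loglik + 0.5 * math.log(info)
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=objective)

# Hypothetical item parameters: (discrimination a, difficulty b)
items = [(1.0, -1.0), (1.2, -0.5), (0.8, 0.0), (1.5, 0.5), (1.0, 1.0)]
perfect = [1, 1, 1, 1, 1]

eap_perfect = eap_theta(perfect, items)
wle_perfect = wle_theta(perfect, items)
print("EAP, perfect score:", round(eap_perfect, 2))
print("WLE, perfect score:", round(wle_perfect, 2))
```

Both estimates are finite. With only five items the EAP estimate sits noticeably closer to the prior mean of 0 than the WLE does, which is the shrinkage described above.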
All three methods produce estimates on the logit scale, which is unbounded and centered at 0 by convention. Most ability estimates fall between −3 and +3. Critically, the precision of any estimate — its standard error — is not constant across the scale. Precision is highest where item information is concentrated (near item difficulties that match theta) and lowest at the extremes where few items are well-targeted. This theta-dependent precision is what classical test theory's single reliability coefficient cannot capture: two people scoring at different points on the scale have genuinely different measurement precision, even if they took the same test. That connection between estimation precision and item information is formalized in the item information function, which the next topic addresses directly.