Item response functions describe mathematically how a person's underlying ability relates to the probability of answering an item correctly. Item characteristic curves visualize this relationship, showing how an item's difficulty and discrimination shape its performance across ability levels.
Classical test theory (CTT), which you have already studied, summarizes item performance with a single number: the p-value, the proportion of test-takers who got the item right. It is simple and intuitive, but it has a serious flaw: the p-value is not a property of the item alone. Administer the same item to a high-ability group and you get a high p-value; administer it to a low-ability group and the p-value drops. Item response theory (IRT) fixes this by modeling difficulty as a location on the ability scale rather than as a proportion that depends on whoever happened to take the test.
The core idea is that each person has a latent ability θ (theta), and each item has parameters that determine how likely a person at any given θ is to answer correctly. The item response function (IRF), called the item characteristic curve (ICC) when plotted, maps this relationship. For the simplest model, the one-parameter logistic (1PL) or Rasch model, the curve is S-shaped and its position is set by a single parameter b, the difficulty. When θ = b, the probability of a correct response is 0.50. People with ability well above b will almost certainly get the item right; people well below b will almost certainly get it wrong. The S-shaped curve (a logistic function) captures the realistic intuition that the probability of success increases smoothly with ability rather than jumping abruptly.
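The 1PL curve described above can be written as P(θ) = 1 / (1 + exp(-(θ - b))). The following is a minimal sketch of that function; the name irf_1pl and the difficulty value 0.5 are illustrative assumptions, not taken from any particular IRT package.

```python
import math


def irf_1pl(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


b = 0.5  # hypothetical item difficulty on the theta scale

# Evaluate the curve at a few ability levels, including theta == b.
for theta in (-3.0, -1.0, 0.5, 2.0, 4.0):
    print(f"theta={theta:+.1f}  P(correct)={irf_1pl(theta, b):.3f}")
```

At θ = b the function returns exactly 0.5, and the probabilities approach 0 and 1 as θ moves far below or far above b.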
The two-parameter logistic model (2PL) adds a discrimination parameter a, which controls how steeply the S-curve rises around the difficulty point. A high-discrimination item has a steep curve: it sharply separates people just above and just below b. A low-discrimination item has a flat curve: even people far above the difficulty threshold may sometimes miss it, and people well below it may sometimes get it right. High discrimination is what you want in a test designed to spread examinees across scores, because it extracts more information per item about where someone falls on the ability scale, especially for examinees whose ability is near b.
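To make the effect of discrimination concrete, here is a small sketch of the 2PL function, P(θ) = 1 / (1 + exp(-a(θ - b))). The function name irf_2pl and the parameter values are hypothetical, and the sketch assumes the plain logistic metric (some texts multiply a by a scaling constant of about 1.7, which is omitted here).

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


b = 0.0  # hypothetical difficulty shared by both items

# Compare a flat item (a = 0.5) with a steep one (a = 2.0) for two
# people who sit half a unit below and half a unit above b.
for a in (0.5, 2.0):
    below = irf_2pl(b - 0.5, a, b)
    above = irf_2pl(b + 0.5, a, b)
    print(f"a={a:.1f}  P(below)={below:.3f}  P(above)={above:.3f}  "
          f"gap={above - below:.3f}")
```

The gap between the two probabilities is larger for the item with the higher a, which is what "sharply separates people just above and just below b" means in numerical terms.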
Understanding the ICC directly addresses a misconception carried over from CTT: that a "hard item" is simply one that most people miss. In IRT, "hard" means the item's difficulty parameter b is high on the ability scale, so a person needs high ability to have a 50% chance of answering it correctly. Whether most people in your sample miss it depends on how that sample's abilities are distributed; it is not an intrinsic property of the item. This distinction matters whenever you need to equate different test forms or make comparisons across testing populations.
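The sample dependence of the CTT p-value can be illustrated with a short simulation. This is a sketch under stated assumptions: the two groups are hypothetical, their abilities are drawn from normal distributions with means +1 and -1, and the item parameters a = 1.0 and b = 0.0 are made up for illustration. The expected p-value in a group is approximated by averaging the item's response probability over that group's abilities.

```python
import math
import random


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_p_value(abilities, a, b):
    """Average success probability across a group (the expected CTT p-value)."""
    return sum(irf_2pl(t, a, b) for t in abilities) / len(abilities)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Two hypothetical groups taking the SAME item.
high_group = [rng.gauss(1.0, 1.0) for _ in range(10_000)]
low_group = [rng.gauss(-1.0, 1.0) for _ in range(10_000)]

a, b = 1.0, 0.0  # hypothetical item parameters; identical for both groups
p_high = expected_p_value(high_group, a, b)
p_low = expected_p_value(low_group, a, b)

print(f"b = {b:.1f} in both groups")
print(f"expected p-value, high-ability group: {p_high:.2f}")
print(f"expected p-value, low-ability group:  {p_low:.2f}")
```

The item's b does not change between the two runs, yet the expected p-value is well above 0.5 in one group and well below it in the other.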
When you look at a set of ICCs together on one plot, you can immediately see which items are informative at which ability levels, whether the test covers the full ability range, and whether any items are so poorly discriminating that they add little measurement value. This is the payoff of the IRT framework: a rich, visually interpretable description of what each item is doing, expressed in terms that generalize beyond the sample used to estimate the parameters.
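One way to quantify "informative at which ability levels" is the standard 2PL item information function, I(θ) = a² · P(θ) · (1 - P(θ)), which peaks at θ = b with value a²/4. The sketch below tabulates it for a hypothetical four-item set; the item names and parameter values are invented for illustration.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """2PL item information: a^2 * P * (1 - P)."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical items as (a, b) pairs: three well-discriminating items
# spread across the scale, plus one weakly discriminating item.
items = {
    "easy": (1.5, -1.5),
    "medium": (1.5, 0.0),
    "hard": (1.5, 1.5),
    "weak": (0.4, 0.0),
}


def most_informative(theta: float) -> str:
    """Name of the item with the highest information at this theta."""
    return max(items, key=lambda name: item_information(theta, *items[name]))


for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    row = "  ".join(
        f"{name}={item_information(theta, a, b):.3f}"
        for name, (a, b) in items.items()
    )
    print(f"theta={theta:+.1f}  {row}  best={most_informative(theta)}")
```

In this made-up set, each well-discriminating item is most informative near its own b, while the low-a item contributes little anywhere on the scale, which mirrors what a joint ICC plot shows visually.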