A researcher is reviewing a 5-point Likert scale calibrated with a polytomous IRT model and notices that the threshold parameters for categories 2 and 3 are nearly identical (both around θ = 0.1). What does this finding suggest?
A. The scale is functioning well: closely spaced thresholds indicate high precision at that trait level
B. Categories 2 and 3 are functionally redundant and the scale could be collapsed without losing meaningful measurement information
C. The discrimination parameter is too low and should be increased by rewriting the item
D. The item fits the GRM but not the GPCM
Answer: B
Threshold parameters mark the θ levels at which one category gives way to the next (in the GPCM, where two adjacent categories are equally probable; in the GRM, where the cumulative probability of responding at or above a category reaches 0.5). When two consecutive thresholds nearly coincide, the category lying between them has almost no stretch of θ where it is the most likely response: as θ rises, respondents move from the category below it to the category above it over a very narrow interval. That category therefore adds little measurement information beyond what its neighbors already provide. The appropriate response is to collapse it with an adjacent category, reducing the scale from 5 to 4 points. Classical item-total correlations cannot provide this diagnosis: they summarize only the item's overall discrimination, not which specific categories are redundant.
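To see why nearly coincident thresholds make a category redundant, the sketch below computes GRM category probabilities for two hypothetical 5-category items that differ only in their middle boundaries, then reports the highest probability each category reaches anywhere on a θ grid. The logistic GRM form (no 1.7 scaling constant), the parameter values, and the helper names `grm_probs` and `peak_probability` are illustrative assumptions, not taken from any particular calibration.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_probs(theta, a, b):
    """Category probabilities for one item under the graded response model.

    a: one discrimination shared by every boundary of the item.
    b: increasing boundary locations b_1 < ... < b_{K-1} for K categories.
    Returns K probabilities (lowest category first) that sum to 1.
    """
    # P(X >= k) for each boundary is a 2PL-style logistic curve;
    # pad with 1.0 (X >= lowest category) and 0.0 (X above the highest)
    cumulative = [1.0] + [logistic(a * (theta - bk)) for bk in b] + [0.0]
    # P(X = k) = P(X >= k) - P(X >= k + 1)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(b) + 1)]

def peak_probability(a, b, category, lo=-4.0, hi=4.0, steps=801):
    """Highest probability a category (0-based index) reaches on a theta grid."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grm_probs(t, a, b)[category] for t in grid)

# Hypothetical 5-category items that differ only in their middle boundaries
well_spaced = [-1.5, -0.5, 0.5, 1.5]
squeezed = [-1.5, 0.08, 0.12, 1.5]  # 2nd and 3rd boundaries nearly coincide

for label, b in [("well spaced", well_spaced), ("squeezed", squeezed)]:
    peaks = [round(peak_probability(1.5, b, k), 3) for k in range(5)]
    print(label, peaks)
```

With the boundaries at 0.08 and 0.12, the third category never reaches more than a few percent probability anywhere on the grid, which is the pattern that motivates merging it with a neighbor.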
Question 2 Multiple Choice
What is the fundamental structural difference between the Graded Response Model (GRM) and the Generalized Partial Credit Model (GPCM)?
A. GRM applies only to personality measures; GPCM applies only to cognitive tests
B. GRM uses cumulative probability functions (probability of responding at category k or higher); GPCM models adjacent-category transitions directly
C. GRM allows discrimination to vary across categories; GPCM constrains all categories to share a single discrimination parameter
D. GRM requires equal intervals between thresholds; GPCM allows unequal intervals
Answer: B
The structural distinction lies in how each model defines its category boundary functions. The GRM models the probability of responding in category k *or any higher category* (a cumulative probability), using a 2PL-like logistic curve for each boundary; the probability of a single category is the difference between two adjacent cumulative curves. The GPCM models the probability of choosing category k *rather than the adjacent category k-1*: a direct pairwise comparison at each step. In practice, the GRM is common for attitude and personality scales with a firmly ordered structure, while the GPCM is more common for partial-credit achievement items, where each step may represent qualitatively different cognitive work. Option C is incorrect: in their standard forms, both models estimate a single discrimination parameter per item, shared by all of that item's categories. Option D is also incorrect, since neither model requires equally spaced thresholds.
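A minimal sketch of the two category-probability definitions for a single item, assuming the standard logistic forms without a scaling constant; the function names and parameter values are hypothetical. The check at the end confirms the defining property of each model: a GRM boundary is a logistic curve for P(X ≥ k), whereas a GPCM step is a logistic curve for choosing k over k-1.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_probs(theta, a, b):
    """GRM: each boundary b_k defines a cumulative curve P(X >= k)."""
    cumulative = [1.0] + [logistic(a * (theta - bk)) for bk in b] + [0.0]
    # A single category's probability is the gap between adjacent cumulative curves
    return [cumulative[k] - cumulative[k + 1] for k in range(len(b) + 1)]

def gpcm_probs(theta, a, deltas):
    """GPCM: each step delta_k governs the choice between categories k-1 and k."""
    # Running sums of a * (theta - delta_v); the lowest category has an empty sum
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + a * (theta - d))
    # Subtract the largest exponent before exponentiating to avoid overflow
    m = max(exponents)
    weights = [math.exp(e - m) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 4-category item; the same numbers are fed to both models
a, params, theta = 1.2, [-1.0, 0.0, 1.0], 0.4
grm = grm_probs(theta, a, params)
gpcm = gpcm_probs(theta, a, params)
target = logistic(a * (theta - params[1]))

# GRM: P(X >= 2) (0-based categories) is the logistic curve for boundary params[1]
grm_cumulative = sum(grm[2:])
# GPCM: P(X = 2 | X is 1 or 2) is the logistic curve for step params[1]
gpcm_adjacent = gpcm[2] / (gpcm[1] + gpcm[2])

print([round(p, 3) for p in grm], round(grm_cumulative, 4), round(target, 4))
print([round(p, 3) for p in gpcm], round(gpcm_adjacent, 4), round(target, 4))
```

Feeding the same numbers into both functions yields different category probabilities, because a GRM boundary and a GPCM step parameter are defined on different comparisons and are not interchangeable.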
Question 3 True / False
A polytomous IRT analysis can detect a response category that attracts both very low-θ and very high-θ respondents (a category response function with more than one peak), which classical item analysis cannot identify.
T. True
F. False
Answer: True
Classical item analysis computes a single item-total correlation, which summarizes the overall relationship between the item and the trait. It cannot break item functioning down to the level of individual categories. In polytomous IRT, each category has its own category response function (CRF) giving its probability as a function of θ. A middle category such as 'Neutral' should have a single-peaked CRF, highest at moderate θ and falling off on both sides. If its curve is instead also elevated at extreme θ, the category is likely capturing indecision or a non-attitude rather than a true midpoint on the trait. Because the GRM and GPCM force every middle-category CRF to be single-peaked, this pattern shows up in the observed (empirical) category proportions plotted against θ, or in nonparametric estimates of the CRFs, as misfit to the model-implied curves. Either way, it is invisible to classical correlation-based methods and has practical implications for scale revision.
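A rough sketch of how a multi-peaked category can be flagged from data, assuming each respondent already has a θ estimate: `empirical_crf` bins respondents on θ and computes the observed proportion choosing one category, and `count_peaks` counts local maxima while ignoring changes no larger than a tolerance. Both helpers, the 0.02 tolerance, and the example proportions are hypothetical. With real data the extreme bins are usually sparse, so a flag calls for a closer look rather than an automatic verdict.

```python
from collections import defaultdict

def empirical_crf(thetas, responses, category, bin_edges):
    """Observed proportion choosing `category` in each theta bin (None if a bin is empty)."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for t, r in zip(thetas, responses):
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= t < bin_edges[i + 1]:
                counts[i] += 1
                hits[i] += int(r == category)
                break
    return [hits[i] / counts[i] if counts[i] else None
            for i in range(len(bin_edges) - 1)]

def count_peaks(curve, tol=0.02):
    """Count local maxima, skipping empty bins and changes no larger than `tol`."""
    values = [v for v in curve if v is not None]
    signs = []
    for prev, cur in zip(values, values[1:]):
        if cur - prev > tol:
            signs.append(1)
        elif prev - cur > tol:
            signs.append(-1)
    if not signs:
        return 1  # effectively flat: treat as one plateau
    # A rise followed by a fall is an interior peak
    peaks = sum(1 for s, t in zip(signs, signs[1:]) if s == 1 and t == -1)
    peaks += int(signs[0] == -1)   # curve starts high: the left end is a peak
    peaks += int(signs[-1] == 1)   # curve ends high: the right end is a peak
    return peaks

# Hypothetical observed proportions choosing 'Neutral' across five theta bins
expected_shape = [0.05, 0.20, 0.45, 0.22, 0.06]
suspicious_shape = [0.30, 0.18, 0.40, 0.20, 0.32]
for name, curve in [("expected", expected_shape), ("suspicious", suspicious_shape)]:
    print(name, count_peaks(curve))
```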
Question 4 True / False
In a polytomous IRT model, all five response categories of a Likert scale contribute equal amounts of information at every level of the latent trait θ.
T. True
F. False
Answer: False
Each category response function (CRF) peaks at a different θ level: the lowest category dominates at low θ, the highest at high θ, and intermediate categories dominate in between. Each category therefore contributes most of its information near its 'home' region of the trait continuum. At extreme θ values (very high or very low), middle categories contribute little information because they are rarely endorsed there. The item information function, obtained by summing the category contributions, is highest across the range spanned by the item's thresholds, where the item discriminates best. When thresholds bunch together, a middle category can end up contributing very little information overall, and that evidence is what allows polytomous IRT analysis to justify collapsing categories.
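The claim that each category contributes information mainly near its own region of θ can be checked numerically. The sketch below assumes a GRM item and splits its information at each θ into category shares using Samejima's category information, share_k = P_k · (−d²/dθ² log P_k) = (P_k′)²/P_k − P_k″; the shares sum to the item information because the P_k″ terms cancel across categories. The item parameters and function names are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_category_shares(theta, a, b):
    """Split GRM item information at theta into one share per category.

    share_k = P_k * I_k, where I_k = -d^2/dtheta^2 log P_k (Samejima's category
    information). Equivalently share_k = (P_k')^2 / P_k - P_k''. The shares sum to
    the item information because the P_k'' terms cancel across categories.
    """
    cum = [logistic(a * (theta - bk)) for bk in b]
    # First and second derivatives of each cumulative curve P(X >= k)
    d1 = [a * p * (1 - p) for p in cum]
    d2 = [a * a * p * (1 - p) * (1 - 2 * p) for p in cum]
    # Pad with the constant curves P(X >= lowest) = 1 and P(X > highest) = 0
    cum, d1, d2 = [1.0] + cum + [0.0], [0.0] + d1 + [0.0], [0.0] + d2 + [0.0]
    shares = []
    for k in range(len(b) + 1):
        p = cum[k] - cum[k + 1]
        dp = d1[k] - d1[k + 1]
        ddp = d2[k] - d2[k + 1]
        shares.append(dp * dp / p - ddp if p > 1e-12 else 0.0)
    return shares

def grm_item_information(theta, a, b):
    return sum(grm_category_shares(theta, a, b))

a, b = 1.5, [-1.5, -0.5, 0.5, 1.5]  # hypothetical 5-category item
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    shares = grm_category_shares(theta, a, b)
    info = sum(shares)
    print(f"theta={theta:+.1f} info={info:.3f} shares={[round(s / info, 2) for s in shares]}")
```

For a single boundary the shares reduce to the familiar 2PL information a²P(1 − P), which is a quick sanity check on the decomposition.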
Question 5 Short Answer
What can polytomous IRT reveal about individual response categories that classical item-total correlation analysis cannot?
Model answer: Polytomous IRT provides a category response function (CRF) for each response option, showing the probability of endorsing that specific category as a function of the latent trait θ. This allows detection of: whether categories are used in the intended order (as θ increases, the categories should peak in order from lowest to highest); whether adjacent thresholds are nearly identical (making a category redundant); whether a middle category's observed CRF has more than one peak (attracting both low- and high-θ respondents, suggesting it captures indecision rather than a true midpoint); and how much measurement information each category contributes across the trait range. Classical item-total correlation gives a single number per item and cannot decompose functioning to the category level.
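One way to check whether categories "peak in order from lowest to highest" is to sweep θ and record which category is the most likely response at each point. The sketch below does this for a GPCM item; the parameter values are hypothetical, and the second item has deliberately disordered step parameters (0.5 followed by 0.0), which leaves one category never modal. Whether a never-modal category must be collapsed is a judgment call rather than a fixed rule.

```python
import math

def gpcm_probs(theta, a, deltas):
    """GPCM category probabilities; deltas are the step parameters in step order."""
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + a * (theta - d))
    m = max(exponents)  # subtract the max before exponentiating for stability
    weights = [math.exp(e - m) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def modal_categories(a, deltas, lo=-4.0, hi=4.0, steps=801):
    """Categories in the order they become the most likely response as theta rises."""
    order = []
    for i in range(steps):
        theta = lo + i * (hi - lo) / (steps - 1)
        probs = gpcm_probs(theta, a, deltas)
        best = max(range(len(probs)), key=probs.__getitem__)
        if not order or order[-1] != best:
            order.append(best)
    return order

def never_modal(a, deltas, **grid):
    """Categories (0-based) that are never the most likely response on the grid."""
    seen = set(modal_categories(a, deltas, **grid))
    return [k for k in range(len(deltas) + 1) if k not in seen]

ordered = [-1.5, -0.5, 0.5, 1.5]     # hypothetical, well-behaved steps
disordered = [-1.0, 0.5, 0.0, 1.5]   # hypothetical, 2nd step above the 3rd
print(modal_categories(1.0, ordered), never_modal(1.0, ordered))
print(modal_categories(1.0, disordered), never_modal(1.0, disordered))
```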
The central advantage of polytomous IRT is that it treats the response scale as part of the measurement model rather than as a given. While CTT essentially treats Likert categories as rough interval measurements and asks only 'does this item correlate with the total score?', IRT asks 'is each category functioning as intended — is it pointing to a distinct region of the trait continuum and doing so consistently across people with the same θ?' This richer diagnostic is what allows scale developers to make evidence-based decisions about collapsing categories, rewriting poorly functioning items, or choosing between 4- and 5-point formats.