Questions: Rasch Model: One-Parameter Item Response Theory
5 questions to test your understanding
Question 1 Multiple Choice
Under the Rasch model, what makes the total raw score a 'sufficient statistic' for ability estimation?
A. All items are scored on the same scale, so they contribute equally to the total
B. Because all item characteristic curves have the same slope, the number correct contains all the information about ability; which specific items were answered correctly adds nothing
C. The raw score is sufficient because the Rasch model assumes all items are equally difficult
D. Raw scores are sufficient in all IRT models, not just the Rasch model
Answer: B
Sufficient statistic means knowing the total raw score is enough: you don't need to know *which* items the person got right. The unweighted raw score has this property only in the Rasch (1PL) model, because all item characteristic curves (ICCs) have the same slope (discrimination). When discriminations differ (2PL model), getting a highly discriminating item right carries more information than getting a weakly discriminating one right, so the sufficient statistic becomes a discrimination-weighted score and the simple number correct is no longer sufficient. This sufficiency property is what underpins Rasch's 'specific objectivity.'
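The sufficiency claim can be checked numerically. The sketch below is illustrative only: the item difficulties and the helper names `p_correct` and `pattern_likelihood` are assumptions, not part of the original quiz. It compares two response patterns with the same raw score and shows that their likelihood ratio does not change with ability, which is why the pattern adds no information about ability beyond the total.

```python
import math

def p_correct(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_likelihood(theta, difficulties, pattern):
    """Probability of a full 0/1 response pattern, assuming local independence."""
    like = 1.0
    for b, x in zip(difficulties, pattern):
        p = p_correct(theta, b)
        like *= p if x == 1 else (1.0 - p)
    return like

# Hypothetical item difficulties in logits (illustrative values only).
difficulties = [-1.0, 0.0, 1.5]

# Two patterns with the same raw score of 1:
# only the easiest item right vs. only the hardest item right.
easy_right = (1, 0, 0)
hard_right = (0, 0, 1)

ratios = []
for theta in (-2.0, 0.0, 2.0):
    ratio = (pattern_likelihood(theta, difficulties, easy_right)
             / pattern_likelihood(theta, difficulties, hard_right))
    ratios.append(ratio)
    print(f"theta={theta:+.1f}  likelihood ratio={ratio:.4f}")

# The ratio depends only on the item difficulties (exp(1.5 - (-1.0))),
# not on theta, so the pattern cannot help locate the person.
```

Under a 2PL model the same ratio would vary with theta, which is the sense in which the raw score stops being sufficient.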
Question 2 Multiple Choice
A researcher finds perfect fit of a test to the Rasch model. They conclude the test is valid and measures what it claims to measure. What is wrong with this reasoning?
A. Nothing; perfect model fit proves construct validity
B. Perfect fit only means the items have equal discrimination; it says nothing about whether the construct being measured is the intended one
C. Rasch fit statistics cannot reach perfection, so the premise is impossible
D. The conclusion would be correct if sample size is large enough
Answer: B
Rasch model fit indicates that items behave consistently with the model's assumptions (equal discrimination, unidimensionality). It says nothing about whether the measured trait is the one you *intended* to measure. A perfectly fitting Rasch scale could measure something entirely different from the claimed construct. Validity is a separate, substantive judgment requiring content analysis, criterion validity studies, and domain expertise. Fit statistics diagnose statistical behavior, not meaning.
Question 3 True / False
The Rasch model produces interval-scale ability estimates because it converts raw scores into logit units.
True
False
Answer: True
Raw test scores are ordinal: going from 0 to 1 correct may represent a larger ability jump than going from 9 to 10, depending on item placement. Rasch converts scores to log-odds (logit) units via the model's logistic function, which places persons and items on a common scale that is interval when the data fit the model. A one-logit difference in ability means the same change in the log-odds of success regardless of where you are on the scale: the odds are multiplied by e (about 2.72). The corresponding change in probability is not constant; it is largest when the success probability is near 50% and shrinks toward the extremes. This interval property supports arithmetic operations (means, differences, regressions) that are hard to justify for raw scores.
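A small sketch of what the logit unit means numerically. The starting points and the helper names `logit_to_prob` and `odds` are illustrative assumptions. For each starting value of ability minus difficulty, it adds one logit and reports the change in probability and the factor by which the odds of success change.

```python
import math

def logit_to_prob(logit):
    """Convert a log-odds value (ability minus difficulty) to a success probability."""
    return 1.0 / (1.0 + math.exp(-logit))

def odds(p):
    """Odds of success for a probability p."""
    return p / (1.0 - p)

rows = []
for start in (-3.0, 0.0, 3.0):
    p_before = logit_to_prob(start)
    p_after = logit_to_prob(start + 1.0)  # one logit higher
    gain = p_after - p_before
    odds_factor = odds(p_after) / odds(p_before)
    rows.append((start, gain, odds_factor))
    print(f"start={start:+.1f}  P: {p_before:.3f} -> {p_after:.3f}  "
          f"gain={gain:.3f}  odds multiplied by {odds_factor:.3f}")

# The odds factor is the same (e) at every starting point;
# the probability gain is not, and is largest in the middle of the scale.
```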
Question 4 True / False
Under the Rasch model, item difficulty estimates obtained from one sample are meaningless for describing those items' behavior in a different sample.
True
False
Answer: False
This is precisely what the Rasch model's 'specific objectivity' refutes. When data fit the model, item difficulty estimates are sample-independent: they can be calibrated on one group and applied to another (after equating the scales). This is analogous to physical measurement: a ruler calibrated in one laboratory gives the same measurement elsewhere. Under CTT, item statistics such as proportion correct and item-total correlation shift with the ability distribution of the sample. Under 2PL IRT, the discrimination parameter multiplies ability, so person and item parameters cannot be separated in the same way and item comparisons are tied more closely to the ability range of the sample.
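The contrast between sample-dependent classical statistics and sample-independent Rasch comparisons can be illustrated with a simulation. This is a sketch under stated assumptions: the data are generated from a Rasch model, the two item difficulties (-0.5 and 0.5) and the two sample means are made up, and the difficulty difference is estimated with a simple pairwise conditional comparison (counting people who got exactly one of the two items right), which is one elementary estimator and not necessarily what any particular Rasch software uses.

```python
import math
import random

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate(mean_ability, b_i, b_j, n, rng):
    """Simulate n persons on two items.

    Returns (proportion correct on item i, pairwise estimate of b_j - b_i).
    """
    n_i_only = n_j_only = n_i_right = 0
    for _ in range(n):
        theta = rng.gauss(mean_ability, 1.0)
        x_i = rng.random() < p_correct(theta, b_i)
        x_j = rng.random() < p_correct(theta, b_j)
        n_i_right += x_i
        if x_i and not x_j:
            n_i_only += 1
        elif x_j and not x_i:
            n_j_only += 1
    # Among persons with exactly one item right, the log of the count ratio
    # estimates b_j - b_i under the Rasch model, whatever the ability distribution.
    return n_i_right / n, math.log(n_i_only / n_j_only)

rng = random.Random(0)  # fixed seed so the run is repeatable
low_p, low_diff = simulate(-1.0, -0.5, 0.5, 20000, rng)   # low-ability sample
high_p, high_diff = simulate(1.0, -0.5, 0.5, 20000, rng)  # high-ability sample

print(f"proportion correct on item i: low={low_p:.3f}  high={high_p:.3f}")
print(f"estimated b_j - b_i:          low={low_diff:.3f}  high={high_diff:.3f}")
```

The classical proportion correct differs substantially between the two samples, while both estimates of the difficulty difference land near the generating value of 1.0 logit.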
Question 5 Short Answer
What is 'specific objectivity' in the Rasch model and why does it make Rasch measurement resemble physical measurement more than classical test scores do?
Model answer: Specific objectivity means that person ability estimates do not depend on which particular items were administered, and item difficulty estimates do not depend on which particular sample was tested — as long as the data fit the model. This mirrors physical measurement: you can measure a person's weight with different calibrated scales and get the same result. Classical raw scores lack this property because they are sensitive to item selection (an easy test inflates scores) and sample characteristics (item statistics shift with sample ability). Rasch's logit scale creates a stable metric that allows comparisons across test forms and samples.
The physical measurement analogy is important: Rasch saw his model as achieving in psychology what rulers and thermometers achieve in physics — a context-independent unit of measurement. Whether this ideal is achievable with psychological constructs (which are far less clearly defined than length or temperature) is debated, but it sets the aspirational standard that distinguishes measurement from mere ordering.
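The separation of person and item parameters described in the model answer can be written out in a few lines. The notation is an assumption introduced here for illustration: \theta_n is the ability of person n, b_i is the difficulty of item i, and X_{ni} is the scored response (1 = correct).

```latex
\begin{align*}
\operatorname{logit} P(X_{ni}=1)
  &= \ln\frac{P(X_{ni}=1)}{1-P(X_{ni}=1)} = \theta_n - b_i \\
% Comparing two persons n and m on the same item i: the item drops out.
\operatorname{logit} P(X_{ni}=1) - \operatorname{logit} P(X_{mi}=1)
  &= (\theta_n - b_i) - (\theta_m - b_i) = \theta_n - \theta_m \\
% Comparing two items i and j for the same person n: the person drops out.
\operatorname{logit} P(X_{ni}=1) - \operatorname{logit} P(X_{nj}=1)
  &= (\theta_n - b_i) - (\theta_n - b_j) = b_j - b_i
\end{align*}
```

The second line holds for any item and the third for any person, which is the formal content of the claim that comparisons do not depend on the particular items or sample used, provided the data fit the model.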