Questions: Computerized Adaptive Testing and Dynamic Assessment
5 questions to test your understanding
Question 1 Multiple Choice
A CAT system's item bank was calibrated using only 50 responses per item rather than the recommended 300–1,000. What is the most likely consequence for the adaptive algorithm?
A. The test will consistently overestimate examinees' true ability since small calibration samples inflate discrimination parameters
B. The algorithm will default to fixed-length behavior, selecting items non-adaptively until calibration is refreshed
C. Biased IRT parameter estimates will cause the algorithm to misestimate θ from early items, with each subsequent selection compounding the error
D. The test will automatically lengthen to compensate for the reduced precision of each item selection
Answer: C
CAT efficiency depends on the accuracy of the IRT parameters stored in the item bank. Parameters estimated from small samples carry substantial error: an item's true difficulty or discrimination may differ considerably from its calibrated value. When the algorithm selects and scores items using these inaccurate parameters, the θ estimate drifts away from the examinee's true ability. Because each subsequent item is selected at the current θ estimate, an early error steers the test toward poorly targeted items, so errors tend to propagate rather than self-correct. This is why an adequately large calibration sample is a basic requirement for CAT, not merely an efficiency concern.
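To make the effect of calibration error concrete, here is a minimal sketch that scores the same four responses twice under a two-parameter logistic (2PL) model: once with the "true" item parameters and once with perturbed ones. All parameter values, the response pattern, and the function names are made up for illustration, and the grid-search estimator is a simplification rather than what an operational CAT engine would use. The sketch shows only the scoring error; it does not model the further drift caused by selecting later items at a wrong θ.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at a given theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def mle_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood theta found by a simple grid search (0.01 spacing)."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical (a, b) pairs: the values the items "really" have.
true_items = [(1.2, -0.5), (1.0, 0.0), (1.5, 0.5), (0.8, 1.0)]
# The same items as a small calibration sample might have estimated them
# (illustrative perturbations, not derived from any real data).
noisy_items = [(1.6, -0.1), (0.7, 0.4), (1.9, 0.9), (1.1, 1.5)]

responses = [1, 1, 0, 0]  # identical response pattern in both cases

theta_true_params = mle_theta(true_items, responses)
theta_noisy_params = mle_theta(noisy_items, responses)

print(f"theta with true parameters:  {theta_true_params:.2f}")
print(f"theta with noisy parameters: {theta_noisy_params:.2f}")
```

Because the perturbed difficulties are all too high in this example, the same responses yield a higher θ estimate than they should, and an adaptive algorithm would then pick its next item at that inflated value.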
Question 2 Multiple Choice
Why might a CAT administration take as many items as a fixed-length test, even though CAT is generally described as more efficient?
A. CAT is only more efficient for average-ability examinees; high- and low-ability examinees always require more items
B. Poorly designed stopping rules, such as requiring the standard error to drop below an overly stringent threshold, can require many more items than necessary before the test terminates
C. Item exposure control forces the algorithm to use low-information items for security reasons, increasing the total items needed
D. CAT is more efficient only when the ability distribution is known in advance; otherwise it defaults to fixed-length behavior
Answer: B
CAT's efficiency advantage is real but conditional. Stopping rules determine when enough information has been gathered. A rule that terminates when the standard error of θ falls below 0.20 will require far more items than one that terminates at 0.30, and examinees near a decision boundary (in pass/fail tests) can need many more items than average before a classification is precise enough. A fixed-count rule (e.g., always administer exactly 20 items) ignores these dynamics entirely. Efficient CAT design requires matching the stopping rule to the precision needs of the testing purpose.
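The gap between the two thresholds can be worked out from the standard IRT relation SE(θ) = 1 / sqrt(test information). The sketch below assumes, purely for illustration, that every administered item contributes the same amount of information (AVG_ITEM_INFO = 0.5); real items vary, so the counts are rough orders of magnitude rather than predictions. The function names are hypothetical.

```python
import math

def required_information(target_se):
    """Test information needed so that SE(theta) = 1 / sqrt(information) meets the target."""
    return 1.0 / target_se ** 2

def items_needed(target_se, avg_item_info):
    """Approximate item count, assuming each item adds avg_item_info units of information."""
    # The small tolerance guards against floating-point round-off before rounding up.
    return math.ceil(required_information(target_se) / avg_item_info - 1e-9)

AVG_ITEM_INFO = 0.5  # assumed average information per well-targeted item

n_strict = items_needed(0.20, AVG_ITEM_INFO)  # SE threshold of 0.20
n_loose = items_needed(0.30, AVG_ITEM_INFO)   # SE threshold of 0.30

print(f"SE <= 0.20 needs about {n_strict} items")
print(f"SE <= 0.30 needs about {n_loose} items")
```

Under this assumption the stricter rule needs about 50 items and the looser one about 23. The ratio, (0.30 / 0.20)² = 2.25, does not depend on the assumed per-item information.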
Question 3 True / False
CAT always produces shorter tests than fixed-length tests measuring the same construct with the same precision.
T. True
F. False
Answer: False
CAT typically achieves the same precision as a fixed-length test with 50–60% of the items — but only under good conditions: a well-calibrated item bank, appropriate stopping rules, and sufficient item diversity. Poor stopping rules can require more items than necessary; a small or poorly calibrated item bank limits the algorithm's options. In practice, CAT produces shorter tests than fixed-length tests only when designed and maintained carefully. The efficiency is conditional, not guaranteed.
Question 4 True / False
In a CAT system, a correct response to a difficult item provides more information about a high-ability examinee than the same correct response provides about a low-ability examinee.
T. True
F. False
Answer: True
This follows directly from item response theory. Each item's information function peaks at the ability level where the item is most discriminating — typically near the item's difficulty parameter. A difficult item has near-zero information for a low-ability examinee because they would almost certainly get it wrong regardless of small θ differences. For a high-ability examinee whose θ is near the item's difficulty, a correct response substantially narrows the ability estimate. This is why CAT routes hard items to examinees with high current θ estimates: those items carry maximal information there.
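The information argument can be checked numerically with the 2PL item information function, I(θ) = a² · P(θ) · (1 − P(θ)). The sketch below uses a hypothetical hard item (a = 1.5, b = 2.0) and compares its information for an examinee at θ = 2.0 with that for an examinee at θ = −2.0; the parameter values and function names are assumptions chosen for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

A, B = 1.5, 2.0  # hypothetical difficult, fairly discriminating item

info_high = item_information(2.0, A, B)   # examinee whose theta matches the difficulty
info_low = item_information(-2.0, A, B)   # examinee far below the difficulty

print(f"information at theta =  2.0: {info_high:.4f}")
print(f"information at theta = -2.0: {info_low:.4f}")
```

At θ = b the probability of success is 0.5, so the information reaches its 2PL maximum of a² / 4 = 0.5625. Four logits lower, the same item contributes less than 0.01.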
Question 5 Short Answer
Why does overexposure of high-discrimination items in a CAT system threaten test validity, and how does item exposure control address this?
Model answer: High-discrimination items are the algorithm's first choice whenever an examinee's θ estimate falls near their difficulty, because that is where they provide the most Fisher information. Without constraints, a small subset of items would be selected repeatedly while most of the bank sits unused. Overexposed items become known to test-takers through item-sharing networks, allowing coached candidates to answer correctly regardless of true ability, which inflates scores and undermines the validity of the measurement. Exposure control algorithms (like Sympson-Hetter) limit how often any item is administered, typically by administering a selected item only with some probability, which forces the algorithm to draw on a broader range of items. This trades a small reduction in efficiency for security and long-term validity.
Test security and measurement efficiency are in fundamental tension in CAT. The most efficient algorithm exploits the highest-information items every time; a secure algorithm distributes usage across the bank. Exposure control formalizes this tradeoff. The result is that real-world CAT systems are never operating at theoretical maximum efficiency — they are optimizing the combination of efficiency and item security within operational constraints.
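The tradeoff can be illustrated with a simplified sketch in the spirit of Sympson-Hetter: the most informative item is selected first, but it is administered only with a probability given by its exposure parameter; otherwise the algorithm moves on to the next-best item. Everything here is an assumption made for illustration: the item names, the information values, the hand-set exposure parameters (in practice these are tuned through simulation), the fallback when every item is filtered out, and the fact that all simulated examinees share the same candidate ranking.

```python
import random

def administer_with_exposure_control(candidates, exposure_params, rng):
    """Pick an item from (item_id, information) pairs under probabilistic exposure control.

    Items are tried from most to least informative; each is administered with
    probability exposure_params[item_id] (default 1.0, i.e. no restriction).
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    for item_id, _info in ranked:
        if rng.random() < exposure_params.get(item_id, 1.0):
            return item_id
    # Assumed fallback: if every item was filtered out, use the last-ranked one.
    return ranked[-1][0]

def exposure_rate(item_id, candidates, exposure_params, n_examinees=10_000, seed=0):
    """Share of simulated examinees who are administered item_id."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    hits = sum(
        administer_with_exposure_control(candidates, exposure_params, rng) == item_id
        for _ in range(n_examinees)
    )
    return hits / n_examinees

# Hypothetical candidate items with their information at the current theta estimate.
candidates = [("item_A", 0.90), ("item_B", 0.60), ("item_C", 0.30)]
no_control = {"item_A": 1.0, "item_B": 1.0, "item_C": 1.0}
with_control = {"item_A": 0.25, "item_B": 1.0, "item_C": 1.0}

rate_uncontrolled = exposure_rate("item_A", candidates, no_control)
rate_controlled = exposure_rate("item_A", candidates, with_control)

print(f"item_A exposure without control: {rate_uncontrolled:.2f}")
print(f"item_A exposure with control:    {rate_controlled:.2f}")
```

Without control, the most informative item is administered to every simulated examinee. With an exposure parameter of 0.25 it reaches roughly a quarter of them, and the remaining examinees receive the somewhat less informative item_B, which is the efficiency cost described above.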