Form A and Form B measure the same underlying ability with identical average difficulty. However, Form A's items have lower variance in error scores — it is slightly more precise. Which relationship best describes these forms?
A. Strictly parallel: both forms measure the same true construct
B. Tau-equivalent: true scores are identical but error variances differ
C. Essentially tau-equivalent: true scores differ by an additive constant
D. Unequal forms: they cannot be compared statistically
Answer: B
Tau-equivalence requires that true scores be identical across forms for every examinee, while error variances may differ. Form A is slightly more precise, so its error variance is smaller, which violates the equal-error-variance requirement of strict parallelism. Because the forms measure the same construct with equal difficulty, no additive constant is needed to relate their true scores, so essential tau-equivalence (which allows a constant offset in true scores) is more general than the situation requires. Tau-equivalence is the most specific classification that fits.
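The three models in this question can be written compactly in classical test theory notation. The symbols below (X for observed score, T for true score, E for error, c for a constant) are notation assumed here for illustration and do not appear in the question itself.

```latex
\begin{align*}
X_A &= T_A + E_A, \qquad X_B = T_B + E_B \\[4pt]
\text{Strictly parallel:} \quad & T_A = T_B, \quad \sigma^2_{E_A} = \sigma^2_{E_B} \\
\text{Tau-equivalent:} \quad & T_A = T_B, \quad \sigma^2_{E_A} \text{ and } \sigma^2_{E_B} \text{ may differ} \\
\text{Essentially tau-equivalent:} \quad & T_B = T_A + c, \quad \sigma^2_{E_A} \text{ and } \sigma^2_{E_B} \text{ may differ}
\end{align*}
```

Each line relaxes one constraint of the line above it, so the scenario in the question (equal true scores, unequal error variances) lands on the second line.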
Question 2 Multiple Choice
A test developer wants to compute Cronbach's alpha to estimate internal consistency reliability. Which measurement model does alpha technically assume?
A. Strict parallelism: all items must have equal true scores and equal error variances
B. Essential tau-equivalence: items may differ by a constant in true score but share a common latent trait
C. Item response theory: each item has its own discrimination and difficulty parameter
D. Classical parallel forms: alternate-form reliability must be confirmed first
Answer: B
Cronbach's alpha equals reliability only under essential tau-equivalence: items may vary in difficulty (additive constants in true scores) but must all measure the same underlying construct. If items are not essentially tau-equivalent, alpha generally underestimates reliability when item errors are uncorrelated, and it becomes hard to interpret when some items measure a different dimension. Understanding this assumption clarifies what alpha does and does not guarantee: it estimates internal consistency reliability only when essential tau-equivalence is approximately met.
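As a rough illustration of what alpha actually computes, the sketch below implements the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), in plain Python. The function name `cronbach_alpha` and the sample data are made up for illustration; they do not come from any particular library or dataset.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (hypothetical helper).

    scores: list of rows, one row per examinee, each row a list of k item scores.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 examinees")
    # Sample variance of each item (column) across examinees.
    item_variances = [variance(col) for col in zip(*scores)]
    # Sample variance of each examinee's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up data: 5 examinees, 3 items.
sample = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(sample)
print(round(alpha, 3))
```

The formula itself makes no check of the tau-equivalence assumption: it returns a number for any score matrix, which is why the assumption has to be examined separately.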
Question 3 True / False
Strictly parallel test forms are routinely achieved in large-scale standardized testing programs.
T. True
F. False
Answer: False
Strict parallelism requires that for every examinee, true scores on both forms are identical AND error variances are identical. In practice, even carefully constructed alternate forms differ in item wording, content sampling, and item-level precision. Achieving identical error variances across forms is essentially impossible with real items. Tau-equivalence and essential tau-equivalence are the realistic standards that test developers aim for, and the choice of which equating procedures to apply depends on which assumption is defensible.
Question 4 True / False
Under tau-equivalence, two test forms will rank all examinees in the same order on their true scores.
T. True
F. False
Answer: True
Tau-equivalence requires that every examinee's true score on Form A equal their true score on Form B, so the true-score ordering of examinees is the same on both forms. Observed scores on any single administration include random error around those true scores, so observed rankings can still differ between forms, and more so on the noisier form. The key practical implication is that tau-equivalent forms are interchangeable in expectation for rank-ordering examinees, even if one form is slightly noisier than the other.
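A small simulation can make the true-score versus observed-score distinction concrete. The sketch below assumes normally distributed errors and uses made-up true scores and error standard deviations; the helper names `administer` and `rank_order` are hypothetical.

```python
import random


def administer(true_scores, error_sd, rng):
    """Return observed scores: true score plus normal error (hypothetical helper)."""
    return [t + rng.gauss(0.0, error_sd) for t in true_scores]


def rank_order(scores):
    """Indices of examinees sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


rng = random.Random(0)  # fixed seed so the run is repeatable
true_scores = [50.0, 52.0, 54.0, 56.0, 58.0]  # same true scores on both forms

form_a = administer(true_scores, error_sd=1.0, rng=rng)  # more precise form
form_b = administer(true_scores, error_sd=4.0, rng=rng)  # noisier form

print("true order:  ", rank_order(true_scores))
print("Form A order:", rank_order(form_a))
print("Form B order:", rank_order(form_b))
```

Both forms share one true-score ordering by construction. Any disagreement between the printed observed orderings comes only from the error term, and it becomes more likely as `error_sd` grows relative to the spacing of the true scores.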
Question 5 Short Answer
Why does the distinction between strictly parallel and tau-equivalent forms matter for test equating, and what goes wrong if the required assumptions are violated?
Model answer: Test equating statistically adjusts scores from different forms onto a common scale so that a score of 70 on Form A means the same as a 70 on Form B. Equating is only valid when the forms measure the same construct — at minimum, they must be essentially tau-equivalent. If this assumption fails (the forms measure different abilities), equating produces scores that appear comparable but are not, because the underlying constructs differ. High-stakes decisions (admissions, certification) based on equated scores would then be unfair to examinees who happened to receive the harder or different-construct form.
The measurement model assumptions aren't just theoretical bookkeeping — they determine which statistical procedures are valid. Using equating procedures that assume tau-equivalence on forms that violate the assumption introduces systematic bias in score comparisons. This is why construct validity evidence is a prerequisite for any equating program.
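To show what an equating adjustment looks like in code, here is a minimal sketch of linear equating, which maps a Form A score to the Form B scale by matching means and standard deviations. The text above does not name a specific equating method, so the choice of linear equating, the assumption that both score samples come from equivalent groups, the function name `linear_equate`, and the scores are all illustrative assumptions.

```python
from statistics import mean, stdev


def linear_equate(x, scores_a, scores_b):
    """Map a Form A score x onto the Form B scale by matching means and SDs.

    Hypothetical helper; assumes both score samples come from equivalent groups.
    """
    mu_a, mu_b = mean(scores_a), mean(scores_b)
    sd_a, sd_b = stdev(scores_a), stdev(scores_b)
    return mu_b + (sd_b / sd_a) * (x - mu_a)


# Made-up scores: Form B runs 5 points higher with the same spread.
form_a_scores = [60, 70, 80]
form_b_scores = [65, 75, 85]
print(linear_equate(70, form_a_scores, form_b_scores))
```

The arithmetic runs on any two sets of scores, whether or not the forms measure the same construct. Nothing in the computation can detect a construct mismatch, which is why the validity evidence has to come first.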