Questions: Domain Sampling Theory and Generalization of Reliability
5 questions to test your understanding
Question 1 Multiple Choice
A test developer wants to maximize reliability of a 20-item extraversion scale. She replaces 10 diverse items with near-paraphrases of the 10 highest-loading items. Coefficient alpha rises from 0.82 to 0.94. Has the test improved?
A. Yes, higher alpha means greater reliability and therefore a better test
B. Not necessarily: the higher alpha likely reflects item redundancy, which narrows construct coverage without genuinely improving measurement
C. Yes, alpha above 0.90 is the accepted threshold for high-quality psychometric instruments
D. No, alpha above 0.90 always indicates overfit and requires redesign from scratch
Domain sampling theory reveals the paradox: alpha can be inflated by making items redundant rather than by measuring the construct more reliably. If all items ask the same question in slightly different words, alpha approaches 1.0 — but the test is sampling a narrow slice of the domain repeatedly. Higher alpha through redundancy shrinks construct coverage. The correct target is items spread widely across the item universe that still cohere around a single construct.
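The arithmetic behind this can be sketched with the standardized alpha formula, which expresses alpha through the number of items k and the mean inter-item correlation. This is only an illustration: it assumes standardized items, the function names are hypothetical, and the correlations are back-solved from the alphas in the question rather than taken from real data.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized coefficient alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def mean_r_for_alpha(k: int, alpha: float) -> float:
    """Invert the formula: the mean inter-item correlation implied by a given alpha."""
    return alpha / (k - alpha * (k - 1))


k = 20  # scale length from the question
for alpha in (0.82, 0.94):
    r = mean_r_for_alpha(k, alpha)
    print(f"alpha={alpha:.2f} with k={k} implies a mean inter-item r of about {r:.2f}")
```

Under these assumptions, the jump from 0.82 to 0.94 corresponds to the mean inter-item correlation rising from roughly 0.19 to roughly 0.44, which is what near-paraphrased items would be expected to produce. The formula itself says nothing about how much of the construct the items cover.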
Question 2 Multiple Choice
Domain sampling theory explains why adding more items increases reliability. Which analogy best captures this logic?
A. Adding more scales to a weighing room increases the total weight measured
B. A larger random sample from a population gives a more accurate estimate of the population mean; likewise, more items give a better estimate of the person's true score in the item universe
C. More items reduce individual item errors because measurement errors are always independent
D. Additional items increase content validity, which causes reliability to rise as a consequence
Domain sampling theory treats items as a sample from an infinite item universe, just as a survey samples voters from an electorate. A larger sample estimates the population parameter more accurately — variance of the sample mean decreases as n increases. Similarly, more test items give a better estimate of the person's 'true score' — their mean score across the entire item universe. This is why the Spearman-Brown formula shows predictable reliability gains from lengthening a test.
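As a minimal sketch of that prediction, the Spearman-Brown formula can be written in a few lines. The starting reliability of 0.70 is a made-up value for illustration, and the formula assumes the added items are parallel to the existing ones (drawn from the same universe with comparable quality).

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a test by length_factor
    with items parallel to the existing ones."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


base = 0.70  # hypothetical reliability of the original test
for n in (1, 2, 4):
    print(f"length x{n}: predicted reliability {spearman_brown(base, n):.3f}")
```

With these assumed numbers, doubling the test takes the predicted reliability from 0.70 to about 0.82, and quadrupling it to about 0.90. The gains shrink with each added block of items, mirroring how the standard error of a sample mean falls more slowly as the sample grows.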
Question 3 True / False
Coefficient alpha is a lower bound on reliability — the true reliability of a test is at least as high as its alpha, assuming the test measures a single construct with locally independent items.
T. True
F. False
Answer: True
Under essentially tau-equivalent items (items that measure the same construct with equal loadings, so their true scores differ only by an additive constant) and local independence (uncorrelated errors), alpha equals reliability. When items are congeneric (factor loadings differ across items), alpha underestimates reliability. Thus alpha is a conservative lower bound: the true reliability is at least alpha, and often higher. The bound depends on the local independence assumption; if item errors are correlated, alpha can overestimate reliability. This is why alpha is best viewed as a minimum estimate rather than an exact value.
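The lower-bound behavior can be checked at the population level with a small sketch. It assumes a one-factor model with unit factor variance and uncorrelated errors; the loadings and error variances are invented for illustration, and the function name is hypothetical.

```python
def alpha_and_reliability(loadings, error_vars):
    """Population alpha and true sum-score reliability for a one-factor model
    with unit factor variance and uncorrelated errors (assumed here)."""
    k = len(loadings)
    true_var = sum(loadings) ** 2            # variance of the summed true scores
    total_var = true_var + sum(error_vars)   # variance of the observed sum score
    item_var_sum = sum(l * l + e for l, e in zip(loadings, error_vars))
    alpha = k / (k - 1) * (1 - item_var_sum / total_var)
    reliability = true_var / total_var
    return alpha, reliability


# Equal loadings (essentially tau-equivalent): alpha matches reliability.
print(alpha_and_reliability([0.7, 0.7, 0.7, 0.7], [0.51, 0.51, 0.51, 0.51]))

# Unequal loadings (congeneric): alpha falls below reliability.
print(alpha_and_reliability([0.9, 0.7, 0.5, 0.3], [0.19, 0.51, 0.75, 0.91]))
```

In the equal-loadings case both values come out near 0.79. In the congeneric case alpha is about 0.68 while the reliability is about 0.71, so alpha understates it.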
Question 4 True / False
High internal consistency (alpha ≈ 0.95) guarantees that a test is measuring a broad and representative sample of the construct's item universe.
T. True
F. False
Answer: False
High alpha indicates items correlate strongly with each other — but strong inter-item correlation can result from narrow redundancy (all items ask the same thing differently) or from broad coverage of a coherent construct. Alpha cannot distinguish between these two causes. A test can achieve alpha = 0.95 by asking five nearly identical questions about a tiny corner of a construct, which would be a psychometric failure despite the high coefficient.
Question 5 Short Answer
Why does the domain sampling framework create a tension between maximizing internal consistency and achieving broad construct coverage?
Model answer: Domain sampling theory treats items as a sample from an infinite item universe. Making items more similar (higher inter-item correlations) raises alpha — but it means sampling a narrower region of the domain more densely rather than covering the full universe. The ideal test samples widely from the item universe (broad coverage) while all items still measure the same construct (coherence). Maximizing alpha through redundancy sacrifices breadth; the correct optimization target is representative sampling, not alpha maximization.
This tension is practically important in scale construction. A test designed purely to maximize alpha will tend to converge on the same few high-loading items asked repeatedly, narrowing what is actually assessed. Domain sampling theory provides the corrective: think of item selection as representative sampling from a conceptual universe, not as alpha optimization. Coefficient alpha indexes how well the sampled items cohere, not how well they cover the universe, and it is not the goal itself.