Questions — Test Bias Detection Methods and Statistical Approaches

Question 1 Multiple Choice

A test developer applies the Mantel-Haenszel (MH) procedure to an item and finds no significant differential item functioning (DIF). A measurement colleague argues the item could still be biased. Under what condition would the colleague be correct?

A. If the item has low test-retest reliability, MH produces inflated false-negative rates

B. If the DIF effect reverses direction across ability levels (non-uniform DIF), MH would not detect it

C. The colleague is wrong; a non-significant MH result establishes that the item is unbiased

D. MH cannot be trusted for items in the middle difficulty range

Question 2 Multiple Choice

A research team wants to compare latent mean scores on a depression scale between British and Korean samples to determine whether one population is more depressed on average. What statistical requirement must be met for this comparison to be valid?

A. The scale must achieve Cronbach's alpha ≥ 0.80 in both samples

B. The samples must be matched on age, gender, and education

C. Scalar measurement invariance must hold: the same factor loadings and item intercepts across groups

D. No individual item should show significant DIF in either sample

Question 3 True / False

Non-uniform DIF is more problematic than uniform DIF because it cannot be corrected by simply adjusting total scores — the group difference changes direction or magnitude across the ability distribution.

T. True

F. False

Question 4 True / False

Establishing that a scale has the same factor structure (configural invariance) in two groups is sufficient to support valid comparisons of latent means across those groups.

T. True

F. False

Question 5 Short Answer

Why is test bias detection considered a form of validity evidence collection, rather than a separate psychometric concern?

