Questions — Inter-Rater Reliability and Observer Agreement

Question 1 Multiple Choice

Two clinical raters independently assess 100 patients for depression in a clinic where 95% of patients are not depressed. Both raters always code 'not depressed.' What are their percent agreement and Cohen's kappa?

A. Percent agreement = 95%, kappa ≈ 0

B. Percent agreement = 95%, kappa ≈ 0.95

C. Percent agreement = 100%, kappa = 1.0

D. Percent agreement = 100%, kappa ≈ 0

Question 2 Multiple Choice

A researcher uses percent agreement to report inter-rater reliability for a coding scheme with three behavioral categories used roughly equally (≈33% each). Compared to Cohen's kappa, what is most likely true?

A. Percent agreement will be lower than kappa, because it ignores systematic rater bias

B. Percent agreement will be higher than kappa, because kappa subtracts the expected chance agreement

C. Percent agreement and kappa will be equal, because equal base rates eliminate chance agreement

D. Percent agreement will be higher than kappa, because kappa penalizes raters for using more than two categories

Question 3 True / False

Cohen's kappa can be 0 even when two raters show high percent agreement, if that agreement is entirely explained by the expected base rate.

T. True

F. False

Question 4 True / False

A kappa of .80 is widely accepted as indicating good inter-rater reliability and can be applied as a universal threshold across most measurement contexts.

T. True

F. False

Question 5 Short Answer

Why does the prevalence of the categories being rated affect the interpretation of Cohen's kappa, and what problem does this create for researchers using binary diagnostic categories with rare conditions?
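For reference when working through these questions, the two statistics can be computed by hand. The sketch below is a minimal illustration, not a definitive implementation: the helper names (`percent_agreement`, `cohens_kappa`) and both rating sets are hypothetical and are not taken from any question above. It uses the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's marginal proportions.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Proportion of items on which the two raters gave the same code."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: sum over categories of the product of the
    # two raters' marginal proportions for that category.
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    if p_e == 1.0:
        # 0/0 case: kappa is undefined. Returning NaN is a choice
        # made for this sketch.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def build(both_yes, both_no, yes_no, no_yes):
    """Expand 2x2 cell counts into two parallel lists of ratings."""
    r1 = ["yes"] * both_yes + ["no"] * both_no + ["yes"] * yes_no + ["no"] * no_yes
    r2 = ["yes"] * both_yes + ["no"] * both_no + ["no"] * yes_no + ["yes"] * no_yes
    return r1, r2


# Hypothetical data: same number of disagreements (10 of 100),
# but very different prevalence of the "yes" category.
balanced = build(both_yes=45, both_no=45, yes_no=5, no_yes=5)
rare = build(both_yes=2, both_no=88, yes_no=5, no_yes=5)

for label, (a, b) in [("balanced", balanced), ("rare", rare)]:
    print(label, round(percent_agreement(a, b), 3), round(cohens_kappa(a, b), 3))
```

Both hypothetical data sets contain the same ten disagreements, so any difference between the two kappa values comes only from the marginal proportions that feed p_e.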

