Two coders applying a coding scheme to 200 news articles reach 91% raw agreement. One category ('mentions crime') applies to 89% of articles. What is the most important interpretive concern?
A. The sample is too small to draw conclusions about intercoder reliability
B. 91% agreement is below the standard threshold of 95%, so the scheme should be abandoned
C. The high base rate of the category means coders could achieve ~80% agreement by chance alone, making the kappa coefficient likely much lower than the raw agreement suggests
D. Raw agreement is the gold standard for reliability, so 91% indicates excellent agreement
Answer: C
When a category applies to the vast majority of cases, coders will agree most of the time simply because both default to the dominant response, even if no real coding discrimination is occurring. Cohen's kappa corrects for this by subtracting the expected chance agreement from the observed agreement and rescaling by the largest possible improvement over chance: kappa = (observed − expected) / (1 − expected). In this case, if both coders independently code 'yes' about 89% of the time, chance agreement alone would be around 80% (0.89 × 0.89 + 0.11 × 0.11 ≈ 0.80). A 91% raw agreement against an 80% chance baseline produces a kappa of only about 0.55, since (0.91 − 0.80) / (1 − 0.80) = 0.55. That is moderate, not excellent. This is precisely why raw agreement without kappa is misleading.
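The arithmetic in this explanation can be checked with a minimal Python sketch. The function name `kappa_from_marginals` and its parameters are hypothetical, chosen for illustration, and the sketch assumes a single yes/no category with both coders coding 'yes' 89% of the time, as the explanation does. Using the unrounded chance term (0.8042 rather than 0.80) gives a kappa near 0.54 instead of 0.55; the interpretation is unchanged.

```python
def kappa_from_marginals(p_observed, p_yes_a, p_yes_b):
    """Cohen's kappa for one binary category, given the observed agreement
    and each coder's proportion of 'yes' codes (hypothetical helper)."""
    # Chance agreement: both say 'yes' by chance, or both say 'no' by chance.
    p_expected = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_expected, kappa


# Values from the question: 91% raw agreement, 89% base rate for both coders.
p_expected, kappa = kappa_from_marginals(0.91, 0.89, 0.89)
print(round(p_expected, 4))  # 0.8042
print(round(kappa, 2))       # 0.54
```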
Question 2 Multiple Choice
A researcher wants to determine whether newspaper coverage of immigration emphasizes economic contributions or security threats. Which type of content analysis does this require, and why?
A. Manifest content analysis, because the words used in articles can be counted objectively
B. Latent content analysis, because determining the 'frame' or emphasis requires interpretive judgment about implied meaning
C. Quantitative content analysis only, because framing requires counting the frequency of relevant themes
D. Neither: framing analysis is a distinct method incompatible with content analysis
Answer: B
Framing — how a topic is contextualized and what aspects are emphasized — is an interpretive, meaning-level judgment, not a simple count of surface-level words. A story can mention immigration without using the words 'crime' or 'economy' but still frame it as threatening through tone, source selection, and implicit associations. Capturing this requires latent coding: coders must make interpretive judgments about the underlying meaning of the text, guided by clear operational definitions. Manifest coding (option A) would miss the framing cues that don't appear as explicit keywords. The two approaches are complementary, not mutually exclusive, and many framing studies combine both.
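To make the limitation of manifest coding concrete, here is a toy Python sketch of a keyword counter. The keyword lists, the function name `manifest_counts`, and both example sentences are invented for illustration; they are not drawn from any real codebook or corpus. The second sentence arguably carries a threat frame but contains none of the listed keywords, so a purely manifest count records nothing.

```python
import re

# Hypothetical keyword lists, for illustration only.
FRAME_KEYWORDS = {
    "economic": {"jobs", "economy", "taxes", "workers"},
    "security": {"crime", "border", "threat", "illegal"},
}


def manifest_counts(text):
    """Count surface-level keyword hits per frame (manifest coding only)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        frame: sum(1 for token in tokens if token in words)
        for frame, words in FRAME_KEYWORDS.items()
    }


# Invented example sentences.
explicit = "Officials warned of a threat at the border as crime figures rose."
implicit = "Residents say they no longer feel safe walking home since the newcomers arrived."

explicit_counts = manifest_counts(explicit)
implicit_counts = manifest_counts(implicit)
print(explicit_counts)  # {'economic': 0, 'security': 3}
print(implicit_counts)  # {'economic': 0, 'security': 0}
```

A human coder applying a latent 'security threat' category would likely code the second sentence as present, which is the gap the explanation describes.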
Question 3 True / False
Manifest and latent content coding can be combined in the same research design to capture different dimensions of the same texts.
T. True
F. False
Answer: True
Manifest and latent coding are not mutually exclusive: they address different questions and capture different levels of meaning. A researcher studying media bias might use manifest coding to count observable features (for example, how many times each politician is quoted) and latent coding to assess tone or framing. Combining both provides a richer, more complete picture than either alone. The misconception that they are mutually exclusive likely stems from treating them as competing methodologies rather than complementary tools.
Question 4 True / False
High intercoder agreement on a coding scheme is sufficient evidence that the scheme is measuring what the researcher intends it to measure.
T. True
F. False
Answer: False
This conflates reliability with validity, a common error in coding scheme evaluation. Two coders can agree perfectly on a category that doesn't actually capture the construct of interest. For example, if researchers want to measure 'aggressive tone' but operationally define it as 'uses exclamation marks,' coders will agree reliably on exclamation marks while failing to capture actual aggression. Reliability (consistency of coding) is necessary but not sufficient for validity (measuring the right thing). A coding scheme must be validated by checking whether coded categories correspond to the underlying theoretical construct, not just whether coders agree.
Question 5 Short Answer
Why is Cohen's kappa preferred over raw agreement percentage when assessing intercoder reliability, and when does the distinction matter most?
Model answer: Cohen's kappa corrects for the agreement that would be expected by chance alone if coders were simply guessing according to the marginal distribution of categories. Raw agreement counts all matching codes equally, regardless of whether the match was informative or trivially expected. The distinction matters most when category distributions are highly skewed — when one category dominates (e.g., 95% of texts are coded 'absent'). In such cases, both coders could achieve high raw agreement by consistently coding 'absent' without exercising any real judgment. Kappa subtracts this baseline, revealing how much reliable discrimination is actually occurring beyond chance. A study reporting 90% raw agreement may look rigorous while actually having near-zero kappa if the dominant category drives most of the agreement.
The deeper point is that intercoder reliability testing exists to demonstrate that the coding categories are operative — that they actually discriminate between cases in a principled way. Kappa forces researchers to confront whether their scheme works. A scheme with high raw agreement but low kappa is typically a sign that the coding categories are poorly constructed, the raters are not independently using the definitions, or the category is too rare or too dominant to test.
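The claim that roughly 90% raw agreement can coexist with near-zero kappa can be illustrated with a small Python sketch that computes kappa directly from two coders' labels. The data below are constructed for illustration, not taken from any study: 100 texts, each coder marks 5 as 'present', and the two coders never pick the same 5. The function name `cohen_kappa` is a hypothetical helper, not a library call.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Return (observed agreement, chance agreement, kappa) for two coders."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal proportions per category.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_observed, p_expected, (p_observed - p_expected) / (1 - p_expected)


# Constructed data: 90 joint 'absent' codes, 10 disagreements, no joint 'present'.
coder_a = ["absent"] * 90 + ["present"] * 5 + ["absent"] * 5
coder_b = ["absent"] * 90 + ["absent"] * 5 + ["present"] * 5

p_observed, p_expected, kappa = cohen_kappa(coder_a, coder_b)
print(round(p_observed, 3))  # 0.9
print(round(p_expected, 3))  # 0.905
print(round(kappa, 3))       # -0.053
```

Here the coders agree on 90% of texts, yet every one of those agreements is on the dominant 'absent' code, so kappa falls slightly below zero.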