Questions — Computational Text Analysis for Social Data

Question 1 Multiple Choice

A researcher completes a study using LDA topic modeling on 10 years of congressional speeches and reports: 'The algorithm identified 8 distinct political themes organizing the corpus.' What is the most critical missing element in this claim?

A. The software package and computational resources used to run the model

B. The number of documents and average document length in the corpus

C. The researcher's substantive interpretation of what the statistical word clusters actually mean: the algorithm produces patterns, not meaning

D. Validation metrics showing the statistical fit of the model to the data

Question 2 Multiple Choice

A researcher uses a validated dictionary of economic anxiety terms to measure that concept across 50,000 news articles. What is the most fundamental assumption this method requires?

A. That the articles are a representative sample of media coverage during the study period

B. That economic anxiety appears in text in ways that prior theory can specify, that is, that the dictionary words reliably indicate the concept across diverse linguistic contexts in the corpus

C. That the dictionary was developed on a corpus similar to the one being analyzed

D. That the researcher has manually read at least a sample of the articles to validate the results

Question 3 True / False

In supervised text classification, biases that researchers introduce during the hand-labeling stage can propagate systematically into the trained model's classifications across the full corpus.

T. True

F. False

Question 4 True / False

Bag-of-words models are called 'bag-of-words' because they capture words along with their grammatical and sequential context within sentences.

T. True

F. False

Question 5 Short Answer

Why does having a larger corpus not automatically solve validity problems in computational text analysis?

