Questions: Computational Text Analysis for Social Data

5 questions to test your understanding

Question 1 Multiple Choice

A researcher completes a study using LDA topic modeling on 10 years of congressional speeches and reports: 'The algorithm identified 8 distinct political themes organizing the corpus.' What is the most critical missing element in this claim?

A. The software package and computational resources used to run the model
B. The number of documents and average document length in the corpus
C. The researcher's substantive interpretation of what the statistical word clusters actually mean: the algorithm produces patterns, not meaning
D. Validation metrics showing the statistical fit of the model to the data
Question 2 Multiple Choice

A researcher uses a validated dictionary of economic anxiety terms to measure that concept across 50,000 news articles. What is the most fundamental assumption this method requires?

A. That the articles are a representative sample of media coverage during the study period
B. That economic anxiety appears in text in ways that prior theory can specify, so that the dictionary words reliably indicate the concept across the diverse linguistic contexts in the corpus
C. That the dictionary was developed on a corpus similar to the one being analyzed
D. That the researcher has manually read at least a sample of the articles to validate the results
Question 3 True / False

In supervised text classification, biases that researchers introduce during the hand-labeling stage can propagate systematically into the trained model's classifications across the full corpus.

T. True
F. False
Question 4 True / False

Bag-of-words models are called 'bag-of-words' because they capture words along with their grammatical and sequential context within sentences.

T. True
F. False
Question 5 Short Answer

Why does having a larger corpus not automatically solve validity problems in computational text analysis?
