Questions: Digital Humanities and Computational Literary Analysis
5 questions to test your understanding
Question 1 Multiple Choice
A literary scholar uses topic modeling on 5,000 Victorian novels and finds a cluster of words associated with urban disease and crowd anxiety appearing with increased frequency between 1880 and 1910. What is the most accurate characterization of this finding?
A. The finding is objective and self-interpreting: the algorithm has identified a historical literary trend
B. The finding is a pattern that requires close reading and historical contextualization before it becomes an argument
C. The finding demonstrates that computational analysis can replace traditional literary criticism for large corpora
D. The finding is only valid if the scholar can explain the algorithm's statistical parameters in full
Answer: B
Computational analysis produces correlations and patterns, not arguments. Topic modeling can reveal that certain word clusters co-occur across a period, but explaining *why*, and what it means for literary history, requires the hermeneutic tools of close reading and historical contextualization. The most powerful digital humanities work uses computational findings as a map that guides close reading. 'Objective' is precisely the wrong word: corpus selection, parameter choices, and cluster labeling all embed scholarly judgment.
Question 2 Multiple Choice
A researcher claims that computational analysis of 'English literature' reveals universal patterns about how literary language works across all cultures and periods. What is the most significant problem with this claim?
A. Computational tools cannot handle literary language because metaphor and ambiguity defeat statistical methods
B. The corpus consists of English-language texts, embedding assumptions about what counts as literature and whose writing is preserved; findings describe that corpus, not 'literature' universally
C. Literary patterns are too complex for statistical methods to detect with any reliability
D. The claim is defensible because English literature is the most extensively digitized tradition
Answer: B
The corpus *is* the argument's scope. A corpus of digitized English-language novels reflects which texts have been preserved, digitized, and deemed worth including, all of which are political and institutional decisions. Findings from such a corpus describe the patterns of that tradition and no other. Claims of universality would require a corpus spanning languages, cultures, and periods that simply doesn't exist in usable form. Corpus selection is a fundamentally interpretive act, not a neutral technical one.
Question 3 True / False
Computational literary analysis can reveal patterns across thousands of texts that no individual reader could detect, but these patterns still require humanistic interpretation to become meaningful arguments.
T. True
F. False
Answer: True
This is the core relationship between distant and close reading. Distant reading changes the scale of what you can observe — word frequency trends across decades, genre distributions, stylometric signatures — but the resulting patterns do not self-interpret. Explaining what a cluster of words means, why a frequency shifts, or what a stylometric signature implies for literary history requires the contextual, interpretive work that humanistic scholarship provides.
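As a rough illustration of a "word frequency trend across decades", the sketch below computes how often a word cluster appears per decade. The corpus, the cluster, and the function name `cluster_rate_by_decade` are all hypothetical stand-ins invented for this example. A real study would load thousands of texts and would likely derive the cluster from a topic model rather than pick it by hand.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus of (publication year, text) pairs.
corpus = [
    (1872, "The village was quiet and the fields were green."),
    (1885, "The crowd pressed through the fog and fever spread in the city."),
    (1893, "Fever and contagion haunted the crowd in the crowded streets."),
    (1901, "The city crowd feared contagion."),
]

# Hypothetical hand-picked cluster; choosing and labeling it is itself
# an interpretive decision, not an output of the computation.
cluster = {"crowd", "fever", "contagion", "fog"}


def cluster_rate_by_decade(docs, words):
    """Return {decade: share of tokens in that decade that belong to `words`}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in docs:
        decade = year // 10 * 10
        tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer, exact matches only
        totals[decade] += len(tokens)
        hits[decade] += sum(1 for token in tokens if token in words)
    return {d: hits[d] / totals[d] for d in sorted(totals) if totals[d]}


rates = cluster_rate_by_decade(corpus, cluster)
for decade, rate in rates.items():
    print(f"{decade}s: {rate:.2f}")
```

On this toy data the rate rises decade by decade, but the numbers alone say nothing about why. That question is where the interpretive work described above begins.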
Question 4 True / False
The 'distant reading' approach assumes that close reading of individual texts is an inferior method that should be replaced once sufficient computational power is available.
TTrue
FFalse
Answer: False
Distant reading and close reading operate at different scales and answer different questions — they are complementary, not competing. Moretti described distant reading as a 'condition of knowledge' about macro-patterns; close reading produces fine-grained understanding of individual texts. The most powerful digital humanities work uses computational findings to identify where interesting territory is, then deploys close reading to explore it.
Question 5 Short Answer
Why is the selection of a corpus a fundamentally interpretive act in digital humanities, rather than a neutral technical decision?
Model answer: Every corpus embeds assumptions: about what counts as literature, which languages and traditions are legible to scholarship, what has been digitized (itself shaped by cultural and economic power), and which time periods and geographies are included. A corpus of English-language novels from major publishers will reveal things about that tradition and nothing about others. Because all findings are specific to the texts analyzed, the choice of corpus determines what questions can be asked and whose experiences are visible — making it a scholarly and political decision, not merely a technical one.
This is why computational analysis is not 'objective' despite using algorithms. An algorithm applies the same procedure to whatever it is given, but that consistency does not extend to the corpus it analyzes. Canon formation, archival access, digitization funding, and language coverage all shape what is computationally available, so the field's findings systematically reflect the biases of what has been preserved and digitized.