Stylometry and Quantitative Textual Analysis

Research · Depth 76 in the knowledge graph
digital-humanities quantitative-methods authorship stylometry

Core Idea

Stylometry uses computational analysis to identify authorial 'style' by measuring linguistic features (word frequencies, sentence length, punctuation patterns) across texts. Stylometric methods can help resolve attribution problems, detect ghostwriting, and reveal patterns invisible to close reading. In comparative literature, stylometry enables large-scale analysis of stylistic variation across languages, periods, and traditions. However, stylometry raises philosophical questions: Can style be meaningfully quantified? What is hidden when literature becomes numerical data?
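Two of the feature families named above, sentence length and punctuation patterns, can be measured with very little code. The sketch below assumes plain English text and uses deliberately naive regular expressions for sentence and word boundaries; real studies use proper tokenizers, and the function name `surface_features` is hypothetical.

```python
import re
from collections import Counter

def surface_features(text: str) -> dict[str, float]:
    """Measure sentence length and punctuation habits in one text."""
    # Naive sentence split: runs of ., ! or ? followed by whitespace or the end.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Lowercased word tokens; apostrophes kept so "don't" stays one token.
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words)
    if not sentences or not n_words:
        return {}
    # Count only a few punctuation marks of interest.
    marks = Counter(ch for ch in text if ch in ",;:")
    return {
        "mean_sentence_length": n_words / len(sentences),
        "commas_per_100_words": 100 * marks[","] / n_words,
        "semicolons_per_100_words": 100 * marks[";"] / n_words,
        "colons_per_100_words": 100 * marks[":"] / n_words,
    }

print(surface_features("The sea was calm. We rowed, slowly, toward the shore; nobody spoke."))
```

For the sample passage this gives a mean sentence length of 6.0 words (12 words over 2 sentences) and roughly 16.7 commas per 100 words. Normalizing per 100 words matters: raw counts would mostly measure text length rather than habit.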

How It's Best Learned

Run stylometric analysis on a corpus of texts and interpret the results. Compare algorithmic findings with interpretive readings. Consider what stylometric evidence supports and what it obscures.

Common Misconceptions

That stylometry reveals objective truth about texts. Stylometric measures rest on interpretive choices (which features to measure?), and their meaning depends on theoretical framing. Quantification does not guarantee objectivity.

Explainer

You know from Moretti's distant reading that literary scholarship can operate on large corpora rather than individual texts, using aggregation to reveal patterns invisible to close reading. Stylometry, which predates Moretti's term by decades, is one of the most developed quantitative methods in this tradition, and it rests on a specific wager: that authors leave measurable traces in the surface features of their prose (word frequencies, function word distributions, sentence length patterns, punctuation habits), and that these traces are stable enough to identify authorship even when content varies. The analogy is forensic: just as handwriting keeps its distinctive features when the message changes, writing style carries authorial fingerprints.
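The wager can be made concrete by discarding content words and keeping only the rates of a fixed list of function words. This is a minimal sketch: the ten-word `FUNCTION_WORDS` list and the function name `function_word_profile` are hypothetical choices for illustration, not a standard list or library API.

```python
import re
from collections import Counter

# Hypothetical short list for illustration; real studies use longer lists,
# often the most frequent words of the whole corpus.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "by", "on", "with", "upon"]

def function_word_profile(text: str, per: int = 1000) -> dict[str, float]:
    """Rate of each listed function word per `per` word tokens.

    Content words count toward the total but are otherwise ignored, so texts
    on different subjects are compared only on their grammatical habits.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (per * counts[w] / total if total else 0.0) for w in FUNCTION_WORDS}

sea = function_word_profile("The boat drifted by the rocks and on into the fog.")
war = function_word_profile("The army marched by the river and on toward the city.")
print(sea == war)  # → True
```

The two sample sentences differ in every content word yet produce identical profiles over this word list, which is exactly the content independence the method relies on.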

The best-known application is authorship attribution: determining who wrote a disputed or anonymous text. The Federalist Papers case is canonical. Mosteller and Wallace's statistical analysis of function word frequencies (words like "the," "of," "by") supported attributing the disputed papers to Madison rather than Hamilton. Function words carry this weight because writers choose them largely unconsciously, which makes them harder to imitate than content words. The technique has been applied to Shakespeare's collaborators, to the identity behind the pseudonym Elena Ferrante, and to the detection of ghostwritten books. The insight is that style is not just what you consciously choose to say; it is also the unconscious rhythm of how you say it.
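The original Federalist study by Mosteller and Wallace used a Bayesian model that is too involved for a short sketch. The example below swaps in a later and simpler attribution technique, Burrows's Delta: z-score each feature, then score each candidate by the mean absolute difference between its z-scores and those of the disputed text (lower means closer). It is a simplified sketch under stated assumptions: means and standard deviations are taken over the candidate profiles themselves, at least two candidates are required, and the name `burrows_delta` and the numbers in the usage line are invented for illustration.

```python
import statistics

def burrows_delta(disputed: list[float],
                  candidates: dict[str, list[float]]) -> dict[str, float]:
    """Simplified Delta score for each candidate (lower = stylistically closer).

    Every profile is a list of relative frequencies for the same ordered
    feature list, e.g. per-1000-word rates of selected function words.
    """
    # Feature-wise mean and sample standard deviation across candidates.
    columns = list(zip(*candidates.values()))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    # A feature with zero spread cannot be z-scored, so it is dropped.
    keep = [i for i, s in enumerate(stdevs) if s > 0]
    if not keep:
        raise ValueError("no feature varies across the candidates")

    def z(profile: list[float]) -> list[float]:
        return [(profile[i] - means[i]) / stdevs[i] for i in keep]

    z_disputed = z(disputed)
    return {
        author: sum(abs(a - b) for a, b in zip(z_disputed, z(profile))) / len(keep)
        for author, profile in candidates.items()
    }

# Invented rates for two features; not measurements from any real corpus.
scores = burrows_delta([9.0, 1.5], {"A": [10.0, 1.0], "B": [2.0, 3.0], "C": [6.0, 5.0]})
print(min(scores, key=scores.get))  # → A
```

Z-scoring is the key design choice: it stops very common words like "the" from dominating the comparison simply because their raw rates are larger.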

Stylometry raises immediate philosophical questions that any serious practitioner must engage. Which features should be measured? Choosing word frequency over sentence rhythm, or including punctuation rather than ignoring it, is not a neutral decision: each choice encodes assumptions about what constitutes "style." Different feature sets can yield different authorship conclusions for the same texts. Stylometric analysis is therefore not the mechanical production of truth; it is a series of interpretive choices about what counts as evidence, followed by computation, followed by further interpretation of what the numbers mean. Quantification does not remove the interpreter; it embeds the interpreter's assumptions in the algorithm.
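The claim that different feature sets can reverse a conclusion is easy to demonstrate on toy data. In this sketch the profiles, author names, and the helper `nearest_candidate` are all hypothetical, and plain Euclidean distance on unscaled rates is a deliberately crude metric; the point is only that the same disputed text lands nearest a different candidate depending on which features are selected.

```python
import math

def nearest_candidate(disputed: dict[str, float],
                      candidates: dict[str, dict[str, float]],
                      features: list[str]) -> str:
    """Name of the candidate closest to `disputed` on the chosen features."""
    def distance(profile: dict[str, float]) -> float:
        return math.sqrt(sum((disputed[f] - profile[f]) ** 2 for f in features))
    return min(candidates, key=lambda name: distance(candidates[name]))

# Invented toy profiles: rates per 1000 words (hypothetical numbers).
candidates = {
    "Author X": {"the": 60.0, "upon": 3.0, "semicolon": 2.0, "comma": 50.0},
    "Author Y": {"the": 62.0, "upon": 0.5, "semicolon": 8.0, "comma": 70.0},
}
disputed = {"the": 61.0, "upon": 2.8, "semicolon": 7.5, "comma": 68.0}

print(nearest_candidate(disputed, candidates, ["the", "upon"]))         # → Author X
print(nearest_candidate(disputed, candidates, ["semicolon", "comma"]))  # → Author Y
```

Neither answer is wrong by its own lights; the disagreement comes entirely from the prior decision about which features count as style.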

The deeper question is: what is "style" when made computational? Close reading assumes that style is meaningful: Hemingway's short sentences carry thematic weight, and Faulkner's long ones enact consciousness. Stylometry treats style as a byproduct of cognitive habit, largely unconscious and independent of content. These are genuinely different theories of what literary style is and does. The most sophisticated work in the field holds both, using computational methods to identify large-scale patterns and then returning to close reading to interpret what those patterns mean. The numbers answer "who?" and point toward "what pattern?"; interpretation answers "so what?" Distant and close reading are not rivals; they are sequential tools, each doing what the other cannot.

