Questions: Parsimony in Phylogenetic Reconstruction
5 questions to test your understanding
Question 1 Multiple Choice
Two distantly related lineages have each accumulated many substitutions along long branches. By chance, 15 sites in their sequences show identical nucleotides arising from independent mutations. A parsimony analysis groups these two lineages as sisters. What problem is this, and what causes it?
A. Saturation: too many substitutions destroy phylogenetic signal, causing random groupings
B. Long branch attraction: parsimony misinterprets convergent homoplastic changes as evidence of shared ancestry, pulling long branches together incorrectly
C. Polytomy collapse: parsimony cannot resolve rapidly evolving lineages and defaults to incorrect groupings
D. Molecular clock violation: unequal rates cause the parsimony score to be miscalculated
Answer: B
Long branch attraction is parsimony's most important systematic failure mode. When two lineages evolve rapidly (long branches), chance alone produces identical substitutions at some sites in both; this is homoplasy. Parsimony treats these convergent similarities as evidence of shared ancestry (shared derived characters) and groups the two fast-evolving lineages together even when they are not closely related. The key word is 'systematic': this is not random noise but a biased error that parsimony can commit with high confidence, producing a confidently wrong tree. Adding more sequence data does not fix it; in the worst cases (the so-called Felsenstein zone) parsimony is statistically inconsistent, converging on the wrong tree as the number of sites grows. Model-based methods account for the probability of multiple hits at the same site and are more robust to this artifact.
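A minimal sketch of the arithmetic for four taxa, assuming a hypothetical toy alignment in which A and C sit on long branches, six sites carry genuine A+B synapomorphies, and the 15 convergent sites from the question pair A with C. The helper `fitch_quartet` is a name chosen here for illustration; it applies Fitch's small-parsimony rule to one site on a quartet, and summing over sites gives each topology's parsimony score.

```python
# Toy quartet example with hypothetical site counts chosen to mirror the question.
# Taxa are indexed A=0, B=1, C=2, D=3; the true tree is AB|CD, with A and C on long branches.

def fitch_quartet(site, pair1, pair2):
    """Minimum number of changes for one site on the unrooted quartet pair1|pair2.

    Fitch's rule: intersect the children's state sets if they overlap,
    otherwise take their union and count one change.
    """
    changes = 0
    cherry_sets = []
    for x, y in (pair1, pair2):
        sx, sy = {site[x]}, {site[y]}
        if sx & sy:
            cherry_sets.append(sx & sy)
        else:
            cherry_sets.append(sx | sy)
            changes += 1
    # Joining the two cherries costs one more change if their state sets are disjoint.
    if not cherry_sets[0] & cherry_sets[1]:
        changes += 1
    return changes


true_signal = [("G", "G", "A", "A")] * 6    # hypothetical genuine A+B synapomorphies
homoplasy = [("T", "C", "T", "C")] * 15     # convergent changes shared by long branches A and C
constant = [("A", "A", "A", "A")] * 30      # invariant sites cost 0 on every tree
alignment = true_signal + homoplasy + constant  # each tuple is one column (A, B, C, D)

TOPOLOGIES = {
    "AB|CD": ((0, 1), (2, 3)),  # the true tree
    "AC|BD": ((0, 2), (1, 3)),  # long branches grouped together
    "AD|BC": ((0, 3), (1, 2)),
}

scores = {
    name: sum(fitch_quartet(site, *pairs) for site in alignment)
    for name, pairs in TOPOLOGIES.items()
}
best = min(scores, key=scores.get)

print(scores)  # {'AB|CD': 36, 'AC|BD': 27, 'AD|BC': 42}
print("most parsimonious:", best)  # most parsimonious: AC|BD
```

With these invented counts the true tree AB|CD needs 36 changes while the long-branch grouping AC|BD needs only 27, so parsimony prefers the wrong tree: each convergent site costs one change on AC|BD but two on the true tree.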
Question 2 Multiple Choice
For which type of data is maximum parsimony most defensible as a phylogenetic method, and why?
A. DNA sequences with high substitution rates, because parsimony is fastest when many characters are informative
B. Sequences from rapidly radiating clades, because parsimony handles polytomies better than model-based methods
C. Morphological characters, because defining realistic substitution models for the evolution of anatomical structures is extremely difficult
D. Ancient DNA, because parsimony does not assume a molecular clock and handles missing data well
Answer: C
Parsimony's biggest liability for DNA (requiring no explicit substitution model) is also its biggest asset for morphological data. For DNA, realistic substitution models (GTR, HKY, etc.) are well developed and validated, and using them substantially improves inference. But how do you model the rate at which a fin becomes a limb, or how often eyes evolve independently? Realistic models of morphological change are hard to define, and estimating their parameters can be unreliable. Parsimony sidesteps this by simply counting the minimum number of changes, which is more defensible when model specification is genuinely uncertain.
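Counting the minimum number of changes on a given tree is what Fitch's small-parsimony algorithm does. A minimal sketch, assuming a rooted binary tree written as nested tuples and two hypothetical unordered morphological characters (`limb_type`, `eye_spot`) scored on hypothetical taxa `t1` to `t5`; note that no rates or model parameters appear anywhere.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small parsimony for one unordered character.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes needed to explain the observed tip states.
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # children can agree on a state: no extra change needed here
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change below this node


# Hypothetical tree and character matrix, for illustration only.
TREE: Tree = (("t1", "t2"), (("t3", "t4"), "t5"))
CHARACTERS = {
    "limb_type": {"t1": "fin", "t2": "fin", "t3": "limb", "t4": "limb", "t5": "fin"},
    "eye_spot": {"t1": "present", "t2": "absent", "t3": "present", "t4": "absent", "t5": "present"},
}

per_character = {name: fitch(TREE, states)[1] for name, states in CHARACTERS.items()}
tree_length = sum(per_character.values())

print(per_character)  # {'limb_type': 1, 'eye_spot': 2}
print("tree length:", tree_length)  # tree length: 3
```

The tree's parsimony score is the sum of per-character scores; a parsimony search repeats this scoring over candidate topologies and keeps the one with the lowest total.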
Question 3 True / False
Maximum parsimony can recover a confidently incorrect phylogenetic tree — not just an uncertain one — when long branches are present in the data.
True
False
Answer: True
This is the critical point that separates long branch attraction from ordinary statistical noise. When two long branches accumulate many homoplastic changes that parsimony misinterprets as synapomorphies, the method does not produce a polytomy or low bootstrap support; it produces a strongly supported but wrong grouping. Bootstrap replicates are drawn from the same alignment, so each one contains roughly the same proportion of misleading sites and reproduces the same erroneous pattern: the bootstrap measures sampling variation, not systematic bias. This is why long branch attraction is so dangerous: the method can be highly confident about an incorrect topology while giving no warning that anything is wrong.
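A minimal sketch of why resampling cannot expose the bias, assuming a hypothetical four-taxon toy alignment (6 genuine A+B sites, 15 convergent A+C sites as in Question 1's scenario, 30 invariant sites) and 1,000 nonparametric bootstrap replicates that resample alignment columns with replacement. `supported_split` and `parsimony_winner` are illustrative helpers, not library functions.

```python
import random
from collections import Counter

# Hypothetical toy quartet alignment, taxon order (A, B, C, D):
# 6 genuine A+B sites, 15 convergent A+C sites, 30 invariant sites.
alignment = ([("G", "G", "A", "A")] * 6
             + [("T", "C", "T", "C")] * 15
             + [("A", "A", "A", "A")] * 30)


def supported_split(site):
    """Return the quartet split a parsimony-informative site supports, else None.

    For four taxa only 'xxyy'-type sites are informative: they cost one change on
    the split they support and two on the others; every other pattern costs the
    same on all three topologies.
    """
    a, b, c, d = site
    if a == b and c == d and a != c:
        return "AB|CD"
    if a == c and b == d and a != b:
        return "AC|BD"
    if a == d and b == c and a != b:
        return "AD|BC"
    return None


def parsimony_winner(columns):
    """Most parsimonious quartet split, or None if the best score is tied."""
    counts = Counter(s for s in map(supported_split, columns) if s is not None)
    ranked = counts.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


rng = random.Random(1)  # fixed seed so the run is repeatable
n_replicates = 1000
wins = Counter(
    parsimony_winner(rng.choices(alignment, k=len(alignment)))
    for _ in range(n_replicates)
)
support = wins["AC|BD"] / n_replicates

print("original data:", parsimony_winner(alignment))
print(f"bootstrap support for the wrong split AC|BD: {support:.2f}")
```

On this toy alignment the wrong split AC|BD typically wins in well over 90% of replicates, because every replicate is drawn from the same biased pool of sites.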
Question 4 True / False
Parsimony is the preferred method for large genomic datasets because it requires no assumptions about the evolutionary process, making it the most objective approach to phylogenetic inference.
True
False
Answer: False
While parsimony does not require specification of a substitution model, it implicitly assumes that the most parsimonious explanation is the most likely — which is itself an assumption that breaks down when substitution rates are unequal across lineages. For large genomic datasets, model-based methods (maximum likelihood, Bayesian inference) are generally preferred because they can account for rate variation, multiple substitutions at the same site, and other biological realities that parsimony ignores. Parsimony's speed advantage has also diminished as computational resources have grown. Today it is most useful as a baseline or for morphological data.
Question 5 Short Answer
Explain how maximum parsimony's core principle — choosing the tree requiring the fewest changes — can lead to a confidently incorrect phylogeny when lineages evolve at very different rates.
Model answer: When two lineages evolve rapidly while their relatives evolve slowly, each fast lineage accumulates many substitutions independently. At some sites, by chance, both acquire the same nucleotide through convergent evolution (homoplasy). Parsimony interprets these shared but independently derived characters as evidence of common ancestry, that is, as shared derived characters (synapomorphies), and therefore groups the two fast-evolving lineages together. Because homoplastic shared characters can be numerous on long branches, they can outnumber the genuine synapomorphies and be treated as strong evidence for an incorrect grouping. The method has no mechanism to distinguish shared ancestry from coincidental convergence when multiple substitutions occur at the same site.
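A small simulation makes this mechanism visible. The sketch assumes the Jukes-Cantor model, parameterized directly by the probability that a branch ends in a different base (0.6 for the long branches leading to A and C, 0.05 for all other branches; values chosen for illustration), evolves independent sites on the true tree ((A,B),(C,D)), and counts which quartet split each informative site supports. For four taxa, the most parsimonious quartet is simply the one supported by the most informative sites. All function names are illustrative.

```python
import random
from collections import Counter

BASES = "ACGT"
P_LONG = 0.6    # assumed probability that a long branch ends in a different base
P_SHORT = 0.05  # assumed probability for short branches, including the internal one


def evolve(base, p_change, rng):
    """Jukes-Cantor change along one branch: with probability p_change,
    switch to one of the three other bases chosen uniformly."""
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != base])
    return base


def simulate_site(rng):
    """One site on the true tree ((A,B),(C,D)); A and C sit on long branches."""
    u = rng.choice(BASES)        # ancestor of A and B
    v = evolve(u, P_SHORT, rng)  # ancestor of C and D, across the short internal branch
    return (evolve(u, P_LONG, rng), evolve(u, P_SHORT, rng),
            evolve(v, P_LONG, rng), evolve(v, P_SHORT, rng))


def supported_split(site):
    """Quartet split supported by an informative ('xxyy'-type) site, else None."""
    a, b, c, d = site
    if a == b and c == d and a != c:
        return "AB|CD"
    if a == c and b == d and a != b:
        return "AC|BD"
    if a == d and b == c and a != b:
        return "AD|BC"
    return None


rng = random.Random(42)
sites = [simulate_site(rng) for _ in range(5000)]
counts = Counter(s for s in map(supported_split, sites) if s is not None)

# The split with the most supporting sites is the most parsimonious quartet.
print(counts.most_common())
```

With these settings the long-branch split AC|BD collects several times more supporting sites than the true split AB|CD, and simulating more sites only makes that margin more stable: more data entrenches the error rather than correcting it.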
Model-based methods address this by using substitution models that estimate the probability of multiple hits at the same site. A position where G appears in two distantly related taxa is assigned a probability that allows for several changes at that site along each branch, recognizing that apparent similarity may reflect independent paths to the same nucleotide state. Parsimony's implicit 'one observed difference = one change' accounting fails whenever the same site changes more than once, because those hidden changes are never counted.
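The multiple-hit correction can be made concrete with the Jukes-Cantor model (JC69), used here as an assumption because it is the simplest substitution model; GTR and HKY generalize it. The sketch computes the probability that two sequences separated by `d` expected substitutions per site show the same base, then inverts that relationship to recover `d` from the observed proportion of differing sites. The function names are illustrative.

```python
import math


def jc_prob_same(d):
    """Jukes-Cantor: probability that a site is identical in two sequences
    separated by d expected substitutions per site, summed over all paths
    (including multiple hits at the same site)."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def jc_distance(p):
    """Jukes-Cantor distance: expected substitutions per site recovered from
    the observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) under Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p_observed = 1.0 - jc_prob_same(d)  # what naive counting of differences sees
    print(f"true d = {d:3.1f}   observed p = {p_observed:.3f}   "
          f"JC-corrected d = {jc_distance(p_observed):.3f}")
```

As `d` grows, the observed proportion of differences levels off just below 0.75, because about a quarter of sites match by chance alone. Naive counting of differences therefore increasingly underestimates the true number of changes, while the model-based correction recovers it.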