What does a bootstrap value of 85 on an internal branch of a phylogenetic tree indicate?
A. 85% of the sequences support that branch
B. In 85% of bootstrap replicates (resampled alignment columns), that branch was recovered
C. The branch has an 85% probability of being correct
D. 85% of the nucleotide sites are informative for that branch
Answer: B
Bootstrapping resamples columns of the multiple sequence alignment with replacement to create many pseudo-replicate datasets (typically 100 to 1,000), each the same length as the original alignment. A tree is built from each replicate, and the bootstrap value reports the percentage of replicates in which a given branch appeared. A value of 85 means 85% of resampled datasets recovered that grouping. It measures how robust the grouping is to perturbation of the data, not the probability that the branch is correct.
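The resampling procedure can be sketched in a few lines of Python. This is a minimal illustration, not a real phylogenetics pipeline: the function names are hypothetical, and `groups_together` is a crude stand-in for tree building that only asks whether two taxa are mutual nearest neighbours by p-distance. A real analysis would build a full tree from each replicate.

```python
import random

def resample_columns(alignment, rng):
    """Draw alignment columns with replacement to make one pseudo-replicate."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def p_distance(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def groups_together(alignment, t1, t2):
    """Stand-in for tree building (assumption): are t1 and t2 mutual nearest neighbours?"""
    def nearest(t):
        others = [o for o in alignment if o != t]
        return min(others, key=lambda o: p_distance(alignment[t], alignment[o]))
    return nearest(t1) == t2 and nearest(t2) == t1

def bootstrap_support(alignment, t1, t2, n_replicates=1000, seed=0):
    """Percentage of pseudo-replicates in which the t1+t2 grouping is recovered."""
    rng = random.Random(seed)
    hits = sum(
        groups_together(resample_columns(alignment, rng), t1, t2)
        for _ in range(n_replicates)
    )
    return 100.0 * hits / n_replicates

# Toy alignment with made-up sequences, for illustration only.
toy = {
    "A": "ACGTACGTACGTACGT",
    "B": "ACGTACGTACGAACGT",
    "C": "ACGAACTTACGTTCGA",
    "D": "TCGAACTTACGTTCGT",
}
print(bootstrap_support(toy, "A", "B"))
```

Each replicate has the same number of columns as the original alignment, but some columns appear several times and others not at all, which is what makes the support value a measure of robustness to the sampling of sites.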
Question 2 True / False
Neighbor-joining and maximum likelihood always produce identical tree topologies for the same dataset.
T. True
F. False
Answer: False
Neighbor-joining is a fast distance-based method that uses a greedy algorithm to build a single tree from pairwise distances. Maximum likelihood searches over many candidate topologies and selects the one under which the observed data have the highest likelihood, given a specified substitution model. The two can produce different topologies because NJ reduces the alignment to pairwise distances (losing site-level information), uses a greedy strategy (which may not find the globally optimal tree), and applies a substitution model, if at all, only when correcting the distances rather than evaluating it site by site on the tree. ML is generally considered more accurate but is much more computationally expensive.
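To make the greedy step concrete, here is a minimal sketch of how neighbor-joining chooses the first pair of taxa to join, using the standard Q criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The distance matrix is a made-up toy example and the function names are hypothetical. A full NJ implementation would go on to compute branch lengths, replace the joined pair with a new node, and repeat.

```python
def nj_q_matrix(dist):
    """Q matrix for one neighbor-joining step, from a symmetric distance matrix."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    return [
        [0 if i == j else (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
         for j in range(n)]
        for i in range(n)
    ]

def pick_pair(dist):
    """Greedy choice: the pair (i, j) with the smallest Q value is joined first."""
    q = nj_q_matrix(dist)
    n = len(dist)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: q[p[0]][p[1]])

# Toy pairwise distances for five taxa A..E (illustrative values only).
taxa = ["A", "B", "C", "D", "E"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
i, j = pick_pair(dist)
print(taxa[i], taxa[j])
```

In this toy matrix the smallest raw distance is between D and E (3), yet the Q criterion joins A and B first (Q = -50 versus -48 for D and E). Once a pair is joined the choice is never revisited, which is the sense in which the method is greedy.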
Question 3 Short Answer
Why is the choice of substitution model important when constructing a maximum likelihood phylogenetic tree?
Model answer: The substitution model specifies the rates and probabilities of different nucleotide or amino acid changes. A model that is too simple (like JC69, which assumes equal base frequencies and equal rates for all substitutions) tends to underestimate the true amount of sequence divergence, especially between distantly related sequences, because it under-corrects for multiple substitutions at the same site when real substitution patterns are more uneven than it assumes. A more parameter-rich model (like GTR, which allows a different rate for each substitution type and unequal base frequencies) better captures real evolutionary dynamics but requires more data to estimate its parameters reliably. The wrong model can systematically distort branch lengths and even change the tree topology.
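As a small worked illustration of correcting for multiple substitutions, the JC69 model gives a closed-form pairwise distance d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. The sketch below uses a hypothetical function name, and the p values are arbitrary examples rather than real data.

```python
import math

def jc69_distance(p):
    """Estimated substitutions per site under JC69, from the observed proportion
    of differing sites p. Undefined once p reaches 0.75 (saturation)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)

# Arbitrary example values: the gap between observed and corrected distance
# widens as sequences become more divergent.
for p in (0.05, 0.30, 0.60):
    print(p, round(jc69_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap grows quickly at high divergence. Richer models apply a similar correction that also allows for unequal rates and base frequencies.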
Model selection is typically done using information criteria (AIC, BIC) or likelihood ratio tests. Tools like ModelTest-NG and jModelTest automate this by fitting many models to the data and selecting the best-fitting one. The selected model is then used for the full ML tree search.
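The information criteria themselves are simple to compute once each model has been fitted: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, L the maximized likelihood, and n the number of alignment sites. The sketch below is illustrative only. The log-likelihoods are invented numbers, and k counts just the substitution-model parameters, whereas real tools typically count branch lengths as well.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: lower is better."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n_sites):
    """Bayesian information criterion: penalizes parameters more as n_sites grows."""
    return k * math.log(n_sites) - 2 * log_lik

# Hypothetical fits: (log-likelihood, number of free substitution parameters).
# These values are made up for illustration, not results from any dataset.
fits = {
    "JC69": (-2500.0, 0),
    "HKY85": (-2450.0, 4),
    "GTR": (-2448.0, 8),
}
n_sites = 1000  # assumed alignment length

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_bic = min(fits, key=lambda m: bic(*fits[m], n_sites))
print(best_aic, best_bic)
```

With these invented numbers, GTR has the highest likelihood but its four extra parameters buy too little improvement, so both criteria prefer HKY85. This is the trade-off between fit and complexity that model-selection tools automate.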