Phylogenetic Tree Construction

Graduate · Depth 191 in the knowledge graph
Unlocks 29 downstream topics
phylogenetics neighbor-joining maximum-likelihood Bayesian bootstrap tree-topology

Core Idea

Phylogenetic trees depict evolutionary relationships among sequences or species, inferred from aligned molecular data. Distance-based methods (neighbor-joining) cluster sequences by pairwise distances. Character-based methods (maximum parsimony, maximum likelihood, Bayesian inference) evaluate alternative tree topologies against the alignment data. Maximum likelihood finds the tree that makes the observed data most probable given a model of sequence evolution. Bootstrap values and Bayesian posterior probabilities assess statistical support for each branch. Tree construction requires choosing an appropriate substitution model and rooting strategy.

How It's Best Learned

Build a neighbor-joining tree and a maximum likelihood tree from the same multiple sequence alignment (MSA) of 10-15 orthologous sequences. Compare the topologies and bootstrap support values. Experiment with different substitution models (JC69 vs. GTR) and observe how model choice affects branch lengths and topology.

Explainer

Phylogenetic trees are the primary tool for representing evolutionary relationships, and molecular sequence data has become the dominant source of information for building them. Given a multiple sequence alignment, the question is: what tree topology (branching pattern) and branch lengths best explain the observed pattern of similarities and differences? Different methods answer this question in fundamentally different ways.

Distance-based methods convert the MSA into a matrix of pairwise evolutionary distances (corrected for multiple substitutions at the same site), then build a tree that approximates those distances. Neighbor-joining (NJ) is the most widely used distance method: at each step it joins the pair of nodes that most reduces the estimated total branch length of the tree, using a criterion that adjusts each pairwise distance for the two nodes' average distance to all other nodes. NJ is fast (O(n^3) for n sequences) and produces reasonable trees, making it useful for quick exploratory analyses and very large datasets. But it reduces the full alignment to pairwise distances, losing information about which specific sites support which groupings.
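The two ingredients described above can be sketched in a few lines of Python. This is a minimal illustration, not a full NJ implementation: it assumes the Jukes-Cantor (JC69) correction as the distance model and shows only a single joining step, using the standard criterion Q(i,j) = (n-2)·d(i,j) - Σ d(i,k) - Σ d(j,k). The function names and the small five-taxon distance matrix are made up for illustration.

```python
import math

def jc69_distance(seq_a, seq_b):
    """JC69-corrected distance between two aligned sequences (gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    p = sum(x != y for x, y in pairs) / len(pairs)  # observed proportion of differences
    if p >= 0.75:
        return float("inf")  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def nj_pick_pair(dist):
    """One NJ step: return the pair (i, j) minimizing Q and their branch lengths."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from i and j to the new internal node that joins them.
    len_i = 0.5 * dist[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (i, j), (len_i, len_j)

# Hypothetical distance matrix for five taxa (indices 0..4).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, lengths = nj_pick_pair(D)
print(pair, lengths)  # (0, 1) (2.0, 3.0)
```

Note that taxa 3 and 4 have the smallest raw distance (3), yet NJ joins taxa 0 and 1 first: the row-sum adjustment is what distinguishes NJ from simply clustering the closest pair. A complete implementation would replace the joined pair with a new node, recompute distances to it, and repeat until the tree is resolved.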

Maximum likelihood (ML) takes a fundamentally different approach. For a candidate tree, it considers the alignment column by column, calculates the probability of each observed column pattern under a specified model of sequence evolution, and multiplies these probabilities across all columns (in practice, sums their logarithms), treating sites as independent, to get the likelihood of the entire dataset given that tree. The tree with the highest likelihood is selected. This approach uses all the information in the alignment and explicitly models the evolutionary process, but evaluating every tree is feasible only for very small datasets, because the number of possible topologies grows super-exponentially with the number of sequences. Software like RAxML and IQ-TREE therefore uses heuristic search strategies to navigate this space efficiently, without a guarantee of finding the globally best tree.
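To make the column-by-column calculation concrete, here is a hedged sketch of the likelihood of one fixed tree, computed with Felsenstein's pruning recursion under the JC69 model. The nested-tuple tree representation, the function names, and the toy alignment are assumptions made for this example. The last function counts unrooted binary topologies, (2n-5)!! for n sequences, to show why exhaustive search is impractical.

```python
import math

BASES = "ACGT"

def jc69_prob(x, y, t):
    """P(base y at the end of a branch of length t | base x at its start)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def partials(node, column):
    """Map each base b to P(tips below node | node has base b)."""
    if isinstance(node, str):  # leaf: node is a taxon name
        obs = column[node]
        # Gaps and ambiguity codes are treated as missing data.
        return {b: 1.0 if obs not in BASES or b == obs else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:  # internal node: tuple of (child, branch_length)
        below = partials(child, column)
        for b in BASES:
            result[b] *= sum(jc69_prob(b, c, t) * below[c] for c in BASES)
    return result

def log_likelihood(tree, alignment):
    """Sum of per-column log-likelihoods, assuming independent sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for s in range(n_sites):
        column = {name: seq[s] for name, seq in alignment.items()}
        site = sum(0.25 * p for p in partials(tree, column).values())
        total += math.log(site)
    return total

def n_unrooted_topologies(n):
    """(2n-5)!! = 3 * 5 * ... * (2n-5) unrooted binary trees on n taxa."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

# Toy alignment in which a,b agree and c,d agree (hypothetical data).
aln = {"a": "AAAA", "b": "AAAA", "c": "CCCC", "d": "CCCC"}
ab_cd = (((("a", 0.1), ("b", 0.1)), 0.1), ((("c", 0.1), ("d", 0.1)), 0.1))
ac_bd = (((("a", 0.1), ("c", 0.1)), 0.1), ((("b", 0.1), ("d", 0.1)), 0.1))
print(log_likelihood(ab_cd, aln) > log_likelihood(ac_bd, aln))  # True
print(n_unrooted_topologies(10))  # 2027025
```

A real ML program would also optimize the branch lengths and model parameters for each candidate topology rather than fixing them, as this sketch does.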

Bayesian inference (implemented in MrBayes and BEAST) builds on the same likelihood function as ML but combines it with prior probabilities on tree topologies, branch lengths, and model parameters, using Markov chain Monte Carlo (MCMC) sampling to explore the posterior distribution. Rather than returning a single best tree, Bayesian methods return a distribution of trees weighted by their posterior probability, naturally providing measures of uncertainty. Bayesian posterior probabilities on branches tend to be higher than bootstrap values for the same data, so the two are not directly comparable, and posterior probabilities are trustworthy only if the MCMC chains have converged, which is why interpreting them correctly requires understanding MCMC convergence diagnostics.
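The MCMC idea can be illustrated on a deliberately tiny problem: sampling the posterior of a single branch length separating two sequences, rather than a whole tree. The sketch below assumes the JC69 model, an exponential prior with rate 10 on the branch length, and a reflected sliding-window Metropolis proposal. These choices, the function names, and the example counts (100 differences in 1000 sites) are illustrative assumptions, not the defaults of MrBayes or BEAST.

```python
import math
import random

def log_posterior(t, n_sites, n_diff, prior_rate=10.0):
    """Unnormalized log posterior of branch length t (substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e   # JC69: site unchanged across the branch
    p_diff = 0.25 - 0.25 * e   # JC69: site changed to one specific other base
    log_lik = (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)
    log_prior = math.log(prior_rate) - prior_rate * t  # Exponential(prior_rate)
    return log_lik + log_prior

def sample_branch_length(n_sites, n_diff, n_steps=5000, window=0.02, seed=0):
    """Metropolis sampler; returns the chain of sampled branch lengths."""
    rng = random.Random(seed)
    t = 0.1                                   # arbitrary starting value
    current = log_posterior(t, n_sites, n_diff)
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal, reflected at zero to keep t positive.
        proposal = abs(t + rng.uniform(-window, window))
        if proposal > 1e-12:
            candidate = log_posterior(proposal, n_sites, n_diff)
            # Accept with probability min(1, posterior ratio).
            if math.log(1.0 - rng.random()) < candidate - current:
                t, current = proposal, candidate
        samples.append(t)
    return samples

chain = sample_branch_length(n_sites=1000, n_diff=100)
kept = chain[1000:]                           # discard burn-in
print(sum(kept) / len(kept))                  # posterior mean branch length
```

The output is a set of samples rather than a single estimate, which is what lets Bayesian methods report uncertainty directly. In a real analysis the chain would also propose changes to the topology and model parameters, and several independent chains would be compared to check convergence.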

Regardless of method, the resulting tree must be evaluated critically. Bootstrap analysis for ML/NJ (re-inferring the tree from many pseudo-replicate alignments built by resampling alignment columns with replacement, then recording how often each branch reappears) and posterior probabilities for Bayesian trees indicate how strongly the data support each branch. An unrooted tree shows relative relationships but not the direction of evolution; rooting (typically with an outgroup) is needed to infer ancestor-descendant relationships. Finally, the tree reflects the history of the sequences analyzed, which may not match the species tree if gene duplication, horizontal transfer, or incomplete lineage sorting has occurred.
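The bootstrap procedure itself is simple to sketch: resample alignment columns with replacement, rebuild the tree from each pseudo-replicate, and report how often each grouping reappears. In the sketch below, `infer_splits` is a hypothetical placeholder for whatever tree-building method is in use (NJ, ML, and so on); it is assumed to take an alignment and return the groupings of taxa found in the inferred tree, each as a hashable value such as a frozenset of names.

```python
import random
from collections import Counter

def resample_columns(alignment, rng):
    """Build one pseudo-replicate by drawing columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are used for every sequence, so columns stay intact.
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}

def bootstrap_support(alignment, infer_splits, n_reps=100, seed=0):
    """Fraction of replicates in which each split (grouping of taxa) is recovered."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_reps):
        replicate = resample_columns(alignment, rng)
        counts.update(set(infer_splits(replicate)))  # count each split once per replicate
    return {split: c / n_reps for split, c in counts.items()}

# Usage with a stand-in tree builder that always returns the same grouping.
aln = {"a": "ACGT", "b": "ACGA", "c": "TCGA"}
always_ab = lambda alignment: [frozenset({"a", "b"})]
print(bootstrap_support(aln, always_ab, n_reps=10))
```

Because each replicate has the same length as the original alignment but a different mix of columns, a grouping supported by only a few sites will disappear in many replicates and receive a low support value.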


