Convergence of Markov Chains

Research Depth 87 in the knowledge graph I know this Set as goal
Unlocks 47 downstream topics
markov-chains convergence mixing

Core Idea

An irreducible, aperiodic Markov chain converges in distribution to its stationary distribution π: P(X_n = j) → π(j). The convergence rate depends on the spectral gap (largest minus second-largest eigenvalue of P); larger gaps mean faster mixing. Convergence ensures MCMC samples approach the target distribution.

Explainer

From stationary distributions, you know that a distribution π is stationary if πP = π — the chain, once started in π, stays in π forever. But that raises a practical question: if the chain starts somewhere far from π, does it eventually converge to π regardless of where it starts? And if so, how fast? These questions are the subject of Markov chain convergence theory.

Two structural conditions on the transition matrix P jointly guarantee convergence. Irreducibility means every state can reach every other state in some finite number of steps — the chain cannot be trapped in a subset of states. Aperiodicity means no state forces the chain to return only at regular intervals (e.g., always at even steps). A chain that oscillates between two groups of states every other step is periodic and fails to converge; it oscillates around the stationary distribution rather than settling into it. When a finite Markov chain is irreducible and aperiodic, the Perron-Frobenius theorem guarantees that the transition matrix P has a unique stationary distribution π and that P^n converges to the matrix with π repeated in every row — meaning every starting state produces the same long-run distribution.

The rate of convergence is governed by the spectral gap: the difference between the largest eigenvalue of P (which is always 1 for a stochastic matrix) and the second-largest eigenvalue in absolute value. A spectral gap close to 1 means the chain mixes rapidly — within a few steps the distribution is close to π. A gap close to 0 means slow mixing — the chain might take exponentially many steps to forget its starting state. You can visualize this with a lazy random walk: if the chain almost always stays in place and rarely moves, the spectral gap is small and convergence is glacially slow. A well-connected chain with many transitions per step has a larger spectral gap and mixes faster.

This theory underpins Markov Chain Monte Carlo (MCMC): methods like the Metropolis-Hastings algorithm and Gibbs sampling construct a Markov chain whose stationary distribution equals a target distribution (often a Bayesian posterior) that is otherwise hard to sample from directly. Convergence theory tells you that after a burn-in period — long enough for the starting point to be forgotten — subsequent samples are approximately drawn from the target. In practice, diagnosing whether a chain has converged is a major challenge: you cannot observe convergence directly, only measure symptoms like the effective sample size (related to the spectral gap) and trace plot mixing. Understanding the theoretical guarantee — irreducibility, aperiodicity, and spectral gap — is what lets you reason carefully about when MCMC output can and cannot be trusted.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesAngle Pairs: Complementary, Supplementary, and VerticalParallel Lines and TransversalsCorresponding AnglesAlternate Interior AnglesTriangle Angle Sum TheoremExterior Angle TheoremTriangle Inequality TheoremSimilar Triangles: AA SimilaritySimilar Triangles: SSS and SAS SimilarityProportions in Similar TrianglesRight Triangle Trigonometry IntroductionTrigonometric Ratios ReviewRadian MeasureConverting Between Degrees and RadiansThe Unit CircleGraphing Sine and CosineGraphing Tangent and Reciprocal Trigonometric FunctionsDerivatives of Trigonometric FunctionsAntiderivativesIndefinite IntegralsBasic Integration RulesRiemann SumsDefinite Integral DefinitionFundamental Theorem of Calculus Part 1Fundamental Theorem of Calculus Part 2U-SubstitutionPartial Fraction Decomposition for IntegrationImproper Integrals - ConvergenceIntegral TestP-SeriesComparison TestLimit Comparison TestAbsolute vs. Conditional ConvergencePower SeriesTaylor PolynomialsTaylor SeriesMoment Generating FunctionsCharacteristic FunctionsConvergence in DistributionStationary DistributionsConvergence of Markov Chains

Longest path: 88 steps · 465 total prerequisite topics

Prerequisites (2)

Leads To (2)