Hidden Markov Models

Research Depth 71 in the knowledge graph I know this Set as goal
Unlocks 8 downstream topics
markov-models sequence-models probabilistic-reasoning

Core Idea

HMMs model systems with hidden states emitting observable outputs, where state transitions follow Markov assumption. The forward algorithm computes likelihood, Viterbi decodes hidden states, and Baum-Welch learns parameters. Applications include speech recognition and sequence labeling.

How It's Best Learned

Implement forward and Viterbi algorithms for weather prediction with hidden/observable variables.

Common Misconceptions

Viterbi finds the most likely state sequence, not the most likely individual states. Baum-Welch convergence depends on initialization.

Explainer

From your study of Markov chains, you know that a system's future state depends only on its current state, not on how it got there. A Hidden Markov Model (HMM) adds a crucial twist: you cannot directly observe the states. Instead, each hidden state produces an observable output (called an emission) according to a probability distribution. You see the sequence of emissions but must infer the hidden states that generated them. The model is defined by three components: the transition probabilities (how likely is state j given that we are in state i), the emission probabilities (how likely is observation o given hidden state i), and the initial state distribution (which state does the system start in).

The classic teaching example makes this concrete. Suppose a friend lives in another city and tells you each day whether they went for a walk, shopped, or cleaned the house. You want to infer the weather in their city (sunny or rainy) from their activities. The weather is the hidden state — you never observe it directly. The activity is the emission — observable but only probabilistically related to the weather. On sunny days, your friend probably walks; on rainy days, they probably clean. The transition probabilities capture weather patterns (sunny days tend to follow sunny days), and the emission probabilities capture behavior given weather. Given a sequence of activities over a week, you want to figure out the most likely weather sequence.

This gives rise to three fundamental problems that HMMs solve. The evaluation problem asks: given a model and a sequence of observations, what is the probability of that sequence? The forward algorithm solves this efficiently using dynamic programming — instead of summing over all possible hidden state sequences (exponentially many), it builds up the answer left to right, at each time step computing the probability of being in each state having generated the observations so far. The decoding problem asks: what is the most likely sequence of hidden states? The Viterbi algorithm is structurally similar to the forward algorithm but replaces summation with maximization, tracking the best path into each state at each time step and backtracking at the end to recover the full optimal sequence.

The third problem — learning — is solved by the Baum-Welch algorithm, an instance of Expectation-Maximization (EM). Given only observed sequences (no labeled hidden states), Baum-Welch iteratively re-estimates the transition and emission probabilities to maximize the data likelihood. It alternates between computing expected state occupancies and transitions given the current parameters (the E-step, using the forward-backward algorithm) and updating the parameters to match those expectations (the M-step). Like all EM algorithms, it converges to a local maximum, making initialization important. HMMs have been foundational in speech recognition (hidden states = phonemes, observations = acoustic features), computational biology (hidden states = gene regions, observations = DNA bases), and any domain where you observe noisy outputs from a structured but invisible process.

Practice Questions 5 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesAngle Pairs: Complementary, Supplementary, and VerticalParallel Lines and TransversalsCorresponding AnglesAlternate Interior AnglesTriangle Angle Sum TheoremExterior Angle TheoremTriangle Inequality TheoremSimilar Triangles: AA SimilaritySimilar Triangles: SSS and SAS SimilarityProportions in Similar TrianglesRight Triangle Trigonometry IntroductionTrigonometric Ratios ReviewRadian MeasureConverting Between Degrees and RadiansThe Unit CircleGraphing Sine and CosineGraphing Tangent and Reciprocal Trigonometric FunctionsDerivatives of Trigonometric FunctionsAntiderivativesIterated Integrals and Fubini's TheoremJoint Distributions and Marginals (Rigorous)Independence of Sigma-AlgebrasConditional ExpectationMarkov ChainsHidden Markov Models

Longest path: 72 steps · 335 total prerequisite topics

Prerequisites (4)

Leads To (4)