
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Hidden Markov Models

Research Depth 96 in the knowledge graph

8 topics build on this

523 prerequisites beneath it



Core Idea

HMMs model systems whose hidden states emit observable outputs and whose state transitions follow the Markov assumption. The forward algorithm computes the likelihood of an observation sequence, the Viterbi algorithm decodes the most likely hidden state sequence, and Baum-Welch learns the model parameters. Applications include speech recognition and sequence labeling.

How It's Best Learned

Implement the forward and Viterbi algorithms for a weather-prediction example with hidden weather states and observable activities.

Common Misconceptions

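Viterbi finds the single most likely state sequence, which is not necessarily the sequence of individually most likely states at each time step. Baum-Welch is only guaranteed to reach a local maximum of the likelihood, so the parameters it returns depend on initialization.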
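The first point can be checked on a tiny model. The sketch below is an illustration under assumptions, not part of the original page: the two-state parameters are hypothetical, there is a single observation symbol so emissions carry no information, and brute-force enumeration stands in for Viterbi and forward-backward (which compute the same quantities efficiently).

```python
from itertools import product

# Hypothetical two-state model, chosen only to make the two answers differ.
pi = [0.6, 0.4]            # initial state distribution
A = [[0.5, 0.5],           # transition probabilities A[i][j]
     [0.0, 1.0]]
B = [[1.0],                # emission probabilities B[i][o]; one symbol only
     [1.0]]
obs = [0, 0]               # two time steps, same symbol each time


def joint(path):
    """P(path, obs) under the model above."""
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p


n, T = len(pi), len(obs)
paths = list(product(range(n), repeat=T))

# Most likely whole sequence: the quantity Viterbi maximizes.
best_path = max(paths, key=joint)


def marginal(t, s):
    """Unnormalized P(state at time t is s, obs); the argmax is unaffected."""
    return sum(joint(p) for p in paths if p[t] == s)


# Most likely state at each time step, chosen independently per step.
per_step = tuple(max(range(n), key=lambda s: marginal(t, s)) for t in range(T))

print("best sequence    :", best_path, joint(best_path))
print("per-step argmaxes:", per_step, joint(per_step))
```

In this toy model the best sequence is (1, 1) with joint probability 0.4, while picking the best state at each step separately gives (0, 1), a sequence whose joint probability is only 0.3.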

Explainer

From your study of Markov chains, you know that a system's future state depends only on its current state, not on how it got there. A Hidden Markov Model (HMM) adds a crucial twist: you cannot directly observe the states. Instead, each hidden state produces an observable output (called an emission) according to a probability distribution. You see the sequence of emissions but must infer the hidden states that generated them. The model is defined by three components: the transition probabilities (how likely the next state is j given that the current state is i), the emission probabilities (how likely observation o is given hidden state i), and the initial state distribution (how likely each state is at the first time step).
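In symbols, the three components and the joint probability they imply can be written as follows. The notation here (z for hidden states, x for observations, a, b, and π for the parameters) is an assumed, commonly used convention rather than something fixed by the text above.

```latex
\begin{aligned}
a_{ij} &= P(z_{t+1} = j \mid z_t = i) && \text{(transition)} \\
b_i(o) &= P(x_t = o \mid z_t = i) && \text{(emission)} \\
\pi_i &= P(z_1 = i) && \text{(initial state)} \\
P(x_{1:T}, z_{1:T}) &= \pi_{z_1}\, b_{z_1}(x_1) \prod_{t=2}^{T} a_{z_{t-1} z_t}\, b_{z_t}(x_t)
\end{aligned}
```

The last line is the Markov assumption at work: each state depends only on the previous state, and each observation depends only on the current state.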

The classic teaching example makes this concrete. Suppose a friend lives in another city and tells you each day whether they went for a walk, shopped, or cleaned the house. You want to infer the weather in their city (sunny or rainy) from their activities. The weather is the hidden state — you never observe it directly. The activity is the emission — observable but only probabilistically related to the weather. On sunny days, your friend probably walks; on rainy days, they probably clean. The transition probabilities capture weather patterns (sunny days tend to follow sunny days), and the emission probabilities capture behavior given weather. Given a sequence of activities over a week, you want to figure out the most likely weather sequence.
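A minimal sketch of this example as a generative process is shown below. All probabilities are illustrative assumptions (the text gives no numbers), and the names `draw` and `sample_week` are hypothetical helpers introduced for this sketch.

```python
import random

STATES = ["Rainy", "Sunny"]
ACTIVITIES = ["walk", "shop", "clean"]

# Illustrative numbers only; they are assumed, not taken from the text.
START = {"Rainy": 0.6, "Sunny": 0.4}
TRANS = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},   # sunny tends to follow sunny
}
EMIT = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},  # rainy: mostly clean
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},  # sunny: mostly walk
}


def draw(dist, rng):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_week(days=7, seed=0):
    """Generate (hidden weather, observed activities) for `days` days."""
    rng = random.Random(seed)
    weather, activities = [], []
    state = draw(START, rng)
    for _ in range(days):
        weather.append(state)
        activities.append(draw(EMIT[state], rng))  # emission given the state
        state = draw(TRANS[state], rng)            # move to the next day's state
    return weather, activities


weather, activities = sample_week()
print("hidden  :", weather)      # what you never get to see
print("observed:", activities)   # what your friend tells you
```

Inference runs this process in reverse: given only the second list, the goal is to recover a plausible version of the first.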

This gives rise to three fundamental problems that HMMs solve. The evaluation problem asks: given a model and a sequence of observations, what is the probability of that sequence? The forward algorithm solves this efficiently using dynamic programming — instead of summing over all possible hidden state sequences (exponentially many), it builds up the answer left to right, at each time step computing the probability of being in each state having generated the observations so far. The decoding problem asks: what is the most likely sequence of hidden states? The Viterbi algorithm is structurally similar to the forward algorithm but replaces summation with maximization, tracking the best path into each state at each time step and backtracking at the end to recover the full optimal sequence.
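The two recursions can be sketched side by side to show how closely they mirror each other. This is a minimal illustration, assuming a non-empty observation sequence and the same illustrative weather parameters as a typical textbook example (the numbers are not from the text); it works with raw probabilities, whereas practical implementations usually use log-space or scaling to avoid underflow on long sequences.

```python
STATES = ["Rainy", "Sunny"]
OBS_INDEX = {"walk": 0, "shop": 1, "clean": 2}

# Illustrative parameters (assumed): pi[i], A[i][j], B[i][o].
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.1, 0.4, 0.5],
     [0.6, 0.3, 0.1]]


def forward(obs):
    """Return P(obs) by summing over predecessor states at each step."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def viterbi(obs):
    """Return (best state path, joint probability of that path and obs)."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptrs = []
    for o in obs[1:]:
        step_ptr, new_delta = [], []
        for j in range(n):
            # Same recursion as forward(), with max in place of sum.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step_ptr.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backptrs.append(step_ptr)
        delta = new_delta
    # Backtrack from the best final state to recover the full path.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step_ptr in reversed(backptrs):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, delta[last]


obs = [OBS_INDEX[a] for a in ["walk", "shop", "clean"]]
likelihood = forward(obs)
path, path_prob = viterbi(obs)
print("P(observations) =", likelihood)
print("best path       =", [STATES[s] for s in path], path_prob)
```

With these assumed numbers, the likelihood of (walk, shop, clean) is about 0.0336 and the decoded weather is Sunny, Rainy, Rainy with joint probability about 0.0134. Each step costs on the order of N² operations for N states, instead of enumerating all N^T paths.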

The third problem, learning, is solved by the Baum-Welch algorithm, an instance of Expectation-Maximization (EM). Given only observed sequences (no labeled hidden states), Baum-Welch iteratively re-estimates the initial, transition, and emission probabilities to increase the data likelihood. It alternates between computing expected state occupancies and transitions given the current parameters (the E-step, using the forward-backward algorithm) and updating the parameters to match those expectations (the M-step). Like all EM algorithms, it never decreases the likelihood from one iteration to the next, but it is only guaranteed to reach a local maximum (or other stationary point) rather than the global one, which makes initialization important. HMMs have been foundational in speech recognition (hidden states = phonemes, observations = acoustic features), computational biology (hidden states = gene regions, observations = DNA bases), and any domain where you observe noisy outputs from a structured but invisible process.
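The E-step and M-step described above can be sketched as follows. This is a simplified illustration under stated assumptions: a single short observation sequence, discrete symbols, raw (unscaled) probabilities, and a random initialization. The function names `forward_backward`, `baum_welch_step`, `random_rows`, and `fit`, and the observation sequence itself, are hypothetical.

```python
import random


def forward_backward(obs, pi, A, B):
    """Return forward table, backward table, and P(obs)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta, sum(alpha[T - 1])


def baum_welch_step(obs, pi, A, B):
    """One EM iteration; also returns P(obs) under the incoming parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, like = forward_backward(obs, pi, A, B)
    # E-step: expected state occupancies (gamma) and transitions (xi).
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / like
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: re-estimate parameters as ratios of expected counts.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, like


def random_rows(rows, cols, rng):
    """Random strictly positive rows, each normalized to sum to 1."""
    out = []
    for _ in range(rows):
        w = [rng.random() + 0.1 for _ in range(cols)]
        out.append([x / sum(w) for x in w])
    return out


def fit(obs, n_states, n_symbols, iters=20, seed=0):
    """Run a fixed number of EM iterations from a seeded random start."""
    rng = random.Random(seed)
    pi = random_rows(1, n_states, rng)[0]
    A = random_rows(n_states, n_states, rng)
    B = random_rows(n_states, n_symbols, rng)
    history = []
    for _ in range(iters):
        pi, A, B, like = baum_welch_step(obs, pi, A, B)
        history.append(like)
    return pi, A, B, history


# Hypothetical observation sequence over symbols 0, 1, 2.
obs = [0, 0, 1, 2, 2, 2, 1, 0, 0, 2, 2, 1, 0, 0, 0, 2, 2, 2, 1, 0]
pi, A, B, history = fit(obs, n_states=2, n_symbols=3)
print("likelihood, first vs last iteration:", history[0], history[-1])
```

The recorded likelihoods should never decrease across iterations. Rerunning `fit` with a different `seed` may end at a different final likelihood, which is the dependence on initialization noted above.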


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Conditional Distributions → Conditional Expectation → Markov Chains → Hidden Markov Models

Longest path: 97 steps · 523 total prerequisite topics

Prerequisites (4)

Markov Chains (hard), Probability Mass Functions (hard), Conditional Probability (soft), Probability Axioms and Rules (soft)

Leads To (4)

Expectation-Maximization Algorithm (soft), Markov Random Fields (soft), Sequence Labeling and CRFs (hard), Viterbi Algorithm (hard)