With N=10 states and T=100 time steps, brute-force enumeration of all state sequences requires examining 10^100 candidates. Why does Viterbi reduce this to O(T × N²)?
AViterbi prunes low-probability transitions early, skipping most candidates without evaluating them
BMost state transitions have zero probability in real HMMs, so the effective search space is much smaller
CThe most likely path ending in any state at time t depends only on the best path reaching each predecessor at t−1; keeping one optimal partial path per state eliminates the need to track all paths
DHMMs have at most N² possible transitions, which directly caps the number of paths to evaluate
Viterbi exploits optimal substructure: the best complete path to state j at time t must pass through the best path to some predecessor state i at t−1. You do not need to remember every possible partial path to each state — only the single best one. At each of T time steps, you update N state scores, each requiring a max over N predecessors: T × N × N = O(T × N²). The exponential blowup is eliminated because suboptimal partial paths are discarded immediately.
Question 2 Multiple Choice
After running Viterbi on an observation sequence, what is the purpose of the backpointer table stored during the algorithm?
ATo record the emission probabilities at each time step, so results can be verified against the original HMM
BTo store probabilities for every possible state sequence, enabling comparison with alternative paths
CTo record, at each time step and state, which predecessor state achieved the maximum probability — enabling reconstruction of the optimal path by tracing backward from the final state
DTo allow the algorithm to restart if the most likely final state has probability zero
The Viterbi table δₜ(j) stores the probability of the best path ending in state j at time t, but does not itself tell you what that path was. The backpointer ψₜ(j) records which predecessor state i produced the maximum when computing δₜ(j). After the algorithm terminates, you find the most likely final state, then follow backpointers backward through time to reconstruct the complete optimal sequence. Without backpointers, you would know the probability of the best path but not which path it was.
Question 3 True / False
The Viterbi algorithm finds the single most likely state sequence given the observations — it does not compute the probability of a particular observation sequence.
TTrue
FFalse
Answer: True
Viterbi solves the 'decoding problem': finding the most likely hidden state sequence, argmax P(states | observations). Computing P(observations) — the total probability of the observation sequence summed over all state sequences — is a different problem solved by the Forward algorithm. Viterbi maximizes over paths rather than summing them, so it finds the best single path but not the marginal probability of the observations.
Question 4 True / False
Viterbi is very likely to find the globally optimal state sequence because it carefully evaluates most N^T possible paths before selecting the best one.
TTrue
FFalse
Answer: False
Viterbi finds the global optimum without evaluating all N^T paths. It achieves this by exploiting optimal substructure: once a suboptimal partial path is identified at any time step, it can be discarded permanently, because no extension of a suboptimal partial path can produce the globally optimal complete path. The algorithm guarantees optimality through dynamic programming logic, not exhaustive search — which is precisely what makes it tractable.
Question 5 Short Answer
Explain why the Viterbi algorithm can be viewed as finding the highest-probability path through a trellis graph, and what the nodes and edges of that graph represent.
Think about your answer, then reveal below.
Model answer: The trellis is a directed acyclic graph with T columns (one per time step) and N rows (one per state), giving N×T nodes total. Each node represents being in state j at time t. Edges connect each node at time t to all nodes at time t+1, weighted by the transition probability times the emission probability for the next observation. Finding the most likely state sequence is equivalent to finding the highest-weight path from any node in column 1 to any node in column T — a shortest-path problem on this graph.
Viewing Viterbi as a shortest-path problem makes the dynamic programming structure concrete and connects it to graph algorithms. The trellis perspective also clarifies why backpointers are needed: just as Dijkstra's algorithm stores predecessor nodes to reconstruct the shortest path, Viterbi stores predecessor states to reconstruct the most likely sequence. The O(T × N²) complexity corresponds to the number of edges in the trellis.