5 questions to test your understanding
An HMM and a CRF are trained on the same part-of-speech tagging dataset. The CRF achieves significantly higher accuracy. What is the most likely reason?
A student argues that CRFs outperform HMMs because CRFs model dependencies between consecutive labels, whereas HMMs treat each label independently. What is wrong with this claim?
True or false: Both Hidden Markov Models and linear-chain CRFs use the Viterbi algorithm to find the most probable label sequence at inference time.
True or false: A CRF's main advantage over an HMM is that it captures label-to-label dependencies that HMMs fundamentally cannot model.
Explain why a CRF can incorporate features like 'the word ends in -ing' or 'the previous word is a title' more effectively than an HMM, even though both models capture dependencies between adjacent labels.
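For reference when working through the last question, the two example features it names could be computed roughly as follows. This is only an illustrative sketch: the function name `observation_features`, the `TITLES` set, and the sample sentence are assumptions made for this example, not part of any particular tagging library.

```python
# Hypothetical title list, chosen only for this illustration.
TITLES = {"mr.", "mrs.", "ms.", "dr.", "prof."}


def observation_features(words, i):
    """Return binary features for position i, given the whole word sequence."""
    word = words[i]
    # The first position has no previous word, so use an empty placeholder.
    prev_word = words[i - 1] if i > 0 else ""
    return {
        # 'the word ends in -ing'
        "ends_in_ing": word.lower().endswith("ing"),
        # 'the previous word is a title'
        "prev_is_title": prev_word.lower() in TITLES,
    }


# Sample sentence (an assumption for the example).
words = ["Dr.", "Smith", "is", "running"]
features = [observation_features(words, i) for i in range(len(words))]
```

Note that each feature looks at the observed words only (the current word or its neighbor), not at the labels, and that several features may fire at the same position.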