Questions: Sequence Labeling and CRFs

5 questions to test your understanding

Question 1 Multiple Choice

An HMM and a CRF are trained on the same part-of-speech tagging dataset. The CRF achieves significantly higher accuracy. What is the most likely reason?

A. The CRF uses a more powerful inference algorithm than Viterbi, finding globally optimal label sequences that HMMs cannot
B. The CRF can incorporate arbitrary features of the input (capitalization, word suffixes, neighboring words) without the HMM's independence constraint that each observation's probability depends only on the current tag
C. The CRF captures longer-range label-to-label dependencies, whereas HMMs only model adjacent tag pairs
D. The CRF uses a larger tag vocabulary, distinguishing more fine-grained parts of speech
Question 2 Multiple Choice

A student argues that CRFs outperform HMMs because CRFs model dependencies between consecutive labels, whereas HMMs treat each label independently. What is wrong with this claim?

A. The student is correct: HMMs assume all labels are independent and cannot model transitions
B. HMMs also model label-to-label dependencies through transition probabilities; the actual CRF advantage is discriminative modeling that allows arbitrary input features without requiring a generative model of observations
C. CRFs cannot model label dependencies; they score each label position independently and then pick the best combination
D. Both models are equivalent in practice; performance differences come only from training data size
Question 3 True / False

Both Hidden Markov Models and linear-chain CRFs use the Viterbi algorithm to find the most probable label sequence at inference time.

T. True
F. False
Question 4 True / False

A CRF's main advantage over an HMM is that it captures label-to-label dependencies that HMMs fundamentally cannot model.

T. True
F. False
Question 5 Short Answer

Explain why a CRF can incorporate features like 'the word ends in -ing' or 'the previous word is a title' more effectively than an HMM, even though both models capture dependencies between adjacent labels.

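As a rough illustration of the kind of features Question 5 refers to, here is a minimal Python sketch of a per-token feature extractor. The function name `token_features`, the feature keys, and the small title list are all hypothetical, chosen only for this example, and are not tied to any particular CRF library. The indicators overlap and look at neighboring words; a linear-chain CRF can give each one its own weight because it conditions on the whole input sentence, whereas an HMM's emission term P(word | tag) would have to account for them within a generative model of the observations.

```python
# Hypothetical feature extractor for illustration only; names and keys are assumptions.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr.", "Prof."}  # assumed, deliberately tiny list


def token_features(words, i):
    """Return overlapping, input-dependent features for the token at position i."""
    word = words[i]
    # Use a sentinel when there is no previous word.
    prev_word = words[i - 1] if i > 0 else "<BOS>"
    return {
        "word.lower": word.lower(),
        "ends_in_ing": word.lower().endswith("ing"),
        "is_capitalized": word[:1].isupper(),
        "prev_is_title": prev_word in TITLES,
    }


sentence = ["Dr.", "Smith", "is", "running"]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

Here `features[1]` marks "Smith" as capitalized and preceded by a title, and `features[3]` marks "running" as ending in -ing; both facts come from the input alone, independent of which tag sequence is being scored.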