Questions: Named Entity Recognition (NER)

5 questions to test your understanding

Question 1 Multiple Choice

A NER system classifies each token independently, selecting the highest-probability label at each position without considering neighboring labels. What critical problem does this create that a CRF layer would prevent?

A. It cannot process sentences longer than the model's maximum sequence length
B. It may produce structurally invalid label sequences, such as I-PER appearing without a preceding B-PER
C. It assigns lower confidence scores, making the predictions unreliable for downstream use
D. It cannot distinguish between entity types that appear in similar grammatical positions
Question 2 Multiple Choice

In 'Washington issued a statement,' a NER system correctly tags 'Washington' as an organization, while in 'Washington crossed the Delaware,' it tags 'Washington' as a person. Which architectural feature of BERT explains this disambiguation?

A. Byte-pair encoding, which creates distinct subword tokens for words used in different semantic roles
B. Contextual embeddings that produce different vector representations for the same token depending on surrounding context
C. The CRF transition layer, which knows that person names tend to precede action verbs like 'crossed'
D. Attention heads that explicitly attend to the word 'Delaware' and infer that Washington must be a person
Question 3 True / False

The BIO tagging scheme (Beginning, Inside, Outside) is necessary for NER because without it, a model cannot determine where one multi-word entity ends and another begins.

T. True
F. False
Question 4 True / False

A BiLSTM-CRF NER model assigns each token a label based primarily on that token and its immediate neighbors, making it fundamentally similar to an n-gram classifier.

T. True
F. False
Question 5 Short Answer

Why does adding a CRF layer on top of a BiLSTM improve NER performance, rather than simply taking the highest-probability label at each token position?

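
To experiment with the contrast raised in Questions 1 and 5, here is a minimal sketch comparing per-token argmax decoding with Viterbi decoding under BIO constraints. Everything in it is an assumption made for illustration: the three-label set, the hand-picked emission scores, and the helper names (`allowed`, `greedy_decode`, `viterbi_decode`, `is_valid`) are hypothetical. A real CRF layer learns soft transition scores from data; this sketch swaps in hard-coded allow/forbid rules to keep the example small.

```python
NEG_INF = float("-inf")
LABELS = ["O", "B-PER", "I-PER"]  # assumed toy label set


def allowed(prev, curr):
    """BIO rule: I-X may only follow B-X or I-X. prev=None means sentence start."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in (f"B-{entity}", f"I-{entity}")
    return True


def is_valid(labels):
    """Check that every adjacent pair of labels (including the start) obeys the rule."""
    return all(allowed(prev, curr) for prev, curr in zip([None] + labels[:-1], labels))


def greedy_decode(emissions):
    """Pick the highest-scoring label at each position, ignoring neighbors."""
    return [max(LABELS, key=lambda lab: scores[lab]) for scores in emissions]


def viterbi_decode(emissions):
    """Find the highest-scoring label sequence that never breaks the BIO rule."""
    # best[t][lab] = (score of the best valid path ending in lab at step t, previous label)
    best = [{
        lab: (emissions[0][lab] if allowed(None, lab) else NEG_INF, None)
        for lab in LABELS
    }]
    for scores in emissions[1:]:
        column = {}
        for curr in LABELS:
            candidates = [
                (best[-1][prev][0] + (0.0 if allowed(prev, curr) else NEG_INF), prev)
                for prev in LABELS
            ]
            prev_score, prev_label = max(candidates, key=lambda c: c[0])
            column[curr] = (prev_score + scores[curr], prev_label)
        best.append(column)

    # Backtrack from the best final label to recover the full path.
    label = max(LABELS, key=lambda lab: best[-1][lab][0])
    path = [label]
    for t in range(len(best) - 1, 0, -1):
        label = best[t][label][1]
        path.append(label)
    return path[::-1]


# Made-up emission scores for the tokens "Alice", "Smith", "spoke".
emissions = [
    {"O": 0.1, "B-PER": 1.0, "I-PER": 1.2},
    {"O": 0.2, "B-PER": 0.3, "I-PER": 1.5},
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.1},
]

greedy = greedy_decode(emissions)
constrained = viterbi_decode(emissions)
print(greedy, is_valid(greedy))            # ['I-PER', 'I-PER', 'O'] False
print(constrained, is_valid(constrained))  # ['B-PER', 'I-PER', 'O'] True
```

Try changing the emission scores and check which decoder's output stays valid, then compare that with your answers above.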