Questions — Named Entity Recognition (NER)

Question 1 Multiple Choice

A NER system classifies each token independently, selecting the highest-probability label at each position without considering neighboring labels. What critical problem does this create that a CRF layer would prevent?

A. It cannot process sentences longer than the model's maximum sequence length

B. It may produce structurally invalid label sequences, such as I-PER appearing without a preceding B-PER

C. It assigns lower confidence scores, making the predictions unreliable for downstream use

D. It cannot distinguish between entity types that appear in similar grammatical positions
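
For reference, the independent decoding strategy described in Question 1 can be sketched as follows. This is a minimal illustration only: the function name `greedy_decode`, the example sentence, and the probability values are all hypothetical and are not taken from any particular NER system. Note that each position's choice is made without looking at the label chosen for any other position.

```python
from typing import Dict, List


def greedy_decode(token_scores: List[Dict[str, float]]) -> List[str]:
    """Pick the highest-scoring label for each token independently.

    No information about neighboring labels is used: each position
    is decided from its own score dictionary alone.
    """
    return [max(scores, key=scores.get) for scores in token_scores]


# Hypothetical per-token label probabilities for the sentence
# "Marie Curie visited Paris" (values invented for illustration).
scores = [
    {"B-PER": 0.80, "I-PER": 0.10, "B-LOC": 0.05, "O": 0.05},  # Marie
    {"B-PER": 0.15, "I-PER": 0.75, "B-LOC": 0.05, "O": 0.05},  # Curie
    {"B-PER": 0.02, "I-PER": 0.03, "B-LOC": 0.05, "O": 0.90},  # visited
    {"B-PER": 0.05, "I-PER": 0.05, "B-LOC": 0.85, "O": 0.05},  # Paris
]

labels = greedy_decode(scores)
print(labels)
```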

Question 2 Multiple Choice

In 'Washington issued a statement,' a NER system correctly tags 'Washington' as an organization, while in 'Washington crossed the Delaware,' it tags 'Washington' as a person. Which architectural feature of BERT explains this disambiguation?

A. Byte-pair encoding, which creates distinct subword tokens for words used in different semantic roles

B. Contextual embeddings that produce different vector representations for the same token depending on surrounding context

C. The CRF transition layer, which knows that person names tend to precede action verbs like 'crossed'

D. Attention heads that explicitly attend to the word 'Delaware' and infer that Washington must be a person

Question 3 True / False

The BIO tagging scheme (Beginning, Inside, Outside) is necessary for NER because without it, a model cannot determine where one multi-word entity ends and another begins.

T. True

F. False

Question 4 True / False

A BiLSTM-CRF NER model assigns each token a label based primarily on that token and its immediate neighbors, making it fundamentally similar to an n-gram classifier.

T. True

F. False

Question 5 Short Answer

Why does adding a CRF layer on top of a BiLSTM improve NER performance, rather than simply taking the highest-probability label at each token position?

