A bag-of-words sentiment classifier trained on product reviews is given the sentence: 'I wouldn't say this is anything less than remarkable.' It predicts negative sentiment. What explains this error?
A. The training data lacked enough examples of double negations for the model to learn them
B. Bag-of-words discards word order, so 'wouldn't' and 'less' register as negative signals without any representation of how they combine to negate each other
C. The word 'remarkable' was not in the training vocabulary, so the model defaulted to negative
D. The sentence is genuinely ambiguous and the classifier correctly flagged uncertainty as negative
The sentence means 'this is remarkable': the two negations ('wouldn't say' and 'less than') cancel, leaving a positive sentiment. A bag-of-words model sees features like 'wouldn't' and 'less' (which carry negative signal in the training data) and 'remarkable' (positive), but with no information about order or structure it cannot represent that the two negations cancel each other rather than add up. This is not a data problem; it is a fundamental limitation of ignoring word order. Sequential and attention-based models can learn how negation words modify the words in their scope.
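The order-blindness described above can be sketched in a few lines of Python. This is a minimal illustration, not a trained classifier: the `WEIGHTS` table and the `reordered` sentence are assumptions made up for the example, standing in for whatever a linear model might learn from review data.

```python
import re
from collections import Counter

# Hypothetical per-word weights standing in for what a linear model might learn.
WEIGHTS = {"wouldn't": -1.0, "less": -1.0, "remarkable": 1.5}

def bag_of_words(text):
    """Lowercase the text and count tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def linear_score(bag):
    """Sum weight * count over the bag; unweighted words contribute 0."""
    return sum(WEIGHTS.get(word, 0.0) * count for word, count in bag.items())

original = "I wouldn't say this is anything less than remarkable."
# Same words, different order, different meaning (made up for this example).
reordered = "This is less remarkable than anything I wouldn't say."

same_bag = bag_of_words(original) == bag_of_words(reordered)
score = linear_score(bag_of_words(original))
print(same_bag, score)  # prints: True -0.5
```

Both sentences produce the same bag, so any model that only sees the bag must give them the same score. With these assumed weights that score is negative, even though the original sentence is praise.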
Question 2 Multiple Choice
A restaurant review reads: 'The pasta was divine, but the 45-minute wait and rude server ruined the evening.' A single-score document-level classifier assigns it 0.55 (mildly positive). What does this reveal about the classifier's limitation?
A. The classifier needs more training data, since mildly positive is clearly wrong for this review
B. Single-score classification cannot distinguish that food sentiment and service sentiment are different aspects requiring separate targets, a task requiring aspect-based sentiment analysis
C. Transformer-based models would also fail on this sentence because of the contrastive conjunction 'but'
D. The classifier is interpreting the review correctly; 'divine pasta' outweighs the service complaints
The review expresses strongly positive sentiment about the food and strongly negative sentiment about the service. A single document-level score collapses these into one number, losing the critical distinction. Aspect-based sentiment analysis (ABSA) identifies target entities (pasta, wait time, server) and assigns separate sentiment labels to each. This is not a training-data problem — it is a structural limitation of document-level models that output one label per text.
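To make the contrast concrete, here is a toy sketch of per-aspect scoring for the review above. It is a rule-based stand-in, not a real ABSA system: the `ASPECT_TERMS` and `OPINION_WORDS` lexicons and the clause-splitting heuristic are assumptions invented for this example, whereas practical ABSA models learn targets and polarities from labeled data.

```python
import re

# Hypothetical toy lexicons; a real ABSA system would learn these from data.
ASPECT_TERMS = {"pasta": "food", "wait": "service", "server": "service"}
OPINION_WORDS = {"divine": 1, "rude": -1, "ruined": -1}

def aspect_sentiment(review):
    """Score each aspect by the opinion words that share a clause with it."""
    scores = {}
    # Split on commas, semicolons, and the contrastive conjunction 'but'.
    for clause in re.split(r"[,;]|\bbut\b", review.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        polarity = sum(OPINION_WORDS.get(t, 0) for t in tokens)
        for aspect in {ASPECT_TERMS[t] for t in tokens if t in ASPECT_TERMS}:
            scores[aspect] = scores.get(aspect, 0) + polarity
    return scores

def document_sentiment(review):
    """Collapse every opinion word in the review into a single number."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return sum(OPINION_WORDS.get(t, 0) for t in tokens)

review = ("The pasta was divine, but the 45-minute wait "
          "and rude server ruined the evening.")
per_aspect = aspect_sentiment(review)   # {'food': 1, 'service': -2}
overall = document_sentiment(review)    # -1
```

The single number hides the fact that the food and the service pull in opposite directions; the per-aspect dictionary keeps them apart.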
Question 3 True / False
Transformer-based sentiment models outperform bag-of-words models on sentences with negation because attention mechanisms allow them to learn how words modify each other's meaning within a sentence.
T. True
F. False
Answer: True
Transformers process all words simultaneously and learn attention weights that capture relationships between tokens. In 'not good,' the attention mechanism learns that 'not' is closely related to 'good' and modifies its representation. The model can learn that 'not + [positive word]' maps to negative sentiment. Bag-of-words models cannot represent this relationship because they treat each word as an independent feature, discarding all positional and structural information.
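The mechanism can be illustrated with a hand-built example of scaled dot-product attention for a single query. Everything numeric here is an assumption chosen for illustration: the `KEY` and `VALUE` tables and `GOOD_QUERY` are set by hand, and the one-dimensional "sentiment" value is a simplification. A real transformer learns these projections, uses many dimensions and heads, and stacks several layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query over keys and values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(dim)]
    return weights, output

# Hand-picked (not learned) vectors: the query for 'good' points at negators.
KEY = {"not": [3.0, 0.0], "very": [0.0, 0.0], "good": [0.0, 1.0]}
VALUE = {"not": [-2.0], "very": [0.5], "good": [1.0]}
GOOD_QUERY = [3.0, 0.0]

def contextual_good(phrase):
    """Contextual representation of 'good' given the words around it."""
    tokens = phrase.split()
    return attend(GOOD_QUERY, [KEY[t] for t in tokens],
                  [VALUE[t] for t in tokens])

weights_not, out_not = contextual_good("not good")
weights_very, out_very = contextual_good("very good")
```

With these assumed vectors, 'good' puts almost all of its attention on 'not' and ends up with a negative representation in 'not good', while in 'very good' it stays positive. A bag-of-words feature for 'good' has the same value in both phrases.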
Question 4 True / False
A bag-of-words model that correctly identifies strong sentiment-bearing words ('excellent,' 'awful') will reliably classify sentences containing those words, because individual word polarity is the primary determinant of sentence sentiment.
T. True
F. False
Answer: False
Sentence sentiment depends on the compositional structure of the sentence, not just the polarity of individual words. Negation ('not awful' = positive), irony ('what an excellent idea' said sarcastically), qualification ('it was somewhat excellent but mostly mediocre'), and aspect targeting all change how individual word polarity contributes to sentence-level sentiment. Bag-of-words models succeed in simple cases but systematically fail wherever context determines how words modify each other — which is common in real language use.
Question 5 Short Answer
Why do bag-of-words models fail on negated phrases like 'not bad,' and what property of LSTM or transformer architectures allows them to handle negation correctly?
Model answer: Bag-of-words models discard word order and treat each word as an independent feature, so 'not bad' and 'bad' get nearly identical feature vectors: both contain 'bad' as a negative signal, and 'not' can only contribute its own fixed weight. The model cannot represent the semantic effect of 'not' reversing 'bad.' LSTMs process the sequence left to right, updating a hidden state as each word is read; the gates can learn to change that state when a negation word like 'not' is encountered, so the following word 'bad' is interpreted in a negated context. Transformers use self-attention (bidirectional in encoder models), so they can learn that in 'not bad,' 'bad' attends strongly to 'not' and should have its sentiment flipped. Both architectures can represent the compositional structure that gives negation its semantic force.
The core issue is that meaning in language is compositional — the meaning of a phrase is a function of the meanings of its parts AND how they are structurally combined. Bag-of-words captures the parts but ignores the structure; sequential and attention-based models capture both.
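The difference between scoring the parts and scoring the parts plus their structure can be sketched with two tiny scorers. This is not an LSTM: `sequential_score` is a hand-written rule that carries one bit of state from left to right, as a stand-in for the kind of state a recurrent model can learn. The `NEGATORS` and `POLARITY` tables are assumptions made up for the example.

```python
# Hypothetical toy lexicons, invented for this illustration.
NEGATORS = {"not", "never", "no"}
POLARITY = {"bad": -1.0, "good": 1.0}

def additive_score(tokens):
    """Bag-of-words style: sum word polarities, ignoring order and negators."""
    return sum(POLARITY.get(t, 0.0) for t in tokens)

def sequential_score(tokens):
    """Read left to right, carrying a 'negated' flag as hidden state."""
    score, negated = 0.0, False
    for token in tokens:
        if token in NEGATORS:
            negated = not negated  # state carried forward to later words
            continue
        polarity = POLARITY.get(token, 0.0)
        if polarity:
            score += -polarity if negated else polarity
            negated = False  # negation is consumed by the word it modifies
    return score

phrase = "not bad".split()
parts_only = additive_score(phrase)            # -1.0
parts_and_structure = sequential_score(phrase) # 1.0
```

The additive scorer gives 'not bad' the same score as 'bad'. The sequential scorer flips the polarity because it remembers that a negator came first.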