Questions: Text Classification

5 questions to test your understanding

Question 1 Multiple Choice

A fraud detection dataset contains 99.9% legitimate transactions and 0.1% fraudulent ones. A classifier that always predicts 'not fraud' achieves 99.9% accuracy. What does this reveal?

A. The model is performing well; 99.9% accuracy is excellent for any classification task
B. Accuracy is a misleading metric here; the model detects zero fraud while appearing to succeed
C. The dataset must be balanced to 50/50 before any classifier can be trained
D. This problem requires unsupervised learning because labeled fraud examples are too rare
Question 2 Multiple Choice

Why does fine-tuning a pretrained language model like BERT typically require far less labeled training data than training a classifier using TF-IDF features from scratch?

A. BERT compresses text more efficiently, so fewer examples are needed to fill its parameter space
B. BERT's pretraining has already learned rich language representations, so fine-tuning adapts existing knowledge rather than learning from zero
C. TF-IDF classifiers require more data because they use more model parameters
D. BERT processes examples more data-efficiently through its attention mechanism
Question 3 True / False

Bag-of-words models discard word order entirely, yet they can still achieve reasonable performance on many text classification tasks such as spam detection and topic classification.

T. True
F. False
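
To make the premise of Question 3 concrete, here is a minimal sketch of what "discarding word order" means in practice. The helper name `bag_of_words` and the whitespace tokenization with lowercasing are assumptions made for illustration, not part of the quiz material. The sketch only shows that two sentences with opposite meanings can map to the same representation; it says nothing about how well a classifier built on such features performs.

```python
from collections import Counter

def bag_of_words(text):
    """Hypothetical helper: count tokens, ignoring their positions.

    Assumes simple whitespace tokenization and lowercasing.
    """
    return Counter(text.lower().split())

first = bag_of_words("dog bites man")
second = bag_of_words("man bites dog")

# Both sentences contain the same tokens with the same counts,
# so their bag-of-words representations are indistinguishable.
same_bag = first == second
print(first)
print(second)
print("Identical representation:", same_bag)
```
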
Question 4 True / False

Preprocessing steps like lowercasing and stop word removal usually improve text classification performance and should be applied universally.

T. True
F. False
Question 5 Short Answer

Explain why overall accuracy is an insufficient evaluation metric for a text classifier trained on a severely imbalanced dataset, and what metrics should be used instead.

Think about your answer, then reveal below.
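
As an aid for checking a Question 5 answer, the following sketch works through the arithmetic for the always-negative classifier described in Question 1. It assumes a toy dataset of 1,000 transactions with exactly one fraud case (matching the 0.1% rate), and it assumes the common convention of reporting precision and recall as 0.0 when their denominator is zero. The function names `confusion_counts` and `safe_div` are hypothetical helpers, not taken from any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Hypothetical helper: count TP, FP, FN, TN for the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def safe_div(numerator, denominator):
    """Assumed convention: return 0.0 when the ratio is undefined."""
    return numerator / denominator if denominator else 0.0

# Toy data (assumption): 1 fraud case among 1,000 transactions.
y_true = [1] * 1 + [0] * 999
# A classifier that always predicts "not fraud".
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
precision = safe_div(tp, tp + fp)   # of predicted fraud, how much is real
recall = safe_div(tp, tp + fn)      # of real fraud, how much was caught
f1 = safe_div(2 * precision * recall, precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Accuracy is dominated by the majority class, while precision, recall, and F1 are computed with respect to the rare positive class, which is why they expose the failure that accuracy hides.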