Questions: Classification Metrics and Evaluation

5 questions to test your understanding

Question 1 Multiple Choice

A fraud detection model evaluated on a dataset where 0.1% of transactions are fraudulent achieves 99.9% accuracy by predicting 'not fraud' for every transaction. What is its recall for the fraud class?

A. 99.9%: the model is correct nearly all the time
B. 0%: it never predicts fraud, so it catches none of the actual fraud cases
C. 0.1%: it correctly identifies the rare fraud cases
D. Undefined: recall cannot be computed on imbalanced datasets
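
To check the arithmetic behind this question, the sketch below builds a small synthetic dataset and counts the confusion-matrix cells by hand. The dataset size (100,000 transactions) and the variable names are assumptions chosen for illustration; only the 0.1% fraud rate and the always-"not fraud" prediction come from the question.

```python
# Hypothetical dataset: 100,000 transactions, 0.1% fraudulent (1 = fraud).
n_total = 100_000
n_fraud = 100  # 0.1% of n_total

y_true = [1] * n_fraud + [0] * (n_total - n_fraud)
y_pred = [0] * n_total  # the model predicts "not fraud" for everything

# Count confusion-matrix cells for the fraud (positive) class.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)

accuracy = (tp + tn) / n_total
# Recall is defined whenever at least one actual positive exists.
recall = tp / (tp + fn) if (tp + fn) > 0 else float("nan")

print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

Accuracy is dominated by the 99,900 legitimate transactions, while recall looks only at the 100 fraudulent ones.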
Question 2 Multiple Choice

A cancer screening model correctly identifies 95% of actual cancer cases but also flags 40% of healthy patients as potentially cancerous. How should this tradeoff be characterized?

A. High precision, low recall: the model is conservative and misses few real cases
B. Low precision, high recall: the model catches most real cases but generates many false alarms
C. High precision, high recall: catching 95% of cancers while flagging 40% of healthy patients is acceptable for screening
D. Low precision, low recall: a 40% false positive rate means the model is unreliable
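
Precision depends on how common the disease is, which the question does not state. The sketch below works through the numbers under an assumed prevalence of 1% in a hypothetical cohort of 10,000 patients; the function name and both of those figures are illustrative assumptions, while the 95% detection rate and 40% false positive rate come from the question.

```python
def screening_metrics(n_patients, prevalence, sensitivity, false_positive_rate):
    """Return (precision, recall) from population-level rates.

    All arguments are illustrative inputs, not values from a real study.
    """
    n_pos = n_patients * prevalence          # patients who actually have cancer
    n_neg = n_patients - n_pos               # healthy patients
    tp = sensitivity * n_pos                 # cancers correctly flagged
    fp = false_positive_rate * n_neg         # healthy patients wrongly flagged
    precision = tp / (tp + fp)
    recall = sensitivity                     # recall is TP / (TP + FN)
    return precision, recall


# Assumed prevalence of 1%; change it to see how precision moves.
precision, recall = screening_metrics(10_000, 0.01, 0.95, 0.40)
print(f"precision={precision:.3f} recall={recall:.2f}")
```

Under this assumption only a small fraction of flagged patients actually have cancer, even though nearly all real cases are caught. A higher assumed prevalence would raise precision without changing recall.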
Question 3 True / False

F1 score is the arithmetic mean of precision and recall, so it equals (precision + recall) / 2.

T. True
F. False
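
One way to test the statement is to compute both kinds of mean for the same precision and recall and compare them. The values 0.9 and 0.1 below are made up for illustration; the F1 formula used is the standard harmonic mean, 2PR / (P + R).

```python
def arithmetic_mean(precision, recall):
    return (precision + recall) / 2


def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values: very precise, but misses most positives.
p, r = 0.9, 0.1
mean_value = arithmetic_mean(p, r)
f1_value = f1_score(p, r)
print(f"arithmetic mean={mean_value:.2f} F1={f1_value:.2f}")
```

The two quantities coincide only when precision equals recall; otherwise the harmonic mean is pulled toward the smaller of the two.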
Question 4 True / False

Using macro-averaging to evaluate a multiclass classifier on an imbalanced dataset can make the classifier appear to perform worse than it does under weighted averaging, even if performance on the majority class is excellent.

T. True
F. False
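
The difference between the two averaging schemes can be seen on a toy example. The per-class F1 scores and class counts below are invented for illustration: one large class the classifier handles well and two small classes it handles poorly.

```python
# Hypothetical per-class results: (F1 score, number of true samples).
per_class = {
    "majority": (0.98, 900),
    "minority_a": (0.40, 50),
    "minority_b": (0.30, 50),
}

scores = [f1 for f1, _ in per_class.values()]
supports = [n for _, n in per_class.values()]

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(scores) / len(scores)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / sum(supports)

print(f"macro F1={macro_f1:.3f} weighted F1={weighted_f1:.3f}")
```

Here the weighted average stays close to the majority-class score, while the macro average is dragged down by the two small classes.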
Question 5 Short Answer

A hospital is deploying a classifier to screen patients for a rare but treatable disease. Should the classifier prioritize precision or recall, and why?
