Questions — Classification Metrics and Evaluation

Question 1 (Multiple Choice)

A fraud detection model evaluated on a dataset where 0.1% of transactions are fraudulent achieves 99.9% accuracy by predicting 'not fraud' for every transaction. What is its recall for the fraud class?

A. 99.9%: the model is correct nearly all the time

B. 0%: it never predicts fraud, so it catches none of the actual fraud cases

C. 0.1%: it correctly identifies the rare fraud cases

D. Undefined: recall cannot be computed on imbalanced datasets
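
To check your reasoning, here is a minimal sketch that scores a model predicting "not fraud" for every transaction. It assumes a hypothetical dataset of 100,000 transactions with 100 frauds (0.1%). The dataset size and variable names are illustrative and not part of the original question.

```python
# Hypothetical data: 100,000 transactions, 100 of them fraudulent (0.1%).
# Label 1 = fraud, 0 = not fraud.
y_true = [1] * 100 + [0] * 99_900

# A degenerate model that predicts "not fraud" for every transaction.
y_pred = [0] * len(y_true)

# Count the four confusion-matrix cells for the fraud (positive) class.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)

# Recall = TP / (TP + FN); defined whenever the data contains at least one fraud.
recall = tp / (tp + fn) if (tp + fn) > 0 else None

# Precision = TP / (TP + FP); 0/0 (undefined) when the model never predicts fraud.
precision = tp / (tp + fp) if (tp + fp) > 0 else None

print(f"accuracy={accuracy}, recall={recall}, precision={precision}")
```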

Question 2 (Multiple Choice)

A cancer screening model, used in a population where cancer is rare, correctly identifies 95% of actual cancer cases but also flags 40% of healthy patients as potentially cancerous. How should this tradeoff be characterized?

A. High precision, low recall: the model is conservative and misses few real cases

B. Low precision, high recall: the model catches most real cases but generates many false alarms

C. High precision, high recall: catching 95% of cancers while flagging 40% of healthy patients is acceptable for screening

D. Low precision, low recall: a 40% false positive rate means the model is unreliable
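
Precision depends on how common cancer is among the people screened, not only on the model's rates. A minimal sketch, assuming hypothetical prevalences of 1% and 50%, derives precision from the recall (95%) and false positive rate (40%) stated in the question. The function name `screening_metrics` is illustrative.

```python
def screening_metrics(recall, false_positive_rate, prevalence):
    """Return (precision, recall) given per-class rates and disease prevalence."""
    # Expected fraction of all screened patients who are true / false positives.
    true_pos = recall * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    precision = true_pos / (true_pos + false_pos)
    return precision, recall


# Rates come from the question; the prevalences are hypothetical.
for prevalence in (0.01, 0.50):
    precision, recall = screening_metrics(0.95, 0.40, prevalence)
    print(f"prevalence={prevalence:.0%}: precision={precision:.3f}, recall={recall:.2f}")
```

With the same two rates, precision is about 0.02 at 1% prevalence but about 0.70 at 50%, so characterizing precision requires knowing (or assuming) how rare the condition is.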

Question 3 (True / False)

F1 score is the arithmetic mean of precision and recall, so it equals (precision + recall) / 2.

T. True

F. False
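
One way to test the statement is to evaluate both formulas on the same inputs. The sketch below uses the standard definition F1 = 2PR / (P + R) alongside the arithmetic mean, with hypothetical precision and recall values chosen to be far apart. The function names are illustrative.

```python
def arithmetic_mean(precision, recall):
    return (precision + recall) / 2


def f1_score(precision, recall):
    # Standard F1: harmonic mean of precision and recall, 2PR / (P + R).
    # If both are zero the formula is 0/0; return 0.0 by convention.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical, deliberately lopsided scores.
precision, recall = 0.9, 0.1
print(f"arithmetic mean = {arithmetic_mean(precision, recall):.2f}")
print(f"F1              = {f1_score(precision, recall):.2f}")
```

Trying equal inputs such as 0.5 and 0.5 as well shows the one case where the two formulas agree.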

Question 4 (True / False)

Using macro-averaging to evaluate a multiclass classifier on an imbalanced dataset can make the classifier appear to perform worse than weighted-averaging would suggest, even if performance on the majority class is excellent.

T. True

F. False
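
A small constructed example makes the comparison concrete. This sketch computes per-class F1 by hand (no library assumed) on hypothetical labels where the majority class is handled well and the two minority classes are not, then averages the scores both ways. The class names, counts, and helper name `per_class_f1` are illustrative.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Compute F1 for each class as 2TP / (2TP + FP + FN)."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical imbalanced data: 90 samples of "a", 5 of "b", 5 of "c".
y_true = ["a"] * 90 + ["b"] * 5 + ["c"] * 5
# Every "a" is correct, one "b" is correct, and every miss is predicted as "a".
y_pred = ["a"] * 90 + ["b"] * 1 + ["a"] * 4 + ["a"] * 5

f1 = per_class_f1(y_true, y_pred)
support = Counter(y_true)

# Macro: unweighted mean over classes, so each rare class counts as much as "a".
macro_f1 = sum(f1.values()) / len(f1)
# Weighted: each class's F1 weighted by its number of true samples.
weighted_f1 = sum(f1[c] * support[c] for c in f1) / len(y_true)

print({c: round(s, 3) for c, s in f1.items()})
print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
```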

Question 5 (Short Answer)

A hospital is deploying a classifier to screen patients for a rare but treatable disease. Should the classifier prioritize precision or recall, and why?
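
Whichever metric is prioritized, it is usually enforced by choosing a decision threshold on the classifier's scores. A hedged sketch with made-up risk scores and labels (the thresholds are arbitrary) shows how moving the threshold trades precision against recall. The data and the helper name `precision_recall_at` are hypothetical.

```python
# Hypothetical risk scores from a screening classifier, paired with true labels
# (1 = has the disease, 0 = healthy). Diseased patients are a minority.
scores = [0.95, 0.80, 0.62, 0.55, 0.40, 0.35, 0.30, 0.22, 0.15, 0.10, 0.08, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]


def precision_recall_at(threshold):
    """Flag a patient when their score is at or above the threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


for threshold in (0.9, 0.6, 0.3):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold}: precision={p}, recall={r}")
```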

