5 questions to test your understanding
A fraud detection model evaluated on a dataset where 0.1% of transactions are fraudulent achieves 99.9% accuracy by predicting 'not fraud' for every transaction. What is its recall for the fraud class?
A cancer screening model correctly identifies 95% of actual cancer cases but also flags 40% of healthy patients as potentially having cancer. How should this tradeoff be characterized?
True or false: F1 score is the arithmetic mean of precision and recall, so it equals (precision + recall) / 2.
True or false: Using macro-averaging to evaluate a multiclass classifier on an imbalanced dataset can make the classifier appear to perform worse than weighted-averaging, even if performance on the majority class is excellent.
A hospital is deploying a classifier to screen patients for a rare but treatable disease. Should the classifier prioritize precision or recall, and why?
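
Once you have attempted the questions, the arithmetic behind the fraud detection and F1 questions can be checked with a minimal sketch like the one below. It is an illustration under stated assumptions, not a reference solution: the dataset size of 100,000 transactions and the precision/recall pair of 0.9 and 0.1 are hypothetical values chosen for the example, and the helper names `recall`, `precision`, and `f1` are defined here rather than taken from any library.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); 0.0 if there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Precision = TP / (TP + FP); 0.0 if nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1(p, r):
    """F1 = 2PR / (P + R), the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Assumed dataset: 100,000 transactions, 0.1% of them fraudulent.
n_total = 100_000
n_fraud = 100

# A model that predicts 'not fraud' for every transaction.
tp, fp = 0, 0              # it never predicts fraud
fn = n_fraud               # so every fraud case is missed
tn = n_total - n_fraud     # and every legitimate case is correct

accuracy = (tp + tn) / n_total
fraud_recall = recall(tp, fn)
print(f"accuracy = {accuracy:.3f}, fraud recall = {fraud_recall:.3f}")

# Hypothetical precision/recall pair comparing the two kinds of mean.
p, r = 0.9, 0.1
arithmetic_mean = (p + r) / 2
harmonic_mean = f1(p, r)
print(f"arithmetic mean = {arithmetic_mean:.2f}, F1 = {harmonic_mean:.2f}")
```

In this sketch the always-negative model has high accuracy but zero recall on the fraud class, and the F1 value for an unbalanced precision/recall pair falls well below the arithmetic mean of the two.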