Why doesn't a high AUC alone tell you which threshold to use clinically, and what additional information is needed to choose the operating point?
Think about your answer, then reveal below.
Model answer: AUC summarizes overall discriminative ability across all possible thresholds but does not encode the relative consequences of errors. Choosing the clinical operating threshold requires knowing the relative costs of false positives (labeling a healthy person as sick) and false negatives (missing a true case), which vary enormously by clinical context. For a lethal but treatable cancer where early detection is life-saving and biopsy is low-risk, a high-sensitivity threshold is appropriate even at the cost of more false positives. For a condition where false positives trigger harmful or expensive interventions, a high-specificity threshold is preferred even if some true cases are missed. These tradeoffs depend on disease prevalence, treatment risk, downstream test costs, and patient values — none of which are encoded in the ROC curve itself.
Formally, the optimal threshold maximizes an expected utility function that weights sensitivity and specificity by disease prevalence and by the relative costs and benefits of correct and incorrect classifications. This utility function must be specified from outside the test, because it represents clinical and patient-specific values. The ROC curve maps what is technically achievable (the tradeoff frontier between sensitivity and specificity), but the choice of where to operate on that frontier requires specifying the objective. This is why ROC analysis is most powerful at the test development and comparison stage, while threshold selection requires additional decision-analytic reasoning.
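To make the utility argument concrete, here is a minimal sketch of cost-based threshold selection. Everything in it is an illustrative assumption rather than clinical data: the toy scores and labels, the 10% prevalence, the cost ratios, and the helper names `auc`, `sens_spec`, `expected_cost`, and `best_threshold`. Utility is simplified to an expected misclassification cost with correct classifications costing zero. The same scores (and therefore the same AUC) yield different operating points once the costs change.

```python
import math
from typing import List, Sequence, Tuple

# Toy data (hypothetical): test scores for 5 diseased (1) and 5 healthy (0) people.
scores: List[float] = [0.35, 0.55, 0.75, 0.85, 0.95, 0.05, 0.15, 0.25, 0.45, 0.65]
labels: List[int] = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (diseased, healthy) pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores: Sequence[float], labels: Sequence[int], t: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= t is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
    return tp / (tp + fn), tn / (tn + fp)


def expected_cost(sens: float, spec: float, prevalence: float,
                  cost_fn: float, cost_fp: float) -> float:
    """Expected cost per person; correct classifications are assumed to cost 0."""
    return (prevalence * (1 - sens) * cost_fn
            + (1 - prevalence) * (1 - spec) * cost_fp)


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   prevalence: float, cost_fn: float, cost_fp: float) -> float:
    """Scan every observed score (plus 'call no one positive') for the lowest cost."""
    candidates = sorted(set(scores)) + [math.inf]

    def cost_at(t: float) -> float:
        sens, spec = sens_spec(scores, labels, t)
        return expected_cost(sens, spec, prevalence, cost_fn, cost_fp)

    return min(candidates, key=cost_at)


print(auc(scores, labels))                            # 0.88 in both scenarios
# Scenario A (assumed): a missed case is 50x worse than a false alarm.
print(best_threshold(scores, labels, 0.10, 50.0, 1.0))  # 0.35 (sensitivity 1.0, specificity 0.6)
# Scenario B (assumed): a false alarm is 5x worse than a missed case.
print(best_threshold(scores, labels, 0.10, 1.0, 5.0))   # 0.75 (sensitivity 0.6, specificity 1.0)
```

Note that prevalence is passed in separately rather than taken from the sample, since a development dataset is often enriched with cases relative to the population in which the test will be used.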