Two diagnostic models for predicting heart failure have AUCs of 0.85 and 0.72. A colleague claims the first model is always better for clinical use. What important caveat is missing?
A. AUC cannot be compared between models
B. AUC measures discrimination across all thresholds, but at the specific clinical threshold used in practice, the model with lower AUC might have better sensitivity or specificity
C. The model with higher AUC is always better at every threshold
D. AUC is only valid for binary outcomes, not heart failure severity
Answer: B
AUC is a summary measure that averages performance across all possible thresholds, including many that would never be used clinically. Two ROC curves can cross — Model A may be better at high-sensitivity thresholds while Model B is better at high-specificity thresholds. The model with higher overall AUC may perform worse at the exact clinical decision threshold. Partial AUC (restricted to clinically relevant regions) or threshold-specific sensitivity/specificity may be more informative for clinical decisions.
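A minimal sketch of crossing ROC curves, using made-up scores for ten hypothetical patients (the data, `auc`, and `best_sensitivity` are illustrative assumptions, not taken from any real model). It shows a model with the higher AUC achieving lower sensitivity once the operating point must have no false positives.

```python
# Hypothetical scores from two models for the same 10 patients
# (first 5 have heart failure, last 5 do not). Invented for illustration.
labels   = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores_a = [0.90, 0.85, 0.80, 0.75, 0.70, 0.95, 0.40, 0.30, 0.20, 0.10]
scores_b = [0.95, 0.90, 0.85, 0.15, 0.10, 0.60, 0.50, 0.40, 0.30, 0.20]


def split(labels, scores):
    """Separate scores of cases (label 1) from scores of non-cases (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return pos, neg


def auc(labels, scores):
    """AUC as the fraction of case/non-case pairs ranked correctly (ties count 0.5)."""
    pos, neg = split(labels, scores)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_sensitivity(labels, scores, min_specificity):
    """Highest sensitivity among thresholds with specificity >= min_specificity."""
    pos, neg = split(labels, scores)
    best = 0.0
    # Candidate thresholds: every observed score, plus one above all scores.
    for t in sorted(set(scores)) + [float("inf")]:
        specificity = sum(n < t for n in neg) / len(neg)
        sensitivity = sum(p >= t for p in pos) / len(pos)
        if specificity >= min_specificity:
            best = max(best, sensitivity)
    return best


auc_a, auc_b = auc(labels, scores_a), auc(labels, scores_b)
print(f"AUC: A={auc_a:.2f}, B={auc_b:.2f}")  # AUC: A=0.80, B=0.60

for min_spec in (0.8, 1.0):
    sens_a = best_sensitivity(labels, scores_a, min_spec)
    sens_b = best_sensitivity(labels, scores_b, min_spec)
    print(f"specificity >= {min_spec}: sensitivity A={sens_a:.1f}, B={sens_b:.1f}")
# specificity >= 0.8: sensitivity A=1.0, B=0.6
# specificity >= 1.0: sensitivity A=0.0, B=0.6
```

In this toy data Model A wins on AUC and at 80% specificity, but Model B is the only one that detects any cases when no false positives are tolerated.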
Question 2 True / False
A prediction model has an AUC of 0.50. This means the model is performing worse than random chance.
T. True
F. False
Answer: False
AUC = 0.50 means the model has no discriminative ability — it performs exactly at chance level. It cannot distinguish between cases and controls better than flipping a coin. An AUC below 0.50 would mean the model's predictions are inversely related to the outcome (it systematically assigns higher scores to non-cases), which can be corrected by reversing the prediction direction. AUC = 0.50 is the baseline of no information, not worse than chance.
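As a small illustration of the reversal argument, the sketch below uses invented scores whose ranking is mostly backwards and shows that flipping them (here via `1 - score`, an assumed convention for probability-like scores) turns an AUC of a into 1 - a. The data and the helper `pairwise_auc` are hypothetical.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of case/non-case pairs where the case scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: the model tends to give non-cases the higher scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.2, 0.3, 0.6, 0.5, 0.7, 0.8]

original = pairwise_auc(labels, scores)
# Reverse the prediction direction: high scores become low and vice versa.
flipped = pairwise_auc(labels, [1.0 - s for s in scores])

print(f"original AUC = {original:.3f}")  # original AUC = 0.111
print(f"flipped AUC  = {flipped:.3f}")   # flipped AUC  = 0.889
```

Only 1 of the 9 case/non-case pairs is ranked correctly before flipping, and 8 of 9 afterwards, so a model with AUC well below 0.50 still carries information once its direction is reversed.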
Question 3 Multiple Choice
A logistic regression model predicting diabetes has an AUC of 0.82 and appears well-discriminating, but a calibration plot shows it systematically overestimates risk — predicting 40% when the actual risk is 20%. Is the model's AUC still valid?
A. No: poor calibration invalidates the AUC
B. Yes: AUC measures discrimination (ranking), not calibration (absolute probability accuracy); the model correctly ranks high-risk above low-risk even if the absolute probabilities are wrong
C. The AUC should be recalculated after recalibrating the model
D. AUC and calibration always agree: a well-discriminating model must be well-calibrated
Answer: B
Discrimination and calibration are distinct properties. A model can perfectly rank patients by risk (high AUC) while systematically overestimating or underestimating absolute probabilities (poor calibration). AUC reflects only the ranking: whether the model assigns higher predicted probabilities to actual cases than to actual non-cases. Because AUC depends only on rank order, a recalibration that preserves the ordering of predictions leaves it unchanged. Clinical decisions based on absolute risk thresholds (e.g., 'treat if predicted risk > 10%') require both good discrimination and good calibration.
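The separation between ranking and absolute accuracy can be sketched with invented numbers that mirror the question: predictions averaging 40% against an observed event rate of 20%. Halving every prediction is used here only as a crude, order-preserving stand-in for recalibration (an assumption for illustration, not a recommended method); `rank_auc` and the data are hypothetical.

```python
def rank_auc(labels, scores):
    """AUC computed from pairwise rankings of cases against non-cases."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean(values):
    return sum(values) / len(values)


# Invented data: 2 of 10 patients develop diabetes (observed risk 20%).
labels    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [0.8, 0.6, 0.7, 0.5, 0.4, 0.3, 0.3, 0.2, 0.1, 0.1]

# Crude order-preserving adjustment: halve every predicted risk.
recalibrated = [p / 2 for p in predicted]

print(f"observed event rate      = {mean(labels):.2f}")        # 0.20
print(f"mean predicted (before)  = {mean(predicted):.2f}")     # 0.40
print(f"mean predicted (after)   = {mean(recalibrated):.2f}")  # 0.20
print(f"AUC before = {rank_auc(labels, predicted):.4f}")       # 0.9375
print(f"AUC after  = {rank_auc(labels, recalibrated):.4f}")    # 0.9375
```

The average predicted risk moves from twice the observed rate to matching it, while the AUC stays at 0.9375 because no pair of patients changes order.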
Question 4 Short Answer
Explain the concordance interpretation of AUC and why it makes AUC intuitive as a measure of discrimination.
Model answer: AUC equals the probability that, for one randomly selected diseased person and one randomly selected non-diseased person, the model assigns the higher predicted probability to the diseased person. An AUC of 0.85 means that in 85% of randomly drawn case-control pairs, the model ranks the diseased person higher. This interpretation makes AUC intuitive because it directly measures what discrimination means: the ability to rank cases above non-cases.
The concordance interpretation connects the geometric area under the curve to a concrete probabilistic statement about pairwise comparisons. It also explains why AUC = 0.5 is the chance baseline: random guessing would correctly rank a random pair 50% of the time. The concordance statistic (C-statistic) generalizes this concept to survival analysis, where it measures the probability that, of two comparable subjects, the one who experiences the event sooner has the higher predicted risk.
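The link between the pairwise and geometric views can be checked numerically. The sketch below, on invented scores that include one tie, computes the concordance by counting pairs and the area under the empirical ROC curve by the trapezoidal rule; the function names and data are hypothetical.

```python
def concordance(labels, scores):
    """Share of case/non-case pairs where the case scores higher (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_area(labels, scores):
    """Trapezoidal area under the empirical ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    points = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    # Lower the threshold step by step; "positive" means score >= threshold.
    for t in sorted(set(scores), reverse=True):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(n >= t for n in neg) / len(neg)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented data: 4 cases, 4 non-cases, with one tied score (0.4).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.4, 0.2]

print(concordance(labels, scores))  # 0.78125
print(roc_area(labels, scores))     # 0.78125
```

Of the 16 pairs, 12 are ranked correctly and one is tied, giving 12.5 / 16 = 0.78125 by either route.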