A loan approval algorithm achieves demographic parity: it approves 40% of applicants from every racial group. However, Group A has a 5% historical default rate and Group B has a 25% default rate. Which statement best describes the fairness situation?
A. The algorithm is fair because it treats every group identically at the point of decision
B. The algorithm satisfies demographic parity but likely violates calibration: among those approved, Group B members are far more likely to default
C. The algorithm satisfies equalized odds because approval rates are equal across groups
D. The algorithm has no bias because it does not use race as an input feature
Answer: B
Demographic parity and calibration are in tension here. Calibration requires that among all people assigned a given risk score, the actual default rate matches that score, regardless of group. With base rates of 5% and 25%, a well-calibrated model will give Group B higher risk scores on average, so approving exactly 40% of each group means applying a stricter risk cutoff to Group A than to Group B. An approval then no longer signals the same risk: approved Group B members default far more often than approved Group A members. Option A is the common misconception that equal treatment equals fairness. Option C confuses demographic parity with equalized odds, which requires equal true and false positive rates, not equal approval rates. Option D ignores proxy discrimination: features correlated with race can produce disparate impact even without race as an explicit input.
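To make the reasoning behind option B concrete, here is a minimal Python sketch with made-up numbers. It assumes each group is split into score buckets whose default probabilities are calibrated and chosen only so that the base rates come out at 5% and 25%; the `Bucket` class and `approve_lowest_risk` helper are hypothetical names for illustration. Each group's lowest-risk applicants are approved until 40% of that group is covered.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    count: int            # applicants in this score bucket (hypothetical)
    default_prob: float   # calibrated (true) default probability for the bucket


def approve_lowest_risk(buckets, approval_rate):
    """Approve the lowest-risk applicants until `approval_rate` of the group is approved.

    Returns (number approved, expected number of defaults among the approved).
    """
    total = sum(b.count for b in buckets)
    quota = round(approval_rate * total)
    approved, expected_defaults = 0, 0.0
    for bucket in sorted(buckets, key=lambda bucket: bucket.default_prob):
        take = min(bucket.count, quota - approved)   # may take only part of a bucket
        approved += take
        expected_defaults += take * bucket.default_prob
        if approved == quota:
            break
    return approved, expected_defaults


# Hypothetical score distributions with base rates of 5% (A) and 25% (B).
groups = {
    "Group A": [Bucket(400, 0.02), Bucket(600, 0.07)],   # (8 + 42) / 1000 = 5%
    "Group B": [Bucket(400, 0.10), Bucket(600, 0.35)],   # (40 + 210) / 1000 = 25%
}

for name, buckets in groups.items():
    total = sum(b.count for b in buckets)
    approved, defaults = approve_lowest_risk(buckets, 0.40)
    print(f"{name}: approval rate {approved / total:.0%}, "
          f"default rate among approved {defaults / approved:.1%}")
```

Both groups end up with a 40% approval rate, but approved Group B applicants default at 10% against 2% for approved Group A applicants. The equal quota also implies different cutoffs: Group A applicants with 7% risk are rejected while Group B applicants with 10% risk are approved.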
Question 2 Multiple Choice
Why can't an AI classifier simultaneously satisfy demographic parity, equalized odds, and calibration in most real-world settings?
A. Because current computing power is insufficient to optimize all three objectives simultaneously
B. Because these metrics require perfectly balanced training data, which rarely exists
C. Because when base rates differ between groups, satisfying one definition mathematically precludes satisfying the others
D. Because fairness metrics apply to individuals rather than groups, making group-level metrics inherently contradictory
Answer: C
This follows from the impossibility results of Chouldechova and of Kleinberg, Mullainathan, and Raghavan (and related theorems). When two groups have different base rates for the outcome, such as different historical default rates, a classifier cannot simultaneously equalize approval rates (demographic parity), equalize true and false positive rates (equalized odds), and have scores mean the same thing across groups (calibration), except in degenerate cases such as a perfect predictor. The math forces a trade-off. This is not a data problem or a computational problem; it is a logical consequence of differing group base rates.
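The core of Chouldechova's argument fits in a few lines of algebra. The sketch below uses standard confusion-matrix rates for a single group of size N with base rate p. It concerns predictive parity (equal positive predictive value, a decision-level cousin of calibration) rather than calibration of raw scores, and the notation is introduced here for illustration.

```latex
\begin{aligned}
\mathrm{TP} &= (1-\mathrm{FNR})\,pN, \qquad \mathrm{FP} = \mathrm{FPR}\,(1-p)\,N,\\
\mathrm{PPV} &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}
  \;\Longrightarrow\; \mathrm{FP} = \mathrm{TP}\,\frac{1-\mathrm{PPV}}{\mathrm{PPV}},\\
\mathrm{FPR} &= \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot(1-\mathrm{FNR}).
\end{aligned}
```

If two groups share the same PPV and FNR but have different base rates p, the factor p/(1 - p) differs, so their false positive rates must differ and equalized odds fails. The escapes are degenerate, for example PPV = 1 (no false positives in either group).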
Question 3 True / False
Once an AI system has been deployed with fairness constraints, ongoing monitoring is unnecessary because the fairness properties established at training time persist.
T. True
F. False
Answer: False
Distribution shift — changes in the data-generating process over time — can introduce new forms of bias even in systems that were fair at deployment. The population served may change, economic conditions may shift, and correlations between features and outcomes may evolve. Responsible AI practice requires continuous monitoring for disparate impact, not just a one-time audit at training. The EU AI Act and other regulatory frameworks are beginning to codify ongoing monitoring requirements for this reason.
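One way to act on this is to recompute a fairness metric on each new batch of decisions rather than only at training time. Below is a minimal Python sketch, assuming decisions are logged as (period, group, approved) tuples; the function names and the synthetic records are hypothetical, and the 0.8 threshold borrows the "four-fifths" rule of thumb from US employment-selection guidance as an illustrative default, not a legal test. It tracks only approval-rate ratios (a demographic-parity check); error-rate metrics can be monitored the same way once outcomes are observed.

```python
from collections import defaultdict


def selection_rates_by_period(records):
    """records: iterable of (period, group, approved: bool).

    Returns {period: {group: approval rate}}.
    """
    # period -> group -> [approved count, total count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for period, group, approved in records:
        c = counts[period][group]
        c[0] += int(approved)
        c[1] += 1
    return {p: {g: a / t for g, (a, t) in groups.items()} for p, groups in counts.items()}


def flag_disparate_impact(rates_by_period, min_ratio=0.8):
    """Return periods where (lowest group rate / highest group rate) < min_ratio.

    min_ratio=0.8 mirrors the four-fifths rule of thumb; it is an assumption here.
    """
    flagged = []
    for period in sorted(rates_by_period):
        rates = rates_by_period[period].values()
        highest = max(rates)
        if highest > 0 and min(rates) / highest < min_ratio:
            flagged.append(period)
    return flagged


# Synthetic log: parity holds in period 1, then Group B's approvals drop in period 2.
records = (
    [(1, "A", i < 40) for i in range(100)]
    + [(1, "B", i < 40) for i in range(100)]
    + [(2, "A", i < 40) for i in range(100)]
    + [(2, "B", i < 25) for i in range(100)]
)
rates = selection_rates_by_period(records)
print(flag_disparate_impact(rates))  # -> [2]
```

Period 2 is flagged because Group B's approval rate (25%) is only 62.5% of Group A's (40%), even though the same system showed parity in period 1.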
Question 4 True / False
Choosing which fairness definition (demographic parity, equalized odds, calibration, etc.) to optimize for is ultimately an ethical and political decision, not a purely technical one.
T. True
F. False
Answer: True
Because fairness definitions are mathematically incompatible whenever base rates differ, prioritizing one requires a value judgment about whose interests matter more and which kinds of errors are more acceptable. Demographic parity emphasizes equal rates of favorable decisions across groups. Equalized odds emphasizes equal error rates (true and false positive rates) across groups. Calibration emphasizes that scores mean the same thing for everyone. Which to prioritize depends on the stakes, the domain (criminal justice vs. loan approval vs. medical screening), and contested social values about equity, not on algorithm design alone.
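Each definition compares a different per-group quantity, which a short Python sketch can make explicit. The function name `group_metrics` and the toy labels are hypothetical. PPV (the default rate's complement among those predicted positive) stands in here as a decision-level proxy for calibration; checking calibration proper would compare predicted scores with observed outcome rates within score bins.

```python
def group_metrics(y_true, y_pred, groups):
    """Per-group rates behind three fairness definitions.

    y_true, y_pred: sequences of 0/1 labels; groups: sequence of group labels.
    """
    nan = float("nan")
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        tp = sum(1 for i in idx if y_pred[i] == 1 and y_true[i] == 1)
        fp = sum(1 for i in idx if y_pred[i] == 1 and y_true[i] == 0)
        pos = sum(1 for i in idx if y_true[i] == 1)
        neg = len(idx) - pos
        predicted_pos = tp + fp
        stats[g] = {
            "selection_rate": predicted_pos / len(idx),             # demographic parity compares this
            "tpr": tp / pos if pos else nan,                        # equalized odds compares TPR...
            "fpr": fp / neg if neg else nan,                        # ...and FPR
            "ppv": tp / predicted_pos if predicted_pos else nan,    # predictive parity compares this
        }
    return stats


# Toy data: four applicants per group.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
for g, m in group_metrics(y_true, y_pred, groups).items():
    print(g, m)
```

In this toy data both groups have a 50% selection rate and a PPV of 0.5, yet Group A's TPR is 0.5 against 1.0 for Group B, so demographic parity and predictive parity hold while equalized odds does not.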
Question 5 Short Answer
Explain why fairness definitions like demographic parity and equalized odds are in tension with each other, and what this means for AI practitioners who want to build 'fair' systems.
Model answer: When two groups have different base rates for the predicted outcome (e.g., different actual default rates), equalizing the positive prediction rate across groups (demographic parity) forces the model to approve higher-risk members of one group and reject lower-risk members of another — producing different error rates. Equalized odds demands equal error rates, which then requires different approval rates, violating parity. Calibration requires that the same score means the same risk for both groups, which clashes with forced approval-rate equality. Practitioners must explicitly choose which definition to prioritize based on the context and accept that the choice involves an ethical trade-off, not a technical solution.
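The step from equal error rates to unequal approval rates follows from a one-line identity: a group's positive-prediction rate is TPR · p + FPR · (1 - p), where p is the group's base rate. A minimal Python sketch, treating "positive" as "predicted to default" and using hypothetical error rates (`positive_rate` is an illustrative name):

```python
def positive_rate(base_rate, tpr, fpr):
    """Fraction of a group predicted positive: P(pred=1) = TPR * p + FPR * (1 - p)."""
    return tpr * base_rate + fpr * (1 - base_rate)


# Hypothetical classifier with identical error rates in both groups (equalized odds holds).
# Base rates are the 5% and 25% default rates from Question 1.
TPR, FPR = 0.8, 0.1
for name, p in [("Group A", 0.05), ("Group B", 0.25)]:
    print(f"{name}: {positive_rate(p, TPR, FPR):.1%} predicted to default")

# The rate is the same for every base rate only when TPR == FPR,
# i.e. when predictions carry no information about the true outcome.
```

With equal error rates, about 13.5% of Group A but 27.5% of Group B are predicted to default, so equalized odds alone roughly doubles Group B's rejection rate and breaks demographic parity.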
This impossibility result means 'build a fair AI' is not a well-posed engineering objective. Practitioners must ask: what kind of fairness, for whom, measured how? Ignoring this leads to false confidence — a system optimized for one metric may be deeply unfair by another equally defensible metric. Making the trade-off explicit and accountable is itself part of responsible AI practice.