The odds ratio (OR) compares the odds of exposure among those with disease to the odds of exposure among those without disease. In case-control studies, the OR estimates relative risk when disease is rare (roughly < 10%). OR = (a×d) / (b×c) in a 2×2 table, where a and c are the exposed and unexposed cases and b and d are the exposed and unexposed controls. Values follow the same interpretation pattern as RR: OR = 1 indicates no association, OR > 1 a positive association, and OR < 1 a negative association.
Use case-control data and construct 2×2 tables, calculating OR by hand. Compare calculated ORs with RRs from cohort studies of the same exposure-disease pairs to understand when OR approximates RR.
OR ≠ RR even when the two are numerically close. An OR of 3 does not mean 3 times as much disease, because it is a ratio of odds, not of risks. OR interpretation also depends on study design: confusing case-control with cohort designs leads to misinterpretation.
From your prerequisite on measures of association, you know what relative risk (RR) means: the ratio of incidence in the exposed group to incidence in the unexposed group. From your study of epidemiologic study designs, you know that case-control studies work differently from cohort studies — instead of following exposed and unexposed people forward to see who develops disease, you identify people who already have the disease (cases) and people who do not (controls), then look backwards to compare their exposure histories. This design difference is exactly why the odds ratio exists: you cannot directly calculate incidence in a case-control study, because the sampling is by disease status, not by exposure status.
The odds ratio is the measure of association available in case-control studies. Consider the standard 2×2 table: cases are in one column, controls in the other; exposed are in one row, unexposed in the other. The cells are labeled a (exposed cases), b (exposed controls), c (unexposed cases), d (unexposed controls). The OR is calculated as (a × d) / (b × c). Intuitively, this is the odds of exposure among cases (a/c) divided by the odds of exposure among controls (b/d). An OR of 2.5 means that the odds of having been exposed were 2.5 times as high among cases as among controls. Like RR, an OR of 1 indicates no association, greater than 1 indicates a positive association (exposure more common among cases), and less than 1 indicates a negative association (exposure less common among cases, consistent with a protective effect).
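The calculation above can be sketched in a few lines of Python. The counts below are hypothetical, chosen only so the arithmetic is easy to follow, and the function name `odds_ratio` is an illustrative choice rather than part of any library.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if b == 0 or c == 0:
        raise ValueError("OR is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical case-control counts: 100 cases and 70 controls.
a, b, c, d = 50, 20, 50, 50

odds_exposure_cases = a / c      # odds of exposure among cases
odds_exposure_controls = b / d   # odds of exposure among controls

# The cross-product formula and the ratio of the two odds agree.
print(odds_ratio(a, b, c, d))                        # cross-product: (50*50)/(20*50)
print(odds_exposure_cases / odds_exposure_controls)  # same quantity, computed as a ratio of odds
```

With these counts both expressions come out at about 2.5, matching the interpretation given above: the odds of exposure among cases are 2.5 times the odds among controls.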
The critical interpretive link to RR is the rare disease assumption. When the risk of disease is less than roughly 10%, the OR closely approximates the RR numerically. The mathematical reason is easiest to see in a cohort-style table: when disease is rare, the numbers with disease (a and c) are small relative to the numbers without disease (b and d), so the risk a/(a + b) is nearly equal to the odds a/b, and c/(c + d) is nearly equal to c/d. The RR, [a/(a + b)] / [c/(c + d)], therefore converges on (a/b) / (c/d) = (a × d) / (b × c), which is the OR. Practically, this means you can report an OR from a case-control study of a rare cancer and interpret it almost like a relative risk. But when disease is common (say, a cross-sectional study of hypertension), the OR will be meaningfully farther from 1 than the RR, and treating them as equivalent overstates the association. An OR of 3 for a common outcome does not mean the exposed group has three times the risk; the actual relative risk is lower.
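A small numerical comparison makes the rare disease approximation concrete. The sketch below uses two hypothetical cohorts of 1,000 exposed and 1,000 unexposed people, both constructed to have an RR of 3, and compares the OR in each. The counts and function names are assumptions made for illustration, not data from any study.

```python
def risk_ratio(a, b, c, d):
    """RR from a cohort-style table.

    a = exposed with disease, b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR from the same table, using the cross-product formula."""
    return (a * d) / (b * c)


# Rare outcome: risk is 3% in the exposed and 1% in the unexposed.
rare = (30, 970, 10, 990)
# Common outcome: risk is 60% in the exposed and 20% in the unexposed.
common = (600, 400, 200, 800)

for label, table in (("rare", rare), ("common", common)):
    rr = risk_ratio(*table)
    or_ = odds_ratio(*table)
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}")
# rare:   RR = 3.00, OR = 3.06
# common: RR = 3.00, OR = 6.00
```

In the rare-outcome cohort the OR is within a few percent of the RR, while in the common-outcome cohort the OR is twice the RR even though the underlying relative risk is identical.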
Logistic regression produces ORs naturally, which is why ORs appear throughout the epidemiologic literature even in studies that are not explicitly case-control designs. When you run logistic regression on any binary outcome, the exponentiated coefficients are ORs. This is convenient statistically, but it reinforces the need to be careful about the rare-disease approximation. In cohort or cross-sectional studies of common outcomes, methods such as log-binomial regression or Poisson regression with robust variance can be used instead of logistic regression to estimate RRs directly. Understanding the OR as a measure that is sometimes a good proxy for RR, and knowing when that approximation breaks down, is what separates careful epidemiologic thinking from mechanical formula application.
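To see why exponentiated logistic coefficients are ORs, the sketch below fits a logistic model with a single binary exposure using a hand-written Newton-Raphson loop and checks that exp(b1) matches the cross-product (a × d) / (b × c). This is a minimal illustration under stated assumptions: the counts are hypothetical, the function `fit_logistic_binary_exposure` is invented for this example, and a real analysis would use an established statistics package rather than this loop.

```python
import math


def fit_logistic_binary_exposure(a, b, c, d, max_iter=50, tol=1e-12):
    """Fit logit(p) = b0 + b1 * exposed by Newton-Raphson on grouped counts.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    Returns (b0, b1). Illustrative only.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the estimates to exist")
    n1, n0 = a + b, c + d          # exposed and unexposed totals
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p1 = 1.0 / (1.0 + math.exp(-(b0 + b1)))  # fitted risk, exposed
        p0 = 1.0 / (1.0 + math.exp(-b0))         # fitted risk, unexposed
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = (a - n1 * p1) + (c - n0 * p0)
        g1 = a - n1 * p1
        # Information matrix entries.
        w1 = n1 * p1 * (1.0 - p1)
        w0 = n0 * p0 * (1.0 - p0)
        h00, h01, h11 = w1 + w0, w1, w1
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


# Hypothetical common-outcome table: risks of 60% and 20%.
a, b, c, d = 600, 400, 200, 800
b0, b1 = fit_logistic_binary_exposure(a, b, c, d)

print(f"exp(b1)       = {math.exp(b1):.4f}")       # OR from the fitted model
print(f"cross-product = {(a * d) / (b * c):.4f}")  # OR computed by hand
```

The two printed values agree, which is the sense in which logistic regression "produces" ORs. For this table the risk ratio is 3 while the OR is 6, so reading exp(b1) as a relative risk would overstate the association.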