A logistic regression model of diabetes risk includes BMI as a predictor and yields a coefficient of 0.08 for BMI. What is the correct interpretation?
A. Each 1-unit increase in BMI increases the probability of diabetes by 0.08
B. Each 1-unit increase in BMI increases the odds of diabetes by a factor of exp(0.08) ≈ 1.083, or about 8.3%
C. Each 1-unit increase in BMI increases the log-probability of diabetes by 0.08
D. BMI has a weak association with diabetes because the coefficient is close to zero
In logistic regression, coefficients are on the log-odds scale. The coefficient of 0.08 means each 1-unit increase in BMI raises the log-odds of diabetes by 0.08. Exponentiating gives the odds ratio: exp(0.08) ≈ 1.083, meaning the odds of diabetes increase by about 8.3% per BMI unit. Option A is the most common error — the coefficient is NOT a change in probability because the logit link makes the relationship nonlinear on the probability scale. A coefficient close to zero on the log-odds scale can still represent a substantial effect when the predictor has a wide range (BMI might vary by 30+ units).
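The arithmetic in this explanation can be checked with a short sketch. Only the coefficient 0.08 comes from the question; the baseline probabilities and the helper names (`logit`, `inv_logit`, `prob_after_increase`) are illustrative assumptions.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Logistic function: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

beta_bmi = 0.08  # coefficient from the question (log-odds per BMI unit)

# Exponentiating the coefficient gives the odds ratio per 1-unit increase.
odds_ratio = math.exp(beta_bmi)

def prob_after_increase(p_start, delta_bmi=1.0):
    """Probability after raising BMI by delta_bmi, starting from p_start."""
    return inv_logit(logit(p_start) + beta_bmi * delta_bmi)

# The same coefficient produces different probability changes depending on
# the starting probability (these baselines are assumed for illustration).
changes = {p0: prob_after_increase(p0) - p0 for p0 in (0.05, 0.50, 0.90)}
for p0, change in changes.items():
    print(f"baseline {p0:.2f}: change in probability {change:+.4f}")

# Over a wide predictor range the effect compounds multiplicatively on odds.
odds_ratio_30 = math.exp(beta_bmi * 30)
print(f"odds ratio per unit: {odds_ratio:.3f}, per 30 units: {odds_ratio_30:.1f}")
```

The change in probability is about 2 percentage points at a 50% baseline and much smaller near 5% or 90%, which is why option A cannot hold in general. Across 30 BMI units the odds ratio is exp(2.4), roughly 11.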
Question 2 True / False
Logistic regression can be fit using ordinary least squares (OLS) by treating the binary outcome (0/1) as a continuous variable.
T. True
F. False
Answer: False
OLS on a binary outcome (the linear probability model) can produce predicted probabilities outside [0,1], has heteroskedastic errors by construction, and does not properly model the nonlinear relationship between predictors and probability. Logistic regression uses maximum likelihood estimation, which finds the parameter values that maximize the probability of observing the actual data given the model. MLE produces consistent, asymptotically efficient estimates and naturally constrains predicted probabilities to [0,1] through the logistic function.
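The contrast between the two fitting approaches can be sketched in plain Python. The ten data points are made up for illustration, and the Newton-Raphson loop is just one common way to maximize the logistic log-likelihood, not necessarily what any particular library does internally.

```python
import math

def inv_logit(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Made-up data: one predictor, binary outcome, overlapping classes.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

def fit_ols(xs, ys):
    """Closed-form simple linear regression (the linear probability model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_logistic(xs, ys, iters=25):
    """Maximum likelihood for intercept and slope via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = inv_logit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

a0, a1 = fit_ols(xs, ys)
b0, b1 = fit_logistic(xs, ys)

# Predict at values outside the observed range of x.
new_xs = [-5, 0, 12, 20]
lpm_preds = [a0 + a1 * x for x in new_xs]
logit_preds = [inv_logit(b0 + b1 * x) for x in new_xs]

for x, lp, gp in zip(new_xs, lpm_preds, logit_preds):
    print(f"x={x:>3}: linear probability model {lp:+.3f}, logistic {gp:.4f}")
```

On this toy data the linear probability model returns values below 0 and above 1 at the extrapolated points, while the logistic predictions stay strictly inside (0, 1).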
Question 3 Multiple Choice
A study reports that smokers have an adjusted odds ratio of 3.2 for lung cancer compared to non-smokers. If the baseline probability of lung cancer is 1%, can you approximate the risk ratio from this odds ratio?
A. No, odds ratios and risk ratios are fundamentally different quantities that can never be compared
B. Yes, when the outcome is rare (1%), the odds ratio closely approximates the risk ratio, so the risk is approximately 3.2 times higher for smokers
C. Yes, the risk ratio equals the odds ratio divided by 2
D. No, you need the exact number of cases and controls to convert
When the outcome probability is low (conventionally < 10%), the odds ratio closely approximates the risk ratio. This is because odds = p/(1-p) ≈ p when p is small, so the ratio of odds approximates the ratio of probabilities. At a 1% baseline probability, the baseline odds are about 0.0101; multiplying by OR = 3.2 gives odds of about 0.0323, a risk of about 3.1% for smokers. The implied risk ratio is therefore about 3.13, close to 3.2. This rare-disease approximation breaks down for common outcomes, where odds ratios lie further from 1 than the corresponding risk ratios (an overestimate when OR > 1).
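A small sketch makes the approximation concrete by converting an odds ratio to a risk ratio at a given baseline risk. The 1% baseline and OR = 3.2 come from the question; the 30% baseline and the function names are assumptions added for contrast.

```python
def risk_from_odds_ratio(p0, odds_ratio):
    """Risk in the exposed group, given baseline risk p0 and an odds ratio."""
    odds0 = p0 / (1.0 - p0)        # baseline odds
    odds1 = odds0 * odds_ratio     # odds in the exposed group
    return odds1 / (1.0 + odds1)   # convert odds back to a probability

def risk_ratio(p0, odds_ratio):
    """Risk ratio implied by an odds ratio at baseline risk p0."""
    return risk_from_odds_ratio(p0, odds_ratio) / p0

rr_rare = risk_ratio(0.01, 3.2)    # rare outcome, as in the question
rr_common = risk_ratio(0.30, 3.2)  # common outcome (assumed for contrast)

print(f"baseline 1%:  risk ratio = {rr_rare:.2f}")
print(f"baseline 30%: risk ratio = {rr_common:.2f}")
```

At a 1% baseline the risk ratio is about 3.13, close to the odds ratio of 3.2. At a 30% baseline the same odds ratio corresponds to a risk ratio below 2.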
Question 4 Short Answer
Why does logistic regression use the logit (log-odds) link function rather than modeling probability directly as a linear function of predictors?
Model answer: Probabilities are bounded between 0 and 1, but a linear function of predictors is unbounded: it can produce values below 0 or above 1, which are nonsensical as probabilities. The logit function log(p/(1-p)) maps probabilities in the open interval (0,1) onto (-infinity, +infinity), so a linear model on the logit scale is mathematically valid. The logistic function (the inverse of the logit) then maps any linear predictor value back to a valid probability. This also gives coefficients a natural interpretation: each is a log-odds ratio, so effects are additive on the log-odds scale and multiplicative on the odds scale.
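The two mappings in the model answer can be written down directly. This is a minimal sketch; the linear predictor values and the coefficient `beta` are arbitrary choices for illustration.

```python
import math

def logit(p):
    """Maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Logistic function: maps any real linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Arbitrary linear predictor values, including fairly extreme ones.
linear_predictors = [-6.0, -1.0, 0.0, 1.0, 6.0]
probabilities = [inv_logit(eta) for eta in linear_predictors]
round_trip = [logit(p) for p in probabilities]

for eta, p in zip(linear_predictors, probabilities):
    print(f"eta = {eta:+.1f} -> p = {p:.4f}")

# Adding beta on the log-odds scale multiplies the odds by exp(beta),
# whatever the starting point (beta here is an arbitrary example value).
beta = 0.5
odds_multipliers = []
for eta in linear_predictors:
    p_before, p_after = inv_logit(eta), inv_logit(eta + beta)
    odds_before = p_before / (1.0 - p_before)
    odds_after = p_after / (1.0 - p_after)
    odds_multipliers.append(odds_after / odds_before)
```

Every entry of `odds_multipliers` equals exp(0.5) up to floating-point error, which is the "additive on the log scale, multiplicative on the odds scale" property.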
The logit link is not arbitrary — it arises naturally from the exponential family and provides the canonical link for Bernoulli distributed outcomes. It also connects logistic regression to case-control study design: because the odds ratio is invariant to outcome-based sampling, logistic regression coefficients estimated from case-control data have the same interpretation as those from cohort data (only the intercept changes). This property makes logistic regression the natural model for case-control studies.
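The invariance claim can be illustrated with a 2x2 table. The counts below are hypothetical, and the sampling scheme (keep every case, keep one in ten non-cases) is an assumed example of outcome-based sampling.

```python
import math

# Hypothetical full cohort: counts of cases and non-cases by exposure.
cohort = {
    "exposed":   {"cases": 300, "noncases": 9700},
    "unexposed": {"cases": 100, "noncases": 9900},
}

# Case-control sample: all cases kept, one in ten non-cases sampled,
# independently of exposure (integer division keeps the counts exact).
case_control = {
    group: {"cases": c["cases"], "noncases": c["noncases"] // 10}
    for group, c in cohort.items()
}

def odds_ratio(table):
    e, u = table["exposed"], table["unexposed"]
    return (e["cases"] * u["noncases"]) / (e["noncases"] * u["cases"])

def naive_risk_ratio(table):
    """Ratio of case proportions; only meaningful in the full cohort."""
    e, u = table["exposed"], table["unexposed"]
    risk_e = e["cases"] / (e["cases"] + e["noncases"])
    risk_u = u["cases"] / (u["cases"] + u["noncases"])
    return risk_e / risk_u

or_cohort, or_cc = odds_ratio(cohort), odds_ratio(case_control)
rr_cohort, rr_cc = naive_risk_ratio(cohort), naive_risk_ratio(case_control)

# Baseline log-odds (the intercept in a model with one binary exposure)
# shifts by the log of the ratio of sampling fractions: log(1 / 0.1).
def baseline_log_odds(table):
    u = table["unexposed"]
    return math.log(u["cases"] / u["noncases"])

intercept_shift = baseline_log_odds(case_control) - baseline_log_odds(cohort)

print(f"odds ratio: cohort {or_cohort:.4f}, case-control {or_cc:.4f}")
print(f"risk ratio: cohort {rr_cohort:.4f}, case-control {rr_cc:.4f}")
print(f"intercept shift: {intercept_shift:.4f} (log 10 = {math.log(10):.4f})")
```

The odds ratio is the same in both tables, the naive risk ratio is not, and the baseline log-odds differs by log(10), matching the statement that only the intercept changes.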