Questions: Logit and Probit Models for Binary Outcomes
5 questions to test your understanding
Question 1 Multiple Choice
A logit model of employment yields an estimated coefficient of 0.5 on years of education. A researcher reports: 'One additional year of education increases the probability of employment by 50 percentage points.' What is wrong?
A. The coefficient should be divided by 100 to convert from log-odds to probability
B. The logit coefficient measures change in log-odds, not probability; marginal effects, which vary across individuals, must be computed separately
C. The interpretation would be correct only if all other variables are held at their means
D. The interpretation is correct for probit but not logit due to their different link functions
A logit coefficient β on variable X means a one-unit increase in X raises the log-odds by β, not the probability. The probability change (the marginal effect) is β × F'(X'β), where F' is the derivative of the logistic function. This varies across observations because F' depends on the value of X'β. A coefficient of 0.5 implies a marginal effect of 0.125 where the curve is steepest (p = 0.5, so F' = 0.25) but only about 0.01 near the extremes where the curve is flat (p ≈ 0.02 or 0.98, so F' ≈ 0.02). The raw coefficient cannot be read as a probability change, whatever units X is measured in.
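The arithmetic in this explanation can be checked with a short sketch. The coefficient 0.5 comes from the question; the index values z = X'β in the loop are illustrative assumptions, not estimates from any dataset.

```python
import math

def logistic(z: float) -> float:
    """Logistic CDF: maps the linear index z = X'β to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(beta: float, z: float) -> float:
    """dP/dX = β · F'(z), where F'(z) = p(1 - p) for the logistic CDF."""
    p = logistic(z)
    return beta * p * (1.0 - p)

beta = 0.5  # coefficient from the question

# Illustrative index values (assumed), from the flat tails to the steep middle.
for z in (-4.0, -2.0, 0.0, 2.0, 4.0):
    p = logistic(z)
    me = logit_marginal_effect(beta, z)
    print(f"z = {z:+.1f}   p = {p:.3f}   marginal effect = {me:.4f}")
```

The marginal effect peaks at z = 0 (p = 0.5) and shrinks symmetrically toward both tails, so no single number summarizes "the" effect of the coefficient.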
Question 2 Multiple Choice
Why do logit and probit models replace OLS (the linear probability model) for binary outcomes?
A. OLS cannot converge when Y is binary because the design matrix becomes singular
B. Binary outcomes have zero variance, so OLS has nothing to explain
C. OLS can predict probabilities below 0 and above 1, and produces heteroskedastic errors by construction; logit and probit constrain predictions to (0,1)
D. OLS requires normally distributed dependent variables, and binary data follow a Bernoulli distribution that violates this assumption
The linear probability model's core problem is geometric: a line extending infinitely in both directions will eventually predict probabilities below 0 or above 1 for sufficiently extreme values of X. It also has built-in heteroskedasticity because Var(Y|X) = p(1-p), which changes with X. Logit and probit squeeze the linear index X'β through an S-shaped link function that maps (−∞, +∞) into (0,1), guaranteeing valid probability predictions. Option D is a common misconception — OLS assumptions concern the errors, not Y itself, and the normality assumption is not strictly required.
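A minimal sketch of the geometric point, using made-up coefficients for both models (the intercepts and slopes below are hypothetical and chosen only for illustration):

```python
import math

def lpm_prediction(x: float, intercept: float = 0.2, slope: float = 0.1) -> float:
    """Linear probability model: an unbounded straight line in x."""
    return intercept + slope * x

def logit_prediction(x: float, intercept: float = -1.5, slope: float = 0.4) -> float:
    """Logit: the linear index is passed through the logistic CDF, so 0 < p < 1."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def bernoulli_variance(p: float) -> float:
    """Var(Y|X) = p(1 - p): the error variance changes with the fitted probability."""
    return p * (1.0 - p)

for x in (-5, 0, 4, 8, 12):  # assumed range of the regressor
    lpm = lpm_prediction(x)
    p = logit_prediction(x)
    print(f"x = {x:>3}   LPM = {lpm:+.2f}   logit = {p:.3f}   Var(Y|X) = {bernoulli_variance(p):.3f}")
```

At the ends of this range the linear prediction leaves the unit interval, while the logit prediction only approaches 0 or 1. The last column shows the variance changing with x, which is the built-in heteroskedasticity.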
Question 3 True / False
Because logit and probit models produce nearly identical fitted values in practice, you can directly compare the magnitudes of their coefficients to determine which model fits better.
T. True
F. False
Answer: False
Logit and probit coefficients cannot be compared in magnitude because they are on different scales. The logit model uses the logistic CDF and the probit model uses the standard normal CDF, and the underlying error distributions have different variances (π²/3 for the standard logistic, 1 for the standard normal). Logit coefficients are typically about 1.6–1.8 times larger than probit coefficients for the same data, simply because of this scale difference. Coefficient magnitudes also say nothing about fit: to compare model fit, use the log-likelihood or information criteria (AIC/BIC). Marginal effects from the two models are comparable because both are in probability units.
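One way to see where the 1.6–1.8 range comes from is to compare the two link functions directly. The sketch below uses only the Python standard library; the two "matching" rules it computes are common heuristics, not a statement about any particular dataset.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Heuristic 1: match standard deviations.
# The standard logistic distribution has standard deviation π/√3.
sd_ratio = math.pi / math.sqrt(3)

# Heuristic 2: match the slopes of the two CDFs at the centre (index = 0).
# Probit slope is φ(0); logit slope is p(1 - p) = 0.25 at p = 0.5.
slope_ratio = std_normal.pdf(0.0) / 0.25

print(f"ratio from matching standard deviations: {sd_ratio:.3f}")
print(f"ratio from matching slopes at the centre: {slope_ratio:.3f}")
```

The two heuristics bracket the rule of thumb: the ratio observed in a given sample depends on where the fitted probabilities lie, which is why it is quoted as a range.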
Question 4 True / False
In a logit model, the marginal effect of a predictor variable on P(Y=1) is constant across most observations, analogous to a slope coefficient in linear regression.
T. True
F. False
Answer: False
The marginal effect in a logit model is dP/dX = F'(X'β) × β, where F' is the derivative of the logistic function. F' equals p(1-p), which reaches its maximum of 0.25 when p = 0.5 and approaches 0 near the extremes. The marginal effect is therefore largest when the predicted probability is near 0.5 and nearly zero when it is near 0 or 1. With the same coefficient β, the marginal effect is 0.25β for an observation at p = 0.5 but only 0.01 × 0.99 × β ≈ 0.0099β for one at p = 0.01, roughly 25 times smaller. This non-constancy is why marginal effects must be computed, and why the 'effect at the mean' and the 'average marginal effect' can differ.
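The identity F' = p(1-p) used in this explanation follows in two lines from the logistic CDF. A short derivation, writing z for the index X'β:

```latex
\begin{aligned}
F(z) &= \frac{1}{1 + e^{-z}}, \qquad z = X'\beta \\[4pt]
F'(z) &= \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}
       = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
       = F(z)\,\bigl(1 - F(z)\bigr) = p(1 - p) \\[4pt]
\left|\frac{\partial P(Y=1)}{\partial X}\right| &= |\beta|\, p(1 - p) \;\le\; \frac{|\beta|}{4},
\quad \text{with equality at } p = 0.5
\end{aligned}
```

Because p(1-p) is a downward parabola in p, it is symmetric around 0.5 and vanishes at both ends, which is the shape described in the explanation.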
Question 5 Short Answer
Why must researchers compute and report marginal effects rather than just reporting the raw logit or probit coefficients? What do the raw coefficients actually measure?
Model answer: Raw logit coefficients measure changes in log-odds per unit increase in the predictor, a quantity that is hard to interpret intuitively. Raw probit coefficients measure changes in the latent index, expressed in standard normal z-score units. Neither is directly interpretable as a probability change. Marginal effects convert the coefficient into probability units (the change in P(Y=1) per unit change in X) by multiplying it by the derivative of the link function evaluated at each observation's values. Because this derivative varies with X'β, the marginal effect differs across individuals, so researchers report either the marginal effect at the mean of X or the average marginal effect across all observations.
The deeper issue is that logit and probit models are inherently nonlinear: the same coefficient β implies a larger probability change near p = 0.5 than near p = 0 or 1. Reporting only β hides this nonlinearity and can mislead readers about the practical significance of the predictor. Marginal effects translate the statistical output into policy-relevant terms — 'an additional year of schooling raises the probability of employment by approximately 3 percentage points at the mean' is informative in a way that 'the logit coefficient on schooling is 0.5' is not.
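The difference between the marginal effect at the mean and the average marginal effect can be illustrated with a small sketch. Everything numeric here is hypothetical: the intercept, the coefficient, and the schooling values are invented for the example and do not come from any estimated model.

```python
import math

def logistic(z: float) -> float:
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted logit: P(employed) = logistic(intercept + beta * schooling).
intercept, beta = -6.0, 0.5

# Hypothetical sample of years of schooling.
schooling = [8, 10, 12, 12, 14, 16, 16, 18, 20]

def marginal_effect(x: float) -> float:
    """β · p(1 - p), evaluated at schooling level x."""
    p = logistic(intercept + beta * x)
    return beta * p * (1.0 - p)

# Marginal effect at the mean (MEM): evaluate once, at the average x.
mean_x = sum(schooling) / len(schooling)
mem = marginal_effect(mean_x)

# Average marginal effect (AME): evaluate for every observation, then average.
ame = sum(marginal_effect(x) for x in schooling) / len(schooling)

print(f"marginal effect at the mean: {mem:.4f}")
print(f"average marginal effect:     {ame:.4f}")
```

The two summaries differ because averaging a nonlinear function is not the same as evaluating it at the average. Either can be reported, as long as the write-up says which one it is.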