Logit and Probit Models for Binary Outcomes

Graduate · Depth 76 in the knowledge graph
Unlocks 11 downstream topics
logit probit binary-outcome MLE marginal-effects

Core Idea

When the dependent variable is binary (y ∈ {0,1}), the linear probability model (OLS on a dummy) can predict probabilities outside [0,1] and has heteroskedastic errors by construction. Logit and probit models instead model P(y=1|x) = F(x'β) where F is the logistic function (logit) or the standard normal CDF (probit), ensuring predicted probabilities lie in (0,1). Both are estimated by maximum likelihood, not OLS. Coefficients are not directly interpretable as marginal effects; marginal effects (dP/dx evaluated at the mean or averaged over the sample) are reported instead. Logit and probit produce similar results in practice; the choice is usually conventional.

How It's Best Learned

Estimate a labor force participation model (binary) using LPM, logit, and probit on the same data. Compare predicted probabilities near 0 and 1 to see where LPM fails. Compute average marginal effects for the logit model.

Common Misconceptions

Explainer

You already know how OLS regression models E[Y|X] as a linear function of the predictors. When Y is continuous, this works well. When Y is binary (someone either has a job or doesn't, a firm defaults or doesn't, a patient survives or doesn't), OLS produces the linear probability model (LPM), which models P(Y=1|X) directly as X'β.

The problem is that a linear function has no natural boundaries: it can predict probabilities below 0 or above 1 for extreme values of X. Its marginal effects are also constant, which ignores the fact that it is much easier to shift a probability near the middle of the distribution (around 0.5) than near the extremes. Finally, the LPM's errors are heteroskedastic by construction. Because Y takes only two values, the error u = Y − X'β has conditional variance Var(u|X) = p(X)(1 − p(X)), where p(X) = X'β, so the variance changes with X.
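The boundary problem is easy to reproduce. The minimal sketch below uses only the Python standard library. It fits the LPM by OLS to a tiny made-up dataset (the `xs`/`ys` values and the helper names are hypothetical) and then evaluates the fitted "probabilities" at the ends of the x range.

```python
# Linear probability model (LPM) fitted by OLS on a toy binary outcome.
# The data are made up for illustration: y switches from 0 to 1 as x rises.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [0, 0, 0, 1, 1, 1, 1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form OLS slope and intercept for a single regressor.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar


def lpm_prob(x):
    """Fitted 'probability' from the LPM: just the linear index."""
    return intercept + slope * x


def lpm_error_variance(x):
    """Var(u | x) = p(x) * (1 - p(x)) with p(x) the LPM prediction.
    Only meaningful when 0 <= p(x) <= 1."""
    p = lpm_prob(x)
    return p * (1 - p)


for x in (-3, 0, 3):
    print(f"x={x:+d}  p_hat={lpm_prob(x):.4f}")
# x=-3  p_hat=-0.0714
# x=+0  p_hat=0.5714
# x=+3  p_hat=1.2143
```

Even on in-sample points, the fitted line leaves [0, 1] at both ends. The implied error variance p(x)(1 − p(x)) also differs from one value of x to the next.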

The solution is to squeeze the linear index X'β through a function that maps the entire real line into (0,1). The logistic function F(z) = 1/(1+e^{-z}) does this. Its values lie strictly between 0 and 1, and it is symmetric about the point (0, 0.5), in the sense that F(−z) = 1 − F(z). It approaches 1 as z becomes large and positive, and 0 as z becomes large and negative. This gives the logit model: P(Y=1|X) = 1/(1+e^{-X'β}). The probit model uses the standard normal CDF Φ(X'β) instead. Both produce a similar S-curve (the normal has slightly thinner tails), and in practice they give nearly identical fitted probabilities. The choice between them is mostly conventional: economists often prefer probit, biostatisticians logit.
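The two link functions can be written directly with Python's `math` module. In this sketch the function names are illustrative, and the normal CDF is expressed through `math.erf`. The constant 1.702 is a classical approximation constant and does not come from this text. With it, a rescaled logistic curve tracks the normal CDF to within about 0.01, which is one way to see why logit and probit fitted probabilities barely differ.

```python
import math


def logistic_cdf(z: float) -> float:
    """Logit: Lambda(z) = 1 / (1 + exp(-z)), arranged to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def normal_cdf(z: float) -> float:
    """Probit: standard normal CDF Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Both map any real index into (0, 1) and satisfy F(-z) = 1 - F(z).
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"z={z:+.1f}  logit={logistic_cdf(z):.4f}  probit={normal_cdf(z):.4f}")

# Rescaled, the curves nearly coincide: Lambda(1.702 z) stays within
# about 0.01 of Phi(z), so fitted probabilities barely differ.
grid = [i / 100 for i in range(-600, 601)]
max_gap = max(abs(logistic_cdf(1.702 * z) - normal_cdf(z)) for z in grid)
print(f"max |Lambda(1.702 z) - Phi(z)| on [-6, 6]: {max_gap:.4f}")
```

In floating point, both functions round to exactly 0.0 or 1.0 for very large |z|. The "strictly between 0 and 1" property is a mathematical one.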

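Because these models are nonlinear in β, OLS of Y on X does not estimate them. Instead, you maximize the log-likelihood. For each observation, the model predicts a probability pᵢ = F(X'ᵢβ). The likelihood contribution is pᵢ if Yᵢ = 1 and (1 − pᵢ) if Yᵢ = 0, written compactly as pᵢ^{Yᵢ}(1 − pᵢ)^{1−Yᵢ}. Maximizing the log-likelihood ℓ(β) = Σᵢ [Yᵢ log pᵢ + (1 − Yᵢ) log(1 − pᵢ)] finds the β that makes the observed data most probable under the model. There is no closed-form solution, so the maximum is found numerically, for example by Newton-Raphson. If the model is correctly specified, the resulting estimator is consistent and asymptotically normal. Standard errors (from the inverse of the information matrix) and Wald or likelihood-ratio tests then work in the usual large-sample way.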
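The maximization step can be sketched with Newton-Raphson for a logit with an intercept and one regressor. This is a standard-library-only illustration, not how any particular package implements it. The toy data and the function names are hypothetical.

- The data are chosen so that outcomes overlap at several x values. With perfectly separated data the logit MLE does not exist.
- For the logit, the score is Σ (yᵢ − pᵢ)xᵢ and the information matrix is Σ pᵢ(1 − pᵢ)xᵢxᵢ'.

```python
import math


def logistic_cdf(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Toy data (made up for illustration). Outcomes overlap at x = -1, 0, 1,
# so the data are not perfectly separated and the MLE exists.
xs = [-2, -1, -1, 0, 0, 1, 1, 2]
ys = [0, 0, 1, 0, 1, 0, 1, 1]


def log_likelihood(b0: float, b1: float) -> float:
    """Sum of y*log(p) + (1 - y)*log(1 - p) with p = Lambda(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic_cdf(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def score_and_information(b0: float, b1: float):
    """Gradient of the log-likelihood and the information matrix
    (negative Hessian) for a logit with an intercept and one regressor."""
    g0 = g1 = 0.0
    i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = logistic_cdf(b0 + b1 * x)
        w = p * (1 - p)  # logistic density evaluated at the index
        g0 += y - p
        g1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), ((i00, i01), (i01, i11))


def fit_logit(tol: float = 1e-10, max_iter: int = 50):
    """Newton-Raphson: beta <- beta + I(beta)^{-1} * score(beta)."""
    b0 = b1 = 0.0
    for _iteration in range(max_iter):
        (g0, g1), ((i00, i01), (_, i11)) = score_and_information(b0, b1)
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Asymptotic standard errors: square roots of diag(I^{-1}) at the MLE.
    _, ((i00, i01), (_, i11)) = score_and_information(b0, b1)
    det = i00 * i11 - i01 * i01
    return b0, b1, math.sqrt(i11 / det), math.sqrt(i00 / det)


b0_hat, b1_hat, se0, se1 = fit_logit()
print(f"b0={b0_hat:.4f} (se {se0:.4f})  b1={b1_hat:.4f} (se {se1:.4f})")
print(f"log-likelihood at the MLE: {log_likelihood(b0_hat, b1_hat):.4f}")
```

Because the logit log-likelihood is concave, Newton-Raphson started at zero typically converges in a handful of steps when the MLE exists. In applied work you would use an established routine rather than this hand-rolled loop.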

The trickiest part is interpreting the coefficients. A logit coefficient βⱼ does not mean "a one-unit increase in Xⱼ raises P(Y=1) by βⱼ." It means a one-unit increase in Xⱼ raises the log-odds, log(p/(1−p)), by βⱼ. Equivalently, it multiplies the odds p/(1−p) by e^{βⱼ}, the odds ratio. Log-odds are not intuitive.

To get something interpretable, you compute marginal effects: ∂P/∂Xⱼ = F'(X'β) × βⱼ. Here F' is the derivative of F: the logistic density Λ(z)(1−Λ(z)) for logit, and the standard normal density φ(z) for probit. Because F' depends on X'β, the marginal effect varies across observations. Standard practice is to report one of two summaries:

- the marginal effect at the mean: evaluate it at the sample average of X;
- the average marginal effect: compute it for each observation and average.

These approximate the change in probability associated with a unit increase in Xⱼ, and they are the quantities to report in applied work. For a binary regressor, report instead the discrete change in predicted probability when it switches from 0 to 1.
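Here is a minimal sketch of the two summaries for a logit. It assumes hypothetical coefficient values (`b0`, `b1`) and a small hypothetical sample of the regressor rather than real estimates. It computes the marginal effect at the mean (MEM) and the average marginal effect (AME), and confirms the log-odds reading of `b1`.

```python
import math


def logistic_cdf(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_pdf(z: float) -> float:
    """F'(z) for the logit model: Lambda(z) * (1 - Lambda(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


# Hypothetical logit estimates and sample values of the regressor.
b0, b1 = -1.0, 0.5
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 6.0]
x_bar = sum(xs) / len(xs)

# Marginal effect at the mean: F'(index) * b1 evaluated at x = x_bar.
mem = logistic_pdf(b0 + b1 * x_bar) * b1

# Average marginal effect: F'(index_i) * b1 for every observation, averaged.
ame = sum(logistic_pdf(b0 + b1 * x) * b1 for x in xs) / len(xs)


def log_odds(x: float) -> float:
    """log(p / (1 - p)) at x; for the logit this equals b0 + b1 * x."""
    p = logistic_cdf(b0 + b1 * x)
    return math.log(p / (1.0 - p))


# A unit increase in x adds b1 to the log-odds, i.e. multiplies the odds by exp(b1).
odds_ratio = math.exp(b1)

print(f"MEM = {mem:.4f}, AME = {ame:.4f}, odds ratio = {odds_ratio:.4f}")
```

With a spread-out regressor like this one, the two summaries differ noticeably. The density is largest near an index of 0, so averaging it over observations far from the center pulls the AME below the MEM. For a probit, replace `logistic_pdf` with the standard normal density.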

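An important distinction from OLS concerns identification. Write the model in latent-variable form: Y* = X'β + ε, with Y = 1 when Y* > 0. Then the coefficients β and the scale σ of the error ε are not separately identified; only β/σ is. The distributional assumption fixes that scale (σ = 1 for probit, Var(ε) = π²/3 for logit), so coefficients can only be interpreted relative to it.

This is why you cannot directly compare the magnitude of logit coefficients across different samples, or across models that include different variables. Adding or dropping a regressor changes the variance of the unobserved part of Y*, which rescales every coefficient, even when the dropped regressor is uncorrelated with the others. For the same reason, logit and probit coefficients are not directly comparable; logit coefficients are typically about 1.6 to 1.8 times the corresponding probit ones. You can compare signs and significance, and you can compare marginal effects, but not raw coefficient magnitudes between models.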
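The scale problem can be checked numerically. This sketch assumes a hypothetical probit latent model y* = b1·x + b2·w + e, with e and w independent standard normals and w independent of x. Probit is used because integrating a normal w out of a probit gives an exact closed form; under logit the same rescaling happens only approximately. All coefficient values and function names are illustrative. The code checks three things:

1. Scaling β and σ together leaves probabilities unchanged.
2. Omitting w shrinks the coefficient on x to b1/√(1 + b2²).
3. The marginal effect of x is the same whether computed from the full model averaged over w or from the short model.

```python
import math


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Hypothetical latent-variable model: y* = b1*x + b2*w + e, y = 1 if y* > 0,
# with e ~ N(0, 1) and w ~ N(0, 1) independent of x.
b1, b2 = 1.0, 1.5


def prob_with_scale(x: float, b: float, sigma: float) -> float:
    """P(y = 1 | x) = Phi(b * x / sigma): only b / sigma matters."""
    return normal_cdf(b * x / sigma)


# Dropping w adds its variance to the error; renormalizing the error to
# unit variance shrinks the coefficient on x, even though w is independent of x.
b1_short = b1 / math.sqrt(1.0 + b2 ** 2)


def integrate_over_w(f, lo: float = -8.0, hi: float = 8.0, n: int = 4000) -> float:
    """E_w[f(w)] for w ~ N(0, 1), by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        w = lo + (k + 0.5) * h
        total += f(w) * normal_pdf(w) * h
    return total


def prob_given_x(x: float) -> float:
    """P(y = 1 | x) from the full model with w integrated out."""
    return integrate_over_w(lambda w: normal_cdf(b1 * x + b2 * w))


def marginal_effect_full(x: float) -> float:
    """dP(y = 1 | x)/dx from the full model, averaged over w."""
    return integrate_over_w(lambda w: normal_pdf(b1 * x + b2 * w) * b1)


def marginal_effect_short(x: float) -> float:
    """dP/dx implied by the short model that omits w."""
    return normal_pdf(b1_short * x) * b1_short


print(f"coefficient on x: full model {b1}, model without w {b1_short:.4f}")
for x in (-1.0, 0.5, 2.0):
    print(f"x={x:+.1f}  P full={prob_given_x(x):.6f}  "
          f"P short={normal_cdf(b1_short * x):.6f}  "
          f"ME full={marginal_effect_full(x):.6f}  "
          f"ME short={marginal_effect_short(x):.6f}")
```

The coefficient on x differs between the two models, but the probabilities and marginal effects agree. That is the sense in which marginal effects, unlike raw coefficients, can be compared across specifications.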


Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Integer Order of Operations → Variable Expressions → Combining Like Terms → One-Step Equations → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Normal Distribution → Classical OLS Assumptions (Gauss-Markov) → Multiple Regression → Logit and Probit Models for Binary Outcomes

Longest path: 77 steps · 461 total prerequisite topics

Prerequisites (4)

Leads To (3)