You fit a logit model using MLE and later discover the error term is not logistically distributed. What is the most likely consequence?
A. The estimates are unaffected because MLE is always consistent
B. Only the standard errors are biased, but the coefficient estimates remain consistent
C. The estimates may be inconsistent because MLE consistency depends on correct distributional specification
D. The estimates are biased in small samples only, but consistent asymptotically regardless of the distribution
Answer: C
MLE consistency requires that the model is correctly specified: the assumed distribution must match the true data-generating process. If the distributional assumption is wrong, the likelihood being maximized does not correspond to the true probability of the data, and the MLE may converge to the wrong parameter values, which makes it inconsistent. This contrasts with OLS for linear regression, where consistency requires only E[u|x]=0, not a specific distributional form.
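The OLS side of this contrast can be checked with a small simulation. The sketch below is illustrative only: the model y = 1 + 2x + u, the centered exponential errors, the sample sizes, and the helper name `simulate_ols` are all assumptions chosen for the example. The errors are strongly skewed (clearly non-normal) but have mean zero and are independent of x, so E[u|x]=0 holds.

```python
import random

def simulate_ols(n, seed=0):
    """Fit y = b0 + b1*x by OLS on simulated data with skewed, non-normal errors."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Centered exponential errors: mean zero and independent of x, but skewed.
    us = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
    # Assumed true model for this illustration: intercept 1, slope 2.
    ys = [1.0 + 2.0 * x + u for x, u in zip(xs, us)]

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)

    slope = sxy / sxx                    # OLS slope: sample cov(x, y) / var(x)
    intercept = y_bar - slope * x_bar    # OLS intercept
    return intercept, slope

for n in (100, 1_000, 20_000):
    b0, b1 = simulate_ols(n)
    print(f"n={n:>6}: intercept={b0:.3f}, slope={b1:.3f}")
```

As n grows, the estimates should settle near the assumed values 1 and 2 even though the errors are far from normal. A logit fitted by MLE has no comparable guarantee when its assumed error distribution is wrong.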
Question 2 True / False
When the error term is normally distributed, OLS and MLE produce identical coefficient estimates for a linear regression model.
T. True
F. False
Answer: True
Under normality, maximizing the log-likelihood of a linear regression model with respect to β is algebraically equivalent to minimizing the sum of squared residuals. Both procedures yield the same estimator: β̂ = (X'X)⁻¹X'y. This equivalence is specific to the normal linear model. In non-linear models (like logit) or under non-normal error distributions, the MLE and OLS estimators generally differ. The key implication is that OLS can be interpreted as MLE under a normality assumption.
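The equivalence can be seen by writing out the log-likelihood. The derivation below is a sketch under the standard assumptions y = Xβ + u with u | X ~ N(0, σ²I) and X of full column rank; the symbols σ² and n (the sample size) are notation introduced here for the illustration.

```latex
\begin{align*}
\ell(\beta, \sigma^2)
  &= \sum_{i=1}^{n} \log\!\left[
       \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left( -\frac{(y_i - x_i'\beta)^2}{2\sigma^2} \right)
     \right] \\
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i'\beta)^2 .
\end{align*}
% For any fixed \sigma^2 > 0, only the last term depends on \beta, so
\begin{align*}
\arg\max_{\beta}\, \ell(\beta, \sigma^2)
  &= \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i'\beta)^2 , \\
\frac{\partial \ell}{\partial \beta}
  = \frac{1}{\sigma^2}\, X'(y - X\beta) = 0
  \quad &\Longrightarrow \quad
  \hat{\beta} = (X'X)^{-1} X'y .
\end{align*}
```

The first-order condition is the same normal equation that OLS solves, which is why the coefficient estimates coincide.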
Question 3 Short Answer
Why do econometricians typically maximize the log-likelihood rather than the likelihood function itself?
Model answer: The log-likelihood is computationally and mathematically more tractable: it converts the product of n probability densities into a sum, which is easier to differentiate and optimize. Because the logarithm is a monotone transformation, the parameter values that maximize the likelihood also maximize the log-likelihood.
For n independent observations, the likelihood is a product: L(θ) = ∏ f(yᵢ; θ). Multiplying many probability or density values, most of which are well below 1, quickly produces numbers too small for computers to represent accurately (numerical underflow). Taking the log converts multiplication to addition: ℓ(θ) = Σ log f(yᵢ; θ). Since log is strictly monotone increasing, the argmax is unchanged. The resulting sum is also much easier to differentiate to find the maximum analytically or numerically.
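Both points, underflow and the unchanged argmax, can be illustrated numerically. The sketch below assumes a N(μ, 1) model; the sample sizes, the grid of candidate μ values, and the function names are hypothetical choices made for the example.

```python
import math
import random

def likelihood(ys, mu):
    """Product of N(mu, 1) densities over the sample."""
    prod = 1.0
    for y in ys:
        prod *= math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)
    return prod

def log_likelihood(ys, mu):
    """Sum of N(mu, 1) log-densities over the sample."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (y - mu) ** 2 for y in ys)

rng = random.Random(0)
big_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Each density is at most about 0.4, so the product of 2000 of them
# is far below the smallest positive double and underflows.
lik_big = likelihood(big_sample, 0.0)
loglik_big = log_likelihood(big_sample, 0.0)
print("likelihood:    ", lik_big)       # 0.0 (underflow)
print("log-likelihood:", loglik_big)    # a finite negative number

# On a small sample, where the product is still representable,
# both criteria select the same grid point.
small_sample = big_sample[:20]
grid = [i / 100 for i in range(-100, 101)]
best_by_lik = max(grid, key=lambda mu: likelihood(small_sample, mu))
best_by_loglik = max(grid, key=lambda mu: log_likelihood(small_sample, mu))
print("argmax (likelihood):    ", best_by_lik)
print("argmax (log-likelihood):", best_by_loglik)
```

With 2000 observations the raw likelihood is numerically useless as an objective because it evaluates to zero at every candidate μ, while the log-likelihood remains well scaled.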