Questions: Count Data Models: Poisson and Negative Binomial Regression
5 questions to test your understanding
Question 1 Multiple Choice
A researcher fits a Poisson regression to the number of hospital visits per patient and gets highly significant coefficients. A reviewer suspects the results may be invalid. What should the reviewer check first?
A. Whether the log-likelihood is maximized at the estimated parameters
B. Whether the data is overdispersed: if variance exceeds the mean, Poisson standard errors will be too small and significance will be inflated
C. Whether the outcome variable has any zero values, which Poisson cannot handle
D. Whether the coefficients are positive, since count outcomes cannot decrease
Answer: B
The Poisson model imposes equidispersion (mean = variance). Real count data is usually overdispersed: for example, a small, high-utilization subgroup can inflate the variance far above the mean. When a Poisson model is fit to overdispersed data, it underestimates standard errors and overstates statistical significance. The first diagnostic is to test for overdispersion (e.g., compare the sample mean with the sample variance, or test whether the negative binomial dispersion parameter α is significantly different from 0).
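The mean-versus-variance comparison can be sketched in a few lines. This is an illustrative simulation, not data from any study: the sample size, the 5% subgroup share, and the two visit rates are all assumed values chosen to mimic a small high-utilization subgroup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical visit counts: most patients have a low visit rate, while an
# assumed 5% high-utilization subgroup has a much higher one.
n = 5000
high_use = rng.random(n) < 0.05
rate = np.where(high_use, 20.0, 1.5)
visits = rng.poisson(rate)

mean = visits.mean()
var = visits.var(ddof=1)
ratio = var / mean  # close to 1 under equidispersion, well above 1 here

print(f"mean={mean:.2f}  variance={var:.2f}  variance/mean={ratio:.2f}")
```

A ratio far above 1 is only a first warning sign. With covariates in the model, the comparison that matters is between the conditional variance and the fitted mean, so a residual-based dispersion statistic or a test on α is the more careful check.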
Question 2 Multiple Choice
What is the key parametric difference between Poisson and negative binomial regression?
A. Negative binomial uses a log link while Poisson uses an identity link
B. Negative binomial adds a dispersion parameter α that allows variance to exceed the mean; when α = 0 it reduces to Poisson
C. Negative binomial models the log of the outcome while Poisson models the outcome directly
D. Negative binomial uses ordinary least squares while Poisson uses maximum likelihood
Answer: B
Both models use a log link and maximum likelihood estimation. The essential difference is the dispersion parameter α in the negative binomial. Poisson fixes variance = mean; negative binomial allows variance = mean + α·mean² (NB2 parameterization), with α estimated from the data. When α = 0, the negative binomial collapses to Poisson, which is why you can formally test Poisson vs. negative binomial by testing H₀: α = 0.
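The NB2 variance function can be checked numerically. The sketch below assumes the standard gamma-Poisson mixture construction of the negative binomial; the helper name `simulate_nb2`, the mean of 4, and the α values are illustrative choices, not part of the quiz.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_nb2(mu, alpha, size, rng):
    """Draw NB2 counts with mean mu as a gamma-Poisson mixture (hypothetical helper)."""
    if alpha == 0:
        return rng.poisson(mu, size=size)  # limiting Poisson case
    # Gamma multiplier with mean 1 and variance alpha
    g = rng.gamma(shape=1.0 / alpha, scale=alpha, size=size)
    return rng.poisson(mu * g)

mu, n = 4.0, 200_000
results = {}
for alpha in (0.0, 0.5, 2.0):
    y = simulate_nb2(mu, alpha, n, rng)
    theoretical_var = mu + alpha * mu**2  # NB2: variance = mean + alpha * mean^2
    results[alpha] = (y.mean(), y.var(ddof=1), theoretical_var)
    print(f"alpha={alpha}: mean={y.mean():.2f}  "
          f"sample var={y.var(ddof=1):.2f}  mean + alpha*mean^2={theoretical_var:.2f}")
```

In each case the sample variance should land close to mean + α·mean², and the α = 0 row should show variance roughly equal to the mean, which is the Poisson special case.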
Question 3 True / False
In Poisson regression with a log link, where the predicted mean is exp(Xβ), the model can predict negative counts for extreme covariate values.
T. True
F. False
Answer: False
The log link is chosen in part to guarantee positive predictions. The model predicts λ = exp(Xβ), and since exp(·) > 0 for all finite inputs, the predicted mean count is always strictly positive. This is one of the key advantages of Poisson regression over OLS for count data: OLS can produce nonsensical negative predictions, while the exponential inverse link keeps every prediction a valid expected count.
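A short numerical illustration of this point, using made-up coefficients (an intercept of 0.5 and a slope of -0.8 are assumptions for the example only): the linear predictor goes negative at extreme covariate values, but its exponential never does.

```python
import numpy as np

# Hypothetical coefficients: intercept and one slope
beta = np.array([0.5, -0.8])
x = np.linspace(-10, 10, 9)
X = np.column_stack([np.ones_like(x), x])

eta = X @ beta     # linear predictor; what an identity-link (OLS-style) model would predict
lam = np.exp(eta)  # Poisson mean under the log link

print("smallest linear prediction:", eta.min())
print("smallest Poisson mean:     ", lam.min())
```

The smallest linear prediction is negative, while the smallest Poisson mean is tiny but still above zero.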
Question 4 True / False
When Poisson regression is fit to overdispersed count data, the estimated coefficients are biased, making them unreliable even if standard errors were correct.
T. True
F. False
Answer: False
The primary problem with Poisson on overdispersed data is invalid standard errors, not biased coefficients. Provided the conditional mean is correctly specified, the coefficient estimates are still consistent under overdispersion (this is the QMLE / quasi-Poisson result). What fails is the standard error formula, which assumes mean = variance, so test statistics are inflated and p-values are too small, but the point estimates remain useful. This is why robust standard errors (sandwich estimator) can fix inference without switching to a different model.
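The contrast between model-based and sandwich standard errors can be sketched with plain NumPy. Everything here is an assumption made for illustration: the simulated NB2 data with α = 1, the true coefficients, and the helper names `fit_poisson` and `poisson_standard_errors`. The sandwich shown is the basic HC0 form without small-sample corrections.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Poisson regression with a log link, fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # assumes column 0 of X is the intercept
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * mu[:, None])
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def poisson_standard_errors(X, y, beta):
    """Return (model-based SE, HC0 sandwich SE) for a fitted Poisson model."""
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))  # inverse Fisher information
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])     # sum of squared score contributions
    se_model = np.sqrt(np.diag(bread))              # valid only if variance = mean
    se_robust = np.sqrt(np.diag(bread @ meat @ bread))
    return se_model, se_robust

# Simulated overdispersed counts: correct log-linear mean, NB2 variance
rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([1.0, 0.3])
alpha = 1.0
mu_true = np.exp(X @ true_beta)
y = rng.poisson(mu_true * rng.gamma(shape=1.0 / alpha, scale=alpha, size=n))

beta_hat = fit_poisson(X, y)
se_model, se_robust = poisson_standard_errors(X, y, beta_hat)
print("estimates:      ", beta_hat)
print("model-based SE: ", se_model)
print("sandwich SE:    ", se_robust)
```

Under these assumptions the point estimates should sit close to the true coefficients, while the sandwich standard errors come out noticeably larger than the model-based ones. That gap is the understated uncertainty the explanation describes.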
Question 5 Short Answer
Explain why overdispersion is specifically dangerous for inference (not just model fit) when using Poisson regression on real count data.
Model answer: Poisson regression's standard error formula is derived assuming variance = mean. When actual variance exceeds the mean (overdispersion), the model 'sees' less spread in the data than is truly there, and estimates standard errors that are too small. This makes t-statistics and z-statistics artificially large and p-values artificially small — so coefficients appear statistically significant when they may not be. The danger is incorrect inference: you may publish results claiming strong, reliable associations when the apparent precision is an artifact of the misspecified error structure.
There are two main fixes: switch to negative binomial regression (which models overdispersion explicitly), or keep the Poisson point estimates and correct the standard errors with quasi-Poisson scaling or robust (sandwich) standard errors. Either way, test for overdispersion before finalizing results. This is one of the most common sources of false positives in applied research using count outcomes.