Questions: Count Data Regression: Poisson and Negative Binomial Models
5 questions to test your understanding
Question 1 Multiple Choice
A researcher models the number of political protests per country per year using Poisson regression. Diagnostics reveal the variance is 18 times the mean. What is the most likely consequence of ignoring this?
A. Predicted counts will occasionally be negative
B. The model will automatically compensate by widening confidence intervals
C. Standard errors will be underestimated, making predictors appear statistically significant when they may not be
D. The log-link function will produce biased coefficient estimates
Answer: C
Overdispersion means the Poisson model's variance assumption is wrong: Poisson assumes the variance equals the mean, but here the variance is 18 times larger. The coefficient estimates can remain consistent as long as the mean is correctly specified, which is why option D is not the main concern. The standard errors, however, are computed as if the variance equaled the mean, so they are too small. Underestimated SEs produce inflated z-statistics and artificially small p-values, leading researchers to declare spurious significance. Negative binomial regression adds a dispersion parameter that absorbs the extra-Poisson variation, producing more realistic standard errors and more reliable inference.
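For an intercept-only Poisson model, the size of the problem can be worked out directly. The sketch below is a minimal illustration, assuming a single intercept (log μ = b0), no covariates, and toy data; the helper name is hypothetical. It compares the model-based standard error, which assumes Var(y) = μ, with a sandwich (robust) standard error built from the observed residuals.

```python
import math

def intercept_only_poisson_ses(counts):
    """Return (b0_hat, se_model, se_sandwich) for the model log(mu) = b0.

    Hypothetical helper for illustration; assumes at least one nonzero count.
    """
    n = len(counts)
    ybar = sum(counts) / n                 # MLE of mu, so b0_hat = log(ybar)
    # Model-based SE: inverse Fisher information 1 / (n * mu), valid only if Var(y) = mu
    se_model = 1.0 / math.sqrt(n * ybar)
    # Sandwich SE: replaces the assumed variance with the observed squared residuals
    rss = sum((y - ybar) ** 2 for y in counts)
    se_sandwich = math.sqrt(rss) / (n * ybar)
    return math.log(ybar), se_model, se_sandwich

# Toy data: mean 2, variance (divisor n) 16, i.e. variance = 8 x mean
counts = [0] * 8 + [10] * 2
b0, se_model, se_sandwich = intercept_only_poisson_ses(counts)
print(f"b0 = {b0:.3f}, model SE = {se_model:.3f}, sandwich SE = {se_sandwich:.3f}")
```

Here the sandwich SE (0.632) is about √8 ≈ 2.83 times the model-based one (0.224). In this intercept-only case the ratio is √(variance/mean), so a variance 18 times the mean would make the Poisson SE roughly √18 ≈ 4.2 times too small and inflate z-statistics by the same factor.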
Question 2 Multiple Choice
What is the defining characteristic of overdispersion in count data?
A. The outcome variable contains a large number of zero values
B. The variance of the count variable substantially exceeds its mean
C. The count distribution is negatively skewed
D. The mean of the count variable exceeds its variance
Answer: B
Poisson regression's core assumption is that the variance equals the mean. Overdispersion is specifically defined as variance greater than the mean: the data are more variable than the Poisson distribution can accommodate. It arises from clustering, contagion processes, or unobserved heterogeneity across units. Excess zeros (option A) are a related but distinct problem handled by zero-inflated models; they can contribute to overdispersion but do not define it. Underdispersion (mean greater than variance, option D) exists but is rare in social science count data.
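The definition can be checked on raw data by computing the variance-to-mean ratio. The sketch below uses hypothetical toy data and computes that ratio together with the classical index-of-dispersion statistic. It ignores covariates, so it only measures marginal dispersion; regression diagnostics assess dispersion conditional on the predictors, which can differ.

```python
def dispersion_index(counts):
    """Return (variance / mean, index-of-dispersion statistic) for a list of counts.

    Under a Poisson model the ratio is near 1 and the statistic
    (n - 1) * variance / mean is approximately chi-square with n - 1 df.
    Assumes at least two observations and a nonzero mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)   # sample variance
    ratio = var / mean
    return ratio, (n - 1) * ratio

examples = {
    "overdispersed": [0] * 8 + [10] * 2,
    "variance = mean": [1, 1, 1, 3, 4],
    "underdispersed": [2, 2, 2, 3, 1, 2],
}
for label, counts in examples.items():
    ratio, stat = dispersion_index(counts)
    print(f"{label:>16}: variance/mean = {ratio:.2f}, statistic = {stat:.1f}")
```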
Question 3 True / False
Negative binomial regression is generally preferred over Poisson regression when overdispersion tests indicate that the variance of the count outcome significantly exceeds the mean.
True
False
Answer: True
Overdispersion is the main criterion for choosing between Poisson and negative binomial regression. Negative binomial adds a dispersion parameter (often called α) to the Poisson model. Conceptually, each observation gets its own underlying rate drawn from a gamma distribution, and this Poisson-gamma mixture produces the negative binomial distribution, whose variance exceeds its mean; as α approaches zero, it reduces to the Poisson. When overdispersion is present, the extra parameter absorbs the excess variation and produces better-calibrated standard errors. Because social science count data are so often overdispersed, negative binomial is a common default unless diagnostics show no meaningful overdispersion.
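The Poisson-gamma mixture can be checked by simulation. This minimal sketch uses only Python's standard library and assumes the common "NB2" parameterization, in which Var(y) = μ + αμ²; the helper names and parameter values are hypothetical. The Poisson sampler is Knuth's multiplication method, chosen for brevity rather than speed.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's method: count uniforms until their running product drops below exp(-lam)."""
    limit = math.exp(-lam)   # accurate while exp(-lam) does not underflow (lam below ~700)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_nb2(mu, alpha, n, rng):
    """Poisson-gamma mixture: each unit's rate ~ Gamma(shape=1/alpha, scale=alpha*mu)."""
    shape, scale = 1.0 / alpha, alpha * mu   # rate has mean mu and variance alpha * mu**2
    return [poisson_draw(rng.gammavariate(shape, scale), rng) for _ in range(n)]

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

mu, alpha, n = 4.0, 0.5, 50_000
rng = random.Random(42)
m0, v0 = mean_var([poisson_draw(mu, rng) for _ in range(n)])   # one shared rate
m, v = mean_var(simulate_nb2(mu, alpha, n, rng))               # unit-specific rates
print(f"Poisson:       mean {m0:.2f}, variance {v0:.2f}")
print(f"Gamma mixture: mean {m:.2f}, variance {v:.2f} "
      f"(NB2 formula mu + alpha*mu^2 = {mu + alpha * mu ** 2:.2f})")
```

Both samples have a mean near 4, but only the mixture's variance rises to roughly 12, which is the extra-Poisson variation the dispersion parameter is meant to absorb.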
Question 4 True / False
Zero-inflated count models are appropriate whenever the count outcome variable contains any zero values.
True
False
Answer: False
Zero-inflated models are specifically for EXCESS zeros — more zeros than Poisson or negative binomial would predict given the estimated rate. Many genuine count processes produce zeros naturally (a country might have zero protests in a quiet year), and standard Poisson or negative binomial handles these fine. Zero-inflated models are appropriate when two distinct data-generating processes are at work: one that determines whether any events can occur at all (a structural zero mechanism) and one that determines how many occur when they do. Diagnostic tools like rootograms and the Vuong test help distinguish excess zeros from ordinary count variation.
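Whether zeros count as "excess" depends on the model they are compared against. The rough sketch below assumes an intercept-only model and hypothetical toy data. It compares the observed share of zeros with the share implied by a Poisson fit and by an NB2 fit whose α comes from a simple method-of-moments estimate. It is a back-of-the-envelope check, not a replacement for rootograms or the Vuong test.

```python
import math

def p_zero_poisson(mu):
    """P(Y = 0) under Poisson(mu)."""
    return math.exp(-mu)

def p_zero_nb2(mu, alpha):
    """P(Y = 0) under NB2 with mean mu and Var(Y) = mu + alpha * mu**2."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)

def zero_check(counts):
    """Observed share of zeros vs. shares implied by intercept-only Poisson and NB2 fits.

    alpha uses the method-of-moments estimate (var - mean) / mean**2, floored just above 0.
    """
    n = len(counts)
    mu = sum(counts) / n
    var = sum((y - mu) ** 2 for y in counts) / (n - 1)
    alpha = max((var - mu) / mu ** 2, 1e-8)
    observed = counts.count(0) / n
    return observed, p_zero_poisson(mu), p_zero_nb2(mu, alpha)

counts = [0, 0, 0, 0, 1, 1, 2, 2, 3, 4, 5, 6]
observed, poisson_share, nb2_share = zero_check(counts)
print(f"observed {observed:.2f}, Poisson {poisson_share:.2f}, NB2 {nb2_share:.2f}")
```

With these twelve counts, a third of the observations are zero, roughly 2.5 times the Poisson-implied share (about 0.14), while the NB2 fit implies about 0.27. Zeros that look excessive relative to Poisson may simply reflect overdispersion that negative binomial already accommodates.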
Question 5 Short Answer
Why does fitting a Poisson model to overdispersed count data produce unreliable hypothesis tests, and what does negative binomial regression do differently to address this problem?
Model answer: Poisson constrains variance to equal the mean. When actual variance exceeds this, the model underestimates standard errors — inflating test statistics and producing false significance. Negative binomial adds a dispersion parameter that lets variance exceed the mean by an estimated amount, absorbing the extra variation and producing correctly sized standard errors.
The core issue is model misspecification: Poisson's mean-variance constraint is a strong assumption that count data routinely violate. When the assumption fails, the model's uncertainty estimates are not just imprecise but systematically too small. This makes the problem easy to miss: estimates look precise and significant, but the precision is an artifact of the wrong model. Negative binomial's dispersion parameter is estimated from the data and adjusts the variance-mean relationship accordingly. Comparing AIC/BIC and running a likelihood ratio test of Poisson against negative binomial are standard diagnostic steps before reporting results.
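The likelihood ratio and AIC comparison can be sketched for the simplest case, an intercept-only model, using only the standard library. The helper names are hypothetical, α is found by a golden-section search over log α (chosen only to keep the example self-contained), and the counts are toy data. A real analysis would fit both models with covariates in a statistics package.

```python
import math

def poisson_loglik(counts, mu):
    """Log-likelihood of an intercept-only Poisson model with mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)

def nb2_loglik(counts, mu, alpha):
    """Log-likelihood of an intercept-only NB2 model: Var(y) = mu + alpha * mu**2."""
    r = 1.0 / alpha  # gamma shape parameter
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in counts
    )

def fit_alpha(counts, mu, lo=-12.0, hi=4.0, iters=100):
    """Golden-section search over t = log(alpha) for the NB2 maximum-likelihood alpha."""
    def f(t):
        return nb2_loglik(counts, mu, math.exp(t))
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return math.exp((a + b) / 2)

def compare_poisson_nb2(counts):
    """LR test and AIC for Poisson vs. NB2 (intercept only; assumes some nonzero count)."""
    mu = sum(counts) / len(counts)   # MLE of the mean in both intercept-only models
    ll_p = poisson_loglik(counts, mu)
    alpha = fit_alpha(counts, mu)
    ll_nb = nb2_loglik(counts, mu, alpha)
    lr = max(0.0, 2 * (ll_nb - ll_p))
    # alpha = 0 is on the boundary: use half the chi-square(1) upper-tail probability
    p = 0.5 * math.erfc(math.sqrt(lr / 2))
    return {
        "alpha": alpha,
        "lr": lr,
        "p": p,
        "aic_poisson": 2 * 1 - 2 * ll_p,   # 1 parameter: intercept
        "aic_nb2": 2 * 2 - 2 * ll_nb,      # 2 parameters: intercept and alpha
    }

print(compare_poisson_nb2([0] * 8 + [10] * 2))
```

For the toy data the fitted α is well above zero, the LR statistic is large, and AIC favors NB2. For constant counts such as [2, 2, 2, 2], α is pushed to the lower search bound and the LR statistic is essentially zero. The p-value is halved because the Poisson model corresponds to α = 0, which lies on the boundary of the parameter space, where the ordinary χ²(1) reference distribution is too conservative.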