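Question 1 Multiple Choice
The Poisson distribution with rate λ has natural parameter η = log λ, sufficient statistic T(x) = x, and log-partition function A(η) = e^η (which equals λ when written in terms of the rate). A researcher wants to compute E[X] without evaluating a separate sum. What does the exponential family framework tell them?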
A. E[X] cannot be computed from A alone; a separate moment calculation is always required
B. E[X] = A(λ) = λ, read directly from the log-partition function
C. E[X] = A'(η), the first derivative of the log-partition function with respect to the natural parameter
D. E[X] = A''(η), the second derivative, because moments are always second-order
Answer: C
For exponential family distributions, E[T(X)] = A'(η), where η is the natural parameter and A is the log-partition function. For the Poisson, T(x) = x and A(η) = e^η with η = log λ, so E[X] = A'(η) = e^η = λ, which matches the elementary calculation. The key insight is that you never need to evaluate the sum Σ_{x≥0} x · e^{-λ} λˣ/x! directly; A encodes the moment information of T(X) through its derivatives. The variance is similarly Var[X] = A''(η) = e^η = λ.
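The claim can be checked numerically. The sketch below is illustrative only: it assumes the standard Poisson parameterization η = log λ with A(η) = e^η, uses finite differences in place of symbolic derivatives, and all function names are hypothetical.

```python
import math

def log_partition(eta):
    """Poisson log-partition function in the natural parameter: A(eta) = exp(eta)."""
    return math.exp(eta)

def derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def poisson_mean_by_sum(lam, terms=200):
    """Direct route: sum of k * P(X = k), truncated after `terms` terms."""
    total = 0.0
    pmf = math.exp(-lam)  # P(X = 0)
    for k in range(terms):
        total += k * pmf
        pmf *= lam / (k + 1)  # recurrence P(X = k + 1) = P(X = k) * lam / (k + 1)
    return total

lam = 3.5
eta = math.log(lam)

mean_from_A = derivative(log_partition, eta)        # approximates A'(eta)
var_from_A = second_derivative(log_partition, eta)  # approximates A''(eta)
mean_direct = poisson_mean_by_sum(lam)              # the sum the framework lets you skip

print(mean_from_A, var_from_A, mean_direct)
```

All three printed values should agree with λ up to finite-difference error, since the Poisson mean and variance both equal λ.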
Question 2 Multiple Choice
A statistician notices that updating a Bayesian model with exponential family data reduces to simple arithmetic on hyperparameters. Why does this conjugate structure arise?
A. Because all probability distributions have conjugate priors if you choose the right parameterization
B. Because the exponential family form makes the likelihood and prior have the same functional structure, so the posterior stays in the same family with updated hyperparameters
C. Because Bayesian updating always reduces to adding the sample mean to the prior mean
D. Because conjugacy is a property of the data, not the distributional form
Answer: B
The conjugate structure is a direct consequence of the exponential family form, not a coincidence. When the likelihood, viewed as a function of η, is proportional to exp{η·T(x) − A(η)} and the prior is proportional to exp{χ·η − ν·A(η)}, their product (the unnormalized posterior) is proportional to exp{(χ + T(x))·η − (ν + 1)·A(η)}. This is the same functional form, with χ updated to χ + T(x) and ν updated to ν + 1; for n independent observations, χ gains the sum of the T(xᵢ) and ν gains n. The sufficient statistic simply adds to the hyperparameter. This works because the prior was specifically constructed to absorb the likelihood's structure. Outside the exponential family, this arithmetic convenience generally breaks down.
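As a concrete illustration, here is a minimal sketch of the hyperparameter update for Poisson data, where T(x) = x. The class name `ConjugatePrior` and its methods are hypothetical, introduced only for this example. It relies on the fact that, for the Poisson, a prior proportional to exp{χ·η − ν·A(η)} on η = log λ corresponds to a Gamma distribution on λ with shape χ and rate ν, whose mean is χ/ν.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConjugatePrior:
    """Hyperparameters (chi, nu) of a prior proportional to exp{chi*eta - nu*A(eta)}."""
    chi: float  # accumulates sufficient statistics
    nu: float   # accumulates the observation count

    def update(self, observations):
        """Posterior hyperparameters for Poisson data, where T(x) = x."""
        return ConjugatePrior(
            chi=self.chi + sum(observations),
            nu=self.nu + len(observations),
        )

    def mean_rate(self):
        """Mean of the implied Gamma(shape=chi, rate=nu) distribution on lambda."""
        return self.chi / self.nu

prior = ConjugatePrior(chi=2.0, nu=1.0)
posterior = prior.update([3, 5, 4])

# Updating one batch at a time gives the same hyperparameters as a single update.
sequential = prior.update([3]).update([5, 4])

print(posterior, posterior.mean_rate(), sequential == posterior)
```

No integration appears anywhere in the update: the posterior is obtained by addition alone, which is the arithmetic the question refers to.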
Question 3 True / False
For any member of the exponential family, the mean and variance of the sufficient statistic T(X) can both be computed from derivatives of the log-partition function A(η) alone.
T. True
F. False
Answer: True
This is the central computational payoff of the exponential family framework: E[T(X)] = A'(η) and Var[T(X)] = A''(η). Instead of performing separate integrals for each distribution, you differentiate one function. For the Gaussian, Binomial, Poisson, and Gamma distributions — all exponential family members — moments come from differentiating the appropriate A(η). This is not a coincidence; it follows from the fact that A(η) = log ∫ h(x) exp{η·T(x)} dx, and differentiation under the integral sign yields the moment-generating structure.
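The differentiation step can be written out explicitly. The following is a sketch for a scalar natural parameter η, assuming differentiation and integration may be interchanged (for discrete distributions the integrals become sums):

```latex
\begin{align*}
A(\eta) &= \log \int h(x)\, \exp\{\eta\, T(x)\}\, dx, \\
A'(\eta) &= \frac{\int T(x)\, h(x)\, \exp\{\eta\, T(x)\}\, dx}{\int h(x)\, \exp\{\eta\, T(x)\}\, dx}
          = \int T(x)\, h(x)\, \exp\{\eta\, T(x) - A(\eta)\}\, dx
          = \mathbb{E}[T(X)], \\
A''(\eta) &= \int T(x)\,\bigl(T(x) - A'(\eta)\bigr)\, h(x)\, \exp\{\eta\, T(x) - A(\eta)\}\, dx
           = \mathbb{E}[T(X)^2] - \bigl(\mathbb{E}[T(X)]\bigr)^2
           = \operatorname{Var}[T(X)].
\end{align*}
```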
Question 4 True / False
The exponential family is a convenient notation for writing distributions, but it does not reveal any genuinely new statistical properties — the same results can be derived independently for each distribution.
T. True
F. False
Answer: False
The unified structure genuinely reveals properties that are not obvious from distribution-by-distribution analysis. Most importantly, it explains *why* conjugate priors arise so systematically for this class of distributions: the structure is the reason, not an accident. It also explains why the sufficient statistic T(x) is sufficient: the density h(x) exp{η·T(x) − A(η)} is already in the factored form required by the factorization theorem, with the parameter entering only through T(x). These connections are invisible when you treat each distribution individually; they become clear only when the shared structure is made explicit.
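The sufficiency claim can be illustrated with a small numerical sketch. For Poisson data the sufficient statistic of a sample is its sum, so two samples of the same size with the same sum should have a likelihood ratio that does not depend on λ. The samples and function name below are made up for illustration.

```python
import math

def poisson_log_likelihood(sample, lam):
    """Log-likelihood of an i.i.d. Poisson sample at rate lam."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in sample)

# Two different samples with the same size and the same sum (T = 9).
sample_a = [0, 2, 7]
sample_b = [3, 3, 3]

rates = [0.5, 1.0, 3.0, 10.0]
log_ratios = [
    poisson_log_likelihood(sample_a, lam) - poisson_log_likelihood(sample_b, lam)
    for lam in rates
]

# The spread should be zero up to floating-point error: the ratio depends only
# on the h(x) = 1/x! factors, never on lam.
print(log_ratios, max(log_ratios) - min(log_ratios))
```

Because the terms involving λ cancel, the two samples carry the same information about λ, which is what sufficiency of T means in practice.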
Question 5 Short Answer
In your own words, explain why the log-partition function A(η) plays a central role in the exponential family, and what would be lost if we did not have a name and formula for it.
Model answer: A(η) is the normalizing term that makes the density integrate to 1 (or sum to 1 in the discrete case), but its significance goes beyond normalization. Because A(η) is the logarithm of the normalizing integral, differentiating it with respect to η yields the moments of T(X): E[T(X)] = A'(η) and Var[T(X)] = A''(η). Without identifying A as a named object, you would have to re-derive moments separately for every distribution. A also appears in the conjugate prior structure: the prior absorbs A(η) with its own coefficient, and Bayesian updating is arithmetic because A appears in both likelihood and prior with the same form.
The key insight is that A is not just a bookkeeping term — it is the function that concentrates all moment information for the distribution. The log-partition function is the bridge between the abstract exponential family structure and the practical statistical operations (estimation, Bayesian updating) that practitioners actually perform. Naming it and computing its derivatives is what makes the exponential family framework useful rather than merely notational.
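To make "differentiate one function" concrete, the sketch below applies a single generic routine to three log-partition functions. The helper name `moments_from_log_partition` is hypothetical, derivatives are approximated by finite differences, and the parameterizations are the standard ones: Bernoulli with η = log(p/(1−p)), Poisson with η = log λ, and a unit-variance Gaussian with η = μ.

```python
import math

def moments_from_log_partition(A, eta, h=1e-4):
    """Approximate (E[T], Var[T]) as (A'(eta), A''(eta)) by central differences."""
    mean = (A(eta + h) - A(eta - h)) / (2 * h)
    var = (A(eta + h) - 2 * A(eta) + A(eta - h)) / (h * h)
    return mean, var

# Log-partition functions in the natural parameter; T(x) = x in all three cases.
log_partitions = {
    "bernoulli": lambda eta: math.log1p(math.exp(eta)),  # eta = log(p / (1 - p))
    "poisson": lambda eta: math.exp(eta),                # eta = log(lambda)
    "gaussian_unit_var": lambda eta: 0.5 * eta * eta,    # eta = mu
}

natural_params = {
    "bernoulli": math.log(0.3 / 0.7),  # p = 0.3
    "poisson": math.log(4.0),          # lambda = 4
    "gaussian_unit_var": 1.5,          # mu = 1.5
}

results = {
    name: moments_from_log_partition(A, natural_params[name])
    for name, A in log_partitions.items()
}

for name, (mean, var) in results.items():
    print(f"{name}: mean ~ {mean:.4f}, variance ~ {var:.4f}")
```

The same routine handles every case; only the function A changes from one distribution to the next.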