You are estimating parameters of a Gamma(α, β) distribution, which has two unknown parameters. How should you set up a method of moments estimation?
A. Use one equation matching the sample mean to E[X], since the mean alone determines the distribution shape
B. Use two equations matching the first and second sample moments to E[X] and E[X²], then solve the 2×2 system for α and β
C. Use three or more equations for robustness, choosing the ones with the smallest variance
D. Use one equation matching the sample median, which is more robust to outliers than the mean
Answer: B
With p unknown parameters, you need p moment equations. For Gamma(α, β) with two unknowns, set m̂₁ = μ'₁(α, β) and m̂₂ = μ'₂(α, β) and solve the system. Using fewer equations leaves the system underdetermined; using more makes it overdetermined, which is the setting the generalized method of moments (GMM) addresses. The median is a quantile, not a moment, and is not used in the standard method of moments framework.
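The two-equation setup can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation, and it assumes the shape-rate parameterization (E[X] = α/β, Var[X] = α/β²), which the question does not specify. Under that assumption the system solves in closed form: α̂ = m̂₁² / (m̂₂ − m̂₁²) and β̂ = m̂₁ / (m̂₂ − m̂₁²). The function name `gamma_mom` and the simulated sample are hypothetical.

```python
import random

def gamma_mom(xs):
    """Method of moments for Gamma(alpha, beta), assuming the shape-rate form."""
    n = len(xs)
    m1 = sum(xs) / n                   # first raw sample moment
    m2 = sum(x * x for x in xs) / n    # second raw sample moment
    v = m2 - m1 ** 2                   # implied variance, alpha / beta^2
    alpha_hat = m1 ** 2 / v
    beta_hat = m1 / v
    return alpha_hat, beta_hat

rng = random.Random(0)
true_alpha, true_beta = 3.0, 2.0
# random.gammavariate takes (shape, scale), so scale = 1 / rate.
sample = [rng.gammavariate(true_alpha, 1.0 / true_beta) for _ in range(20000)]
alpha_hat, beta_hat = gamma_mom(sample)
print(alpha_hat, beta_hat)
```

With a sample this large, both estimates should land close to the values used to simulate the data. If β is instead a scale parameter, the second formula becomes β̂ = (m̂₂ − m̂₁²) / m̂₁.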
Question 2 Multiple Choice
For an Exponential(λ) distribution, the method of moments gives λ̂ = 1/X̄, which happens to equal the MLE. A student concludes that MOM always equals MLE. What is the fundamental error in this reasoning?
A. MOM and MLE are mathematically identical for all exponential family distributions, so the conclusion is actually correct
B. The agreement is specific to certain distributions such as the exponential; MOM and MLE generally differ, and MLE uses the full likelihood shape rather than only moment summaries
C. MOM is always more efficient than MLE because it uses fewer computational assumptions
D. MOM and MLE agree whenever the distribution has a single sufficient statistic
Answer: B
The exponential is one of a few special cases (the Poisson and the normal are others) where MOM and MLE coincide. For distributions like the Beta, Gamma, or Weibull, MOM and MLE give different estimates, and MLE is typically more efficient because it uses the entire likelihood surface, that is, the full shape of the distribution, rather than just the values of one or two moments. Using only moment summaries discards information, which is why MOM typically has lower statistical efficiency.
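The coincidence for the exponential can be checked numerically. The sketch below computes the MOM estimate 1/X̄ and separately maximizes the log-likelihood n·log λ − λ·ΣXᵢ with a golden-section search. The search routine, the bracket [1e-6, 50], and the simulated data are choices made for illustration only.

```python
import math
import random

def exp_loglik(lam, xs):
    """Log-likelihood of Exponential(rate=lam): n*log(lam) - lam*sum(xs)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

def maximize(f, lo, hi, iters=200):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if f(c) < f(d):
            a = c   # maximum lies in [c, b]
        else:
            b = d   # maximum lies in [a, d]
    return (a + b) / 2

rng = random.Random(1)
xs = [rng.expovariate(2.5) for _ in range(500)]

lam_mom = 1.0 / (sum(xs) / len(xs))                           # moment equation
lam_mle = maximize(lambda lam: exp_loglik(lam, xs), 1e-6, 50.0)  # likelihood
print(lam_mom, lam_mle)
```

The two numbers should agree up to the tolerance of the numerical search, because setting the derivative n/λ − ΣXᵢ to zero gives exactly λ = 1/X̄.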
Question 3 True / False
Method of moments estimators are consistent because sample moments converge in probability to their population counterparts as sample size grows.
T. True
F. False
Answer: True
True. This follows from the weak law of large numbers: the k-th sample moment m̂_k = (1/n)Σ Xᵢᵏ converges in probability to E[Xᵏ] = μ'_k(θ), provided that moment is finite. If the moment equations can be inverted and the inverse map from population moments to parameters is continuous at the true moment values, then by the continuous mapping theorem the MOM estimator θ̂ converges in probability to θ. Consistency is thus essentially built into the method.
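A small simulation can make the convergence concrete. The sketch below uses the Exponential(λ) case, where the MOM estimator is λ̂ = 1/X̄, and records the absolute error at a few sample sizes. The sample sizes, seed, and true rate are arbitrary choices for illustration.

```python
import random

def exp_rate_mom(xs):
    """MOM estimate of the exponential rate: 1 / sample mean."""
    return len(xs) / sum(xs)

rng = random.Random(42)
true_rate = 2.0
errors = {}
for n in (100, 10_000, 400_000):
    xs = [rng.expovariate(true_rate) for _ in range(n)]
    errors[n] = abs(exp_rate_mom(xs) - true_rate)
    print(n, errors[n])
```

The error at any single n is random, so it need not shrink at every step, but it should be small at the largest sample size, consistent with a standard error of roughly λ/√n.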
Question 4 True / False
Since method of moments uses multiple moment equations to capture more features of the distribution, it is generally more statistically efficient than maximum likelihood estimation.
T. True
F. False
Answer: False
False. MLE is generally more efficient because it uses the full likelihood, that is, the entire shape of the density or probability mass function, which encodes more information than any finite set of moments. The Cramér-Rao lower bound makes this precise: under regularity conditions the MLE attains the bound asymptotically, while MOM typically does not. Adding more moments does not in general recover the full distributional information, whereas the MLE uses all of it through the score function.
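The efficiency gap can be illustrated with a Monte Carlo sketch for the Gamma shape parameter, where MOM and MLE differ. Everything here is an assumption made for illustration: the true shape of 2, the sample size, the number of replications, and the helper functions. The MLE is found by solving log α − ψ(α) = log X̄ − mean(log Xᵢ) with bisection, and the digamma function ψ is approximated with the standard recurrence plus asymptotic series.

```python
import math
import random
import statistics

def digamma(x):
    """Digamma via psi(x) = psi(x+1) - 1/x, then an asymptotic series for x >= 6."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def gamma_shape_mom(xs):
    """MOM shape estimate: m1^2 / (m2 - m1^2)."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return m1 ** 2 / (m2 - m1 ** 2)

def gamma_shape_mle(xs):
    """MLE shape estimate: solve log(a) - digamma(a) = s by geometric bisection."""
    n = len(xs)
    s = math.log(sum(xs) / n) - sum(math.log(x) for x in xs) / n
    lo, hi = 1e-3, 1e3
    for _ in range(80):
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid   # left side is decreasing in a, so the root is larger
        else:
            hi = mid
    return math.sqrt(lo * hi)

rng = random.Random(7)
true_shape, n, reps = 2.0, 200, 500
mom_est, mle_est = [], []
for _ in range(reps):
    xs = [rng.gammavariate(true_shape, 1.0) for _ in range(n)]
    mom_est.append(gamma_shape_mom(xs))
    mle_est.append(gamma_shape_mle(xs))

var_mom = statistics.variance(mom_est)
var_mle = statistics.variance(mle_est)
print(var_mom, var_mle)
```

Across replications, the MLE estimates should be noticeably less spread out than the MOM estimates, in line with the asymptotic variances for this setting.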
Question 5 Short Answer
Why are method of moments estimators consistent, and why are they typically less statistically efficient than maximum likelihood estimators?
Model answer: MOM estimators are consistent because they are continuous functions of sample moments, and the WLLN guarantees that sample moments converge in probability to their population counterparts. By the continuous mapping theorem, the MOM estimator converges to the true parameter. They are less efficient than MLE because MOM uses only a finite set of moment summaries (e.g., mean and variance), while MLE uses the full likelihood, which encodes the complete shape of the distribution. Any information in the distributional shape not captured by the chosen moments is lost, increasing estimator variance.
The key contrast: MOM matches a few summary statistics (moments) of the distribution, while MLE maximizes the likelihood of the observed data under the model and so uses all available information. For distributions whose sufficient statistics are themselves low-order sample moments (like the normal), MOM and MLE coincide and there is no efficiency gap. For other distributions, the gap in efficiency can be substantial.