In the E-step of EM applied to a Gaussian mixture model, what is computed?
A. New cluster means and covariances that maximize the log-likelihood
B. The posterior probability that each data point was generated by each mixture component
C. The marginal log-likelihood of the observed data under the current parameters
D. A hard cluster assignment for each data point
Answer: B
The E-step computes the 'responsibilities', the posterior probabilities r_{ik} = P(component k | data point i, current parameters), for every data point and every component. These soft assignments say how much each Gaussian component is responsible for each observation. The M-step then uses the responsibilities as weights to update the means, covariances, and mixing proportions, which is what option A describes. The marginal log-likelihood in option C is often monitored to check convergence, but it is not the output of the E-step. EM does not make hard assignments (hard assignments are what k-means uses); the soft probabilistic assignment is the key to its smooth convergence.
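The responsibility computation can be sketched in a few lines. This is a minimal illustration for a one-dimensional mixture using only the standard library; the function names `gaussian_pdf` and `e_step`, the toy data, and the parameter values are all assumptions made for this example, not part of the quiz.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(data, weights, means, variances):
    """Return r[i][k] = P(component k | data point i, current parameters)."""
    responsibilities = []
    for x in data:
        # Unnormalised posterior: mixing weight times component density.
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)  # marginal density of x under the current parameters
        responsibilities.append([j / total for j in joint])
    return responsibilities

# Hypothetical toy data and parameters, chosen only for illustration.
data = [-2.1, -1.9, 0.0, 2.0, 2.2]
r = e_step(data, weights=[0.5, 0.5], means=[-2.0, 2.0], variances=[1.0, 1.0])
for x, row in zip(data, r):
    print(x, [round(p, 3) for p in row])
```

Each row sums to 1, and the point at 0.0, which is equidistant from both means, is split evenly between the two components. Dividing raw densities like this can underflow for points far from every component, so practical implementations usually work with log-densities instead.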
Question 2 True / False
EM is guaranteed to converge to the global maximum of the likelihood function.
T. True
F. False
Answer: False
EM guarantees only that the observed-data log-likelihood is non-decreasing from one iteration to the next. It may still converge to a local maximum, a saddle point, or a plateau rather than the global maximum. The algorithm is sensitive to initialization: different starting parameters can lead to different solutions. In Gaussian mixture models, for instance, a bad initialization can lead to a degenerate solution in which one component collapses onto a single data point, its variance shrinks toward zero, and the likelihood grows without bound. Running EM several times from different starting points and keeping the solution with the highest log-likelihood is standard practice.
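The multiple-restart practice can be sketched as follows. This is a rough illustration, not a reference implementation: it fits a one-dimensional Gaussian mixture with the standard library only, and the helper names (`log_gauss`, `log_sum_exp`, `run_em`), the synthetic data, the unit initial variances, and the variance floor `var_floor` are all assumptions introduced for the example. The floor is one simple guard against the collapsing-component degeneracy; it is not part of textbook EM.

```python
import math
import random

def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def run_em(data, k, rng, n_iter=100, var_floor=1e-3):
    """One EM run from a random start; returns (log-likelihood, parameters)."""
    weights = [1.0 / k] * k
    means = rng.sample(data, k)   # start each mean at a randomly chosen data point
    variances = [1.0] * k         # assumed initial scale for this toy data
    for _ in range(n_iter):
        # E-step: responsibilities under the current parameters (in log space).
        resp = []
        for x in data:
            log_joint = [math.log(w) + log_gauss(x, m, v)
                         for w, m, v in zip(weights, means, variances)]
            norm = log_sum_exp(log_joint)
            resp.append([math.exp(lj - norm) for lj in log_joint])
        # M-step: responsibility-weighted updates of weights, means, variances.
        for j in range(k):
            nk = sum(r[j] for r in resp)
            weights[j] = nk / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nk
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nk
            variances[j] = max(var, var_floor)  # assumed guard against collapse
    # Observed-data log-likelihood at the final parameters.
    ll = sum(log_sum_exp([math.log(w) + log_gauss(x, m, v)
                          for w, m, v in zip(weights, means, variances)])
             for x in data)
    return ll, (weights, means, variances)

# Synthetic data: three clusters, generated only for illustration.
rng = random.Random(0)
data = [rng.gauss(c, 0.7) for c in (-4.0, 0.0, 5.0) for _ in range(40)]

# Run EM from 10 random starts and keep the run with the highest log-likelihood.
runs = [run_em(data, k=3, rng=rng) for _ in range(10)]
best_ll, best_params = max(runs, key=lambda run: run[0])
print("final log-likelihoods:", sorted(round(ll, 2) for ll, _ in runs))
print("best:", round(best_ll, 2))
```

Printing the sorted final log-likelihoods makes it easy to see whether the restarts agree. Runs that end at a noticeably lower value have stopped at a poorer stationary point.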
Question 3 Short Answer
Why does the EM algorithm guarantee that the observed-data log-likelihood never decreases between iterations?
Model answer: Each E-step constructs a lower bound on the log-likelihood (the ELBO) that is tight at the current parameters; each M-step maximizes this lower bound, moving to parameters with at least as high a likelihood as before.
EM implicitly optimizes a lower bound on the log-likelihood called the Evidence Lower Bound (ELBO), derived by applying Jensen's inequality to the concave log function. The E-step chooses the distribution over latent variables q(z) that makes the ELBO exactly equal to the log-likelihood at the current θ (tightening the bound); this q(z) is the posterior over z given the data and the current θ. The M-step then maximizes the ELBO over θ with q(z) held fixed, so the bound cannot decrease. Since the ELBO is a lower bound at every θ, the log-likelihood at the updated θ is at least the new ELBO, which is at least the old ELBO, which equals the log-likelihood at the previous θ. This ascent guarantee holds at every iteration and does not require reaching a global maximum.
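The argument can be written out as a short chain of inequalities. The notation here is an assumption added for illustration: x denotes the observed data, the superscript (t) indexes iterations, and the ELBO is written as a function of both q and θ.

```latex
% ELBO and its gap to the log-likelihood (the gap is a KL divergence, hence >= 0)
\mathcal{L}(q, \theta)
  = \mathbb{E}_{q(z)}\!\left[\log p(x, z \mid \theta)\right]
  - \mathbb{E}_{q(z)}\!\left[\log q(z)\right],
\qquad
\log p(x \mid \theta)
  = \mathcal{L}(q, \theta)
  + \mathrm{KL}\!\left(q(z) \,\|\, p(z \mid x, \theta)\right)
  \;\ge\; \mathcal{L}(q, \theta).

% E-step: choose q so that the KL term vanishes at the current parameters
q^{(t)}(z) = p\!\left(z \mid x, \theta^{(t)}\right)
\;\Longrightarrow\;
\mathcal{L}\!\left(q^{(t)}, \theta^{(t)}\right) = \log p\!\left(x \mid \theta^{(t)}\right).

% M-step: maximize the bound over theta with q held fixed
\theta^{(t+1)} = \arg\max_{\theta} \, \mathcal{L}\!\left(q^{(t)}, \theta\right).

% Combining the three facts gives the ascent property
\log p\!\left(x \mid \theta^{(t+1)}\right)
  \;\ge\; \mathcal{L}\!\left(q^{(t)}, \theta^{(t+1)}\right)
  \;\ge\; \mathcal{L}\!\left(q^{(t)}, \theta^{(t)}\right)
  \;=\; \log p\!\left(x \mid \theta^{(t)}\right).
```

The first inequality in the last line is the lower-bound property, the second is the M-step, and the final equality is the tightness established by the E-step.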