Explain why the distinction between convex and non-convex optimization is the central divide in ML optimization theory, and what guarantees convexity provides.
Think about your answer, then reveal below.
Model answer: Convexity guarantees three critical properties: (1) every local minimum is a global minimum, so gradient-based methods cannot get stuck in suboptimal basins; (2) first-order optimality conditions (gradient = 0) are sufficient for global optimality, making it easy to verify solutions; (3) polynomial-time algorithms with provable convergence rates exist (e.g., GD at O(1/T), accelerated GD at O(1/T^2)). Non-convex problems — including deep learning — have none of these guarantees: there can be many local minima, saddle points, and no polynomial-time algorithm is guaranteed to find the global optimum. This is why the theory for convex ML (SVMs, logistic regression, ridge regression) is mature and tight, while deep learning theory is still an active frontier. The convex/non-convex divide determines whether optimization is a solved problem or an open research question.
In practice, deep networks optimize surprisingly well despite non-convexity — understanding why is one of the major open problems in ML theory. The convex theory provides the baseline against which non-convex phenomena are measured and understood.