4 questions to test your understanding
A statistical problem can be solved information-theoretically with n = O(log d) samples, but every known polynomial-time algorithm needs n = O(sqrt(d)) samples. If someone proves that no polynomial-time algorithm can succeed with only O(log d) samples, what type of result is this?
True or false: Computational-statistical tradeoffs can only exist if P ≠ NP.
True or false: In sparse PCA, the goal is to find a sparse leading eigenvector (at most k nonzero entries out of d coordinates) of a covariance matrix. The statistical sample complexity is O(k * log(d/k)), but the best known polynomial-time algorithm requires O(k^2) samples. This gap is believed to be inherent.
Explain why computational-statistical tradeoffs are important for machine learning practice, beyond being a purely theoretical concern.
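As a rough aid for thinking about the sparse PCA question above, the two sample-complexity scalings can be compared numerically. This is a minimal sketch that drops all constants hidden by the O(.) notation, so the numbers indicate growth rates only; the function names and the chosen values of k and d are illustrative assumptions, not part of the original questions.

```python
import math


def statistical_samples(k: int, d: int) -> float:
    """Information-theoretic scaling k * log(d / k), constants dropped."""
    return k * math.log(d / k)


def polytime_samples(k: int) -> float:
    """Scaling k^2 quoted for the best known polynomial-time algorithm, constants dropped."""
    return float(k ** 2)


def gap_ratio(k: int, d: int) -> float:
    """How many times more samples the k^2 scaling asks for than k * log(d / k)."""
    return polytime_samples(k) / statistical_samples(k, d)


if __name__ == "__main__":
    d = 10_000  # ambient dimension (illustrative choice)
    for k in (10, 50, 100):  # sparsity levels (illustrative choices)
        print(
            f"k={k:4d}  statistical ~ {statistical_samples(k, d):9.1f}  "
            f"poly-time ~ {polytime_samples(k):9.1f}  ratio ~ {gap_ratio(k, d):6.2f}"
        )
```

The ratio simplifies to k / log(d/k), so for fixed d the gap widens as the sparsity level k grows.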