5 questions to test your understanding
You run PCA on a 100-feature dataset. The first 3 principal components explain 82% of total variance. A colleague says 'PCA found the 3 most important features.' What is wrong with this statement?
A dataset's true structure lies on a two-dimensional Swiss roll (a curved, spiral surface) embedded in three-dimensional space. You apply PCA to reduce to 2 dimensions. What will most likely happen?
True or false: The first principal component is the eigenvector of the covariance matrix corresponding to the largest eigenvalue, and it points in the direction of maximum variance in the data.
True or false: PCA removes noise from a dataset by keeping mainly the principal components with large eigenvalues and discarding the rest.
Why must data be centered (mean-subtracted) before applying PCA, and what artifact arises if this step is skipped?
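
To check your reasoning on the first-principal-component and centering questions empirically, the following is a minimal NumPy sketch, not a reference implementation. The toy data, the offset of (10, -10), and the helper name `top_direction` are all assumptions made for illustration. It compares the top eigenvector of the covariance matrix (centered data) with the top eigenvector of the raw second-moment matrix (uncentered data).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: points spread widely along (1, 1)/sqrt(2) and
# narrowly along (1, -1)/sqrt(2), then shifted far away from the origin.
n = 500
t = rng.normal(scale=3.0, size=n)   # large spread along (1, 1)/sqrt(2)
s = rng.normal(scale=0.3, size=n)   # small spread along (1, -1)/sqrt(2)
X = np.column_stack([t + s, t - s]) / np.sqrt(2) + np.array([10.0, -10.0])


def top_direction(M):
    """Return the unit eigenvector of symmetric M with the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(M)  # eigh sorts eigenvalues ascending
    return eigvecs[:, -1]


# Centered: eigen-decomposition of the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (n - 1)
pc1_centered = top_direction(cov)

# Uncentered: the same computation on the raw second-moment matrix.
second_moment = X.T @ X / (n - 1)
pc1_uncentered = top_direction(second_moment)

# Reference directions: the true direction of largest spread, and the
# direction from the origin to the data mean.
spread_dir = np.array([1.0, 1.0]) / np.sqrt(2)
mean_dir = X.mean(axis=0) / np.linalg.norm(X.mean(axis=0))

# Absolute cosine similarity, since an eigenvector's sign is arbitrary.
align_centered = abs(pc1_centered @ spread_dir)
align_uncentered_with_mean = abs(pc1_uncentered @ mean_dir)

print("centered PC1 vs. spread direction:  ", round(float(align_centered), 3))
print("uncentered PC1 vs. mean direction:  ", round(float(align_uncentered_with_mean), 3))
```

Try changing the offset to (0, 0) and rerunning: the two leading directions should then nearly coincide, which is a useful hint for the last question.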