Questions — Principal Component Analysis

Question 1 Multiple Choice

You run PCA on a 100-feature dataset. The first 3 principal components explain 82% of total variance. A colleague says 'PCA found the 3 most important features.' What is wrong with this statement?

A. Nothing is wrong; PCA selects the 3 features with the highest variance

B. PCA found 3 new axes that are linear combinations of all 100 original features, not a subset of 3 features

C. PCA selects features by correlation, not by variance, so 82% refers to correlation explained

D. The colleague is right, except the number should be higher; PCA typically retains at least 10 features

Question 2 Multiple Choice

A dataset's true structure lies on a two-dimensional Swiss roll (a curved, spiral surface) embedded in three-dimensional space. You apply PCA to reduce to 2 dimensions. What will most likely happen?

A. PCA will perfectly recover the 2D structure, since the data truly lives in 2 dimensions

B. PCA will fail to capture the intrinsic structure because it can only find flat (linear) subspaces, and no flat plane efficiently aligns with a curved manifold

C. PCA will fail because it cannot handle 3D data; it only works on high-dimensional datasets

D. PCA will succeed if you first normalize the features, since normalization linearizes the structure

Question 3 True / False

The first principal component is the eigenvector of the covariance matrix corresponding to the largest eigenvalue, and it points in the direction of maximum variance in the data.

T. True

F. False
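
Once you have committed to an answer, one way to test the statement numerically is the sketch below. It is not part of the original quiz: the synthetic 2D data, the helper names (`covariance_2d`, `top_eigenpair`, `projected_variance`), and the use of power iteration to find the leading eigenvector are all assumptions chosen for illustration. It compares the variance of the data projected onto that eigenvector with the variance along 360 other unit directions.

```python
import math
import random


def covariance_2d(points):
    """Sample covariance matrix [[sxx, sxy], [sxy, syy]] of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def top_eigenpair(cov, iters=200):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    v = [1.0, 0.5]  # arbitrary starting vector (illustrative choice)
    for _ in range(iters):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    cv = [cov[0][0] * v[0] + cov[0][1] * v[1],
          cov[1][0] * v[0] + cov[1][1] * v[1]]
    return v[0] * cv[0] + v[1] * cv[1], v


def projected_variance(points, direction):
    """Sample variance of the points projected onto a unit direction."""
    proj = [p[0] * direction[0] + p[1] * direction[1] for p in points]
    m = sum(proj) / len(proj)
    return sum((x - m) ** 2 for x in proj) / (len(proj) - 1)


# Synthetic correlated data (an assumption, not from the quiz).
rng = random.Random(0)
points = []
for _ in range(500):
    t = rng.gauss(0, 3)
    points.append((t + rng.gauss(0, 0.5), 0.5 * t + rng.gauss(0, 0.5)))

cov = covariance_2d(points)
eigenvalue, pc1 = top_eigenpair(cov)
variance_along_pc1 = projected_variance(points, pc1)

# Variance along 360 other unit directions, one per degree.
best_other = max(
    projected_variance(points, (math.cos(math.radians(a)), math.sin(math.radians(a))))
    for a in range(360)
)

print("largest eigenvalue:        ", eigenvalue)
print("variance along eigenvector:", variance_along_pc1)
print("best variance elsewhere:   ", best_other)
```

Compare the three printed numbers against your answer. The same check extends to higher dimensions, but a library eigensolver would be the more usual choice there.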

Question 4 True / False

PCA removes noise from a dataset by keeping mainly the principal components with large eigenvalues and discarding the rest.

T. True

F. False

Question 5 Short Answer

Why must data be centered (mean-subtracted) before applying PCA, and what artifact arises if this step is skipped?

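
After writing your own answer, you can explore the question empirically with a sketch like the following. It is an illustration rather than part of the original quiz: the synthetic data (a cloud centered near (10, 10) that spreads along the (1, -1) direction) and the helper names `second_moment` and `top_direction` are assumptions. It computes the leading direction of the second-moment matrix once with mean subtraction and once without.

```python
import math
import random


def second_moment(points, center):
    """2x2 second-moment matrix of 2D points; subtracts the mean if center is True."""
    n = len(points)
    mx = sum(p[0] for p in points) / n if center else 0.0
    my = sum(p[1] for p in points) / n if center else 0.0
    a = sum((p[0] - mx) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    d = sum((p[1] - my) ** 2 for p in points) / n
    return [[a, b], [b, d]]


def top_direction(m, iters=200):
    """Power iteration: unit eigenvector for the largest eigenvalue of m."""
    v = [1.0, 0.5]  # arbitrary starting vector (illustrative choice)
    for _ in range(iters):
        w = [m[0][0] * v[0] + m[0][1] * v[1],
             m[1][0] * v[0] + m[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return v


# Synthetic data (an assumption): large mean offset, spread along (1, -1).
rng = random.Random(1)
points = []
for _ in range(400):
    t = rng.gauss(0, 1)
    points.append((10 + t + rng.gauss(0, 0.1), 10 - t + rng.gauss(0, 0.1)))

centered_dir = top_direction(second_moment(points, center=True))
uncentered_dir = top_direction(second_moment(points, center=False))

# Unit vector pointing from the origin to the data mean, for comparison.
n = len(points)
mean = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
mean_norm = math.hypot(mean[0], mean[1])
mean_dir = (mean[0] / mean_norm, mean[1] / mean_norm)

print("leading direction, centered:  ", centered_dir)
print("leading direction, uncentered:", uncentered_dir)
print("direction of the data mean:   ", mean_dir)
```

Compare each leading direction with the direction along which the points actually spread and with the direction of the mean, then check that against what you wrote.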
