Questions — Dimensionality Reduction Techniques

Question 1 Multiple Choice

A data scientist wants to reduce 500-dimensional gene expression data to 10 dimensions as input features for a supervised classifier. She runs t-SNE to produce a 10-dimensional embedding and uses those coordinates as features. What is the fundamental problem with this approach?

A. t-SNE cannot handle more than 100 dimensions, so it will fail on 500-dimensional input

B. t-SNE is non-parametric: it cannot project new data points, so the test set cannot be embedded using the training embedding

C. t-SNE preserves too much global structure, making it unsuitable for classification tasks

D. 10 dimensions is too many for t-SNE, which only works for 2D or 3D output

Question 2 Multiple Choice

You have high-dimensional data that you suspect lies on a curved, Swiss-roll-shaped manifold. You want to understand the cluster structure for a research presentation. Which method is most appropriate?

A. PCA, because it is interpretable and invertible

B. ICA, because it finds statistically independent components rather than just uncorrelated ones

C. t-SNE or UMAP, because they capture nonlinear manifold structure and reveal cluster geometry

D. An autoencoder, because it is the only parametric nonlinear method
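For readers who want to try the candidate methods on the kind of data described in Question 2, the following is a minimal sketch that samples a toy Swiss roll using only the Python standard library. It is a 3D stand-in for the high-dimensional case in the question. The function name `make_swiss_roll`, the constants, and the noise model are assumptions chosen for illustration, following a commonly used parameterization of the Swiss roll.

```python
import math
import random


def make_swiss_roll(n_samples, noise=0.0, seed=0):
    """Sample points on a toy Swiss roll embedded in 3D.

    Returns (points, t_values): points is a list of (x, y, z) tuples and
    t_values holds each point's position along the rolled-up direction.
    All constants here are illustrative assumptions.
    """
    rng = random.Random(seed)
    points, t_values = [], []
    for _ in range(n_samples):
        # t controls how far along the spiral the point sits.
        t = 1.5 * math.pi * (1.0 + 2.0 * rng.random())
        # h is the position across the width of the rolled sheet.
        h = 21.0 * rng.random()
        x = t * math.cos(t) + rng.gauss(0.0, noise)
        y = h + rng.gauss(0.0, noise)
        z = t * math.sin(t) + rng.gauss(0.0, noise)
        points.append((x, y, z))
        t_values.append(t)
    return points, t_values


points, t_values = make_swiss_roll(500, noise=0.05, seed=42)
print(len(points), "points, each with", len(points[0]), "coordinates")
```

The returned `t_values` can be used to color the points after applying any of the listed methods, which makes it easy to see how each one treats the rolled-up direction.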

Question 3 True / False

Unlike t-SNE and UMAP, the encoder network of a trained autoencoder can project new, unseen data points into the latent space.

T. True

F. False

Question 4 True / False

PCA is generally the best dimensionality reduction method for revealing complex cluster structure in high-dimensional biological data, because it is fast, interpretable, and widely used.

T. True

F. False

Question 5 Short Answer

Why can the axes in a t-SNE or UMAP embedding not be meaningfully interpreted or compared across runs, even when the overall cluster structure looks similar?

