Questions — Chemometrics and Multivariate Data Analysis

Question 1 Multiple Choice

A pharmaceutical analyst wants to predict tablet potency from near-IR spectra, but ordinary multiple linear regression fails when they include all 1500 spectral variables. Why does PLS regression succeed where OLS fails here?

APLS uses a larger training dataset than OLS requires

BPLS handles collinear variables by first compressing the spectrum into a small number of latent variables that capture the relevant spectral variation, whereas OLS breaks down when predictor variables are highly correlated with each other

CPLS automatically removes irrelevant wavelengths, leaving only the peak wavelengths for regression

DPLS is more accurate than OLS for any regression problem involving more than 100 variables

Question 2 Multiple Choice

In a PCA of UV-Vis spectra from 80 wine samples, the first two principal components explain 92% of the total variance. What do these principal components represent chemically?

AThe two wavelengths with the highest average absorbance in the dataset

BThe two wavelengths that best distinguish wine varieties from each other

COrthogonal directions in the high-dimensional spectral space that capture the greatest sources of systematic variation across samples — likely reflecting major chemical differences such as pigment concentration or pH

DThe mean spectrum and its standard deviation across all samples

Question 3 True / False

In chemometrics, including more spectral variables (wavelengths) in a calibration model typically improves its predictive accuracy on new samples.

TTrue

FFalse

Question 4 True / False

PCA finds principal components that are eigenvectors of the covariance matrix of the data, ordered by decreasing eigenvalue, where each eigenvalue represents the variance explained by that component.

TTrue

FFalse

Question 5 Short Answer

Explain the key insight behind applying PCA to chemical spectral data, and why two or three components often capture most of the information in spectra with thousands of variables.

Think about your answer, then reveal below.

Questions: Chemometrics and Multivariate Data Analysis