Questions: Chemometrics and Multivariate Data Analysis

5 questions to test your understanding

Question 1 (Multiple Choice)

A pharmaceutical analyst wants to predict tablet potency from near-IR spectra, but ordinary multiple linear regression fails when they include all 1500 spectral variables. Why does PLS regression succeed where OLS fails here?

A. PLS uses a larger training dataset than OLS requires
B. PLS handles collinear variables by first compressing the spectrum into a small number of latent variables that capture the relevant spectral variation, whereas OLS breaks down when predictor variables are highly correlated with each other
C. PLS automatically removes irrelevant wavelengths, leaving only the peak wavelengths for regression
D. PLS is more accurate than OLS for any regression problem involving more than 100 variables
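To make option B concrete, here is a minimal sketch on synthetic data. The band shapes, sample counts and noise level are hypothetical and not taken from any real near-IR data set. With 40 samples and 1500 wavelengths, X has rank at most 40, so the OLS normal equations (X^T X)b = X^T y have no unique solution. The `pls1_nipals` helper is an illustrative PLS1 implementation using the NIPALS algorithm, one common way to fit PLS; chemometrics packages may use other variants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 40 tablets measured at 1500 wavelengths.
n_samples, n_wavelengths = 40, 1500
wl = np.linspace(0.0, 1.0, n_wavelengths)

# Two broad, overlapping pure-component bands (assumed Gaussian shapes).
api_band = np.exp(-((wl - 0.4) / 0.05) ** 2)        # active ingredient
excipient_band = np.exp(-((wl - 0.6) / 0.10) ** 2)  # filler/excipient

potency = rng.uniform(80, 120, n_samples)           # e.g. % of label claim
filler = rng.uniform(0.5, 1.5, n_samples)

# Mixture spectra: linear combination of the bands plus small noise.
X = np.outer(potency / 100, api_band) + np.outer(filler, excipient_band)
X += rng.normal(scale=0.01, size=X.shape)
y = potency

# OLS needs (X^T X)^-1, but rank(X^T X) = rank(X) <= 40 < 1500, so it is singular.
rank = np.linalg.matrix_rank(X)
print(f"rank of X: {rank} (X^T X is {n_wavelengths} x {n_wavelengths})")


def pls1_nipals(X, y, n_components):
    """Minimal PLS1 (one response) via NIPALS. Returns the regression
    vector B and the column/response means needed for prediction."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)       # weight: direction of maximal covariance with y
        t = Xc @ w                   # scores on this latent variable
        tt = t @ t
        p = Xc.T @ t / tt            # X loadings
        c = (yc @ t) / tt            # y loading
        Xc = Xc - np.outer(t, p)     # deflate X
        yc = yc - c * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients on the original wavelengths
    return B, x_mean, y_mean


B, x_mean, y_mean = pls1_nipals(X, y, n_components=2)
y_hat = (X - x_mean) @ B + y_mean
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
print(f"PLS with 2 latent variables, training RMSE: {rmse:.3f}")
```

In practice the number of latent variables is chosen by cross-validation rather than fixed; two is assumed here only because the synthetic spectra were built from two components.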
Question 2 (Multiple Choice)

In a PCA of UV-Vis spectra from 80 wine samples, the first two principal components explain 92% of the total variance. What do these principal components represent chemically?

A. The two wavelengths with the highest average absorbance in the dataset
B. The two wavelengths that best distinguish wine varieties from each other
C. Orthogonal directions in the high-dimensional spectral space that capture the greatest sources of systematic variation across samples, likely reflecting major chemical differences such as pigment concentration or pH
D. The mean spectrum and its standard deviation across all samples
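A sketch of how PCA produces such components, using simulated spectra rather than real wine data. It assumes two hypothetical sources of variation, a pigment band near 520 nm and a phenolic band near 280 nm, with arbitrary concentrations and noise. PCA is computed here from the singular value decomposition of the mean-centred data; each loading vector assigns a weight to every wavelength rather than selecting one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_wavelengths = 80, 300
wl = np.linspace(250, 700, n_wavelengths)        # nm, hypothetical UV-Vis range


def band(center, width):
    """Gaussian absorption band (an assumed, idealized shape)."""
    return np.exp(-((wl - center) / width) ** 2)


# Hypothetical underlying sources of variation, for illustration only.
anthocyanin = band(520, 40)                      # red pigment band
phenolics = band(280, 20)                        # UV phenolic band
pigment = rng.lognormal(0.0, 0.4, n_samples)
phenol = rng.lognormal(0.0, 0.3, n_samples)

X = np.outer(pigment, anthocyanin) + np.outer(phenol, phenolics)
X += rng.normal(scale=0.005, size=X.shape)       # small measurement noise

Xc = X - X.mean(axis=0)                          # PCA works on mean-centred data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)                  # fraction of variance per PC
scores = U * s                                   # sample coordinates, one row per wine
loadings = Vt                                    # each row is a full-length "spectrum"

print("variance explained by PC1, PC2:", np.round(explained[:2], 3))
print("PC1 loading is largest near", wl[np.argmax(np.abs(loadings[0]))].round(), "nm")
```

Plotting the scores of PC1 against PC2 is the usual next step: samples that differ in the underlying chemistry separate along those axes, and the loading plots show which spectral regions drive each direction.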
Question 3 (True / False)

In chemometrics, including more spectral variables (wavelengths) in a calibration model typically improves its predictive accuracy on new samples.

True
False
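A stylized simulation of the point behind this question. It assumes the extra "wavelengths" are pure noise; real added wavelengths usually carry some signal as well, so the effect is less extreme. Ordinary least squares is refit with 0, 10 and 25 irrelevant variables on a hypothetical 30-sample calibration set and scored on an independent 500-sample test set.

```python
import numpy as np

rng = np.random.default_rng(4)


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


def simulate(n):
    """Hypothetical data: one informative variable plus 40 noise-only ones."""
    informative = rng.normal(size=(n, 1))
    noise_vars = rng.normal(size=(n, 40))
    y = 2.0 * informative[:, 0] + rng.normal(scale=0.5, size=n)
    return np.hstack([informative, noise_vars]), y


X_train, y_train = simulate(30)     # small calibration set
X_test, y_test = simulate(500)      # independent validation set

results = {}
for k in (0, 10, 25):               # number of irrelevant variables included
    cols = 1 + k
    A_train = np.column_stack([np.ones(len(X_train)), X_train[:, :cols]])
    A_test = np.column_stack([np.ones(len(X_test)), X_test[:, :cols]])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    results[k] = (rmse(A_train @ coef, y_train), rmse(A_test @ coef, y_test))
    print(f"{cols:2d} variables: train RMSE {results[k][0]:.3f}, "
          f"test RMSE {results[k][1]:.3f}")
```

Training error can only fall as variables are added, because each model contains the previous one. The test error is what reveals overfitting, which is why calibration models are judged on validation data.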
Question 4 (True / False)

PCA finds principal components that are eigenvectors of the covariance matrix of the data, ordered by decreasing eigenvalue, where each eigenvalue represents the variance explained by that component.

True
False
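The statement can be checked numerically. This sketch uses a small hypothetical data set (50 samples, 6 variables driven by 2 latent factors). `np.linalg.eigh` is NumPy's eigensolver for symmetric matrices and returns eigenvalues in ascending order, so they are re-sorted to decreasing order before projecting the data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 50 samples x 6 correlated variables built from 2 latent factors.
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(50, 6))

Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)                  # 6 x 6 covariance (divides by n - 1)

eigvals, eigvecs = np.linalg.eigh(C)          # ascending order for symmetric C
order = np.argsort(eigvals)[::-1]             # re-sort to decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                          # project samples onto the PCs
score_var = scores.var(axis=0, ddof=1)         # variance along each PC

print("eigenvalues:        ", np.round(eigvals, 4))
print("variance of scores: ", np.round(score_var, 4))
print("fraction explained: ", np.round(eigvals / eigvals.sum(), 4))
```

The same components can be obtained from an SVD of the centred data: the squared singular values divided by n - 1 equal these eigenvalues, which is how PCA is often computed for wide spectral matrices where forming the covariance matrix would be expensive.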
Question 5 (Short Answer)

Explain the key insight behind applying PCA to chemical spectral data, and why two or three components often capture most of the information in spectra with thousands of variables.
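One way to explore this is to simulate spectra that obey an idealized Beer-Lambert law. If every sample is a mixture of a few absorbing species, each spectrum is a linear combination of the same few pure-component spectra, so the centred data matrix has, apart from noise, rank equal to the number of species. The three species, their band positions, the concentration ranges and the noise level below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_wavelengths, n_species = 60, 2000, 3
wl = np.linspace(0.0, 1.0, n_wavelengths)

# Hypothetical pure-component spectra: each species has two Gaussian bands.
centers = [(0.2, 0.5), (0.35, 0.7), (0.6, 0.85)]
pure = np.array([sum(np.exp(-((wl - c) / 0.04) ** 2) for c in cs) for cs in centers])

# Idealized Beer-Lambert mixing (path length folded into the spectra):
# each row of X is concentrations @ pure spectra, plus a little noise.
conc = rng.uniform(0.1, 1.0, size=(n_samples, n_species))
X = conc @ pure + rng.normal(scale=0.002, size=(n_samples, n_wavelengths))

Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)       # singular values only
cumulative = np.cumsum(s**2) / np.sum(s**2)   # cumulative variance explained

n_needed = int(np.searchsorted(cumulative, 0.99) + 1)
print("cumulative variance, first 5 PCs:", np.round(cumulative[:5], 4))
print("PCs needed for 99% of variance:", n_needed, "of", n_wavelengths, "variables")
```

Real spectra add baseline shifts, scattering and nonlinear effects, which is why a few extra components are sometimes needed; the exact low-rank structure here is an idealization.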
