Questions: Multivariate Calibration: PLS and PCR Models

5 questions to test your understanding

Question 1 Multiple Choice

A chemist builds both a PCR model and a PLS model for predicting glucose concentration from near-infrared spectra of blood plasma. The plasma also strongly absorbs at wavelengths associated with albumin, which is unrelated to glucose. Which statement best explains why PLS typically achieves better glucose predictions with fewer components?

A. PLS normalizes the spectra first, removing albumin absorption automatically
B. PLS finds latent variables that maximize covariance with glucose concentration, so albumin-related spectral variation is deprioritized
C. PCR is mathematically invalid for overlapping spectra, making PLS the only valid choice
D. PLS uses more calibration samples than PCR, giving it an inherent accuracy advantage
Question 2 Multiple Choice

During cross-validation of a PLS model, the prediction error decreases as the number of latent variables increases from 1 to 6, reaches a minimum at 6 components, and then begins increasing. What is the best interpretation of this pattern?

A. The true underlying model has exactly 6 independent chemical factors contributing to the signal
B. The model overfits noise when more than 6 components are included, even though training error would continue to fall
C. Six components is the mathematical maximum for this dataset, so more cannot be added
D. The cross-validated error increasing after 6 components indicates the calibration samples are outliers
Question 3 True / False

PLS models for spectral data typically require fewer latent variables than PCR models to achieve the same predictive accuracy.

T. True
F. False
Question 4 True / False

Ordinary least squares regression (OLS) can be reliably applied to multivariate spectral calibration problems whenever the number of calibration samples exceeds the number of wavelengths measured.

T. True
F. False
Question 5 Short Answer

Explain why the number of latent variables (components) in a PLS or PCR model must be determined by cross-validation rather than simply choosing the number that minimizes training error.

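For readers who want to explore Questions 2 and 5 hands-on, here is a minimal sketch that computes training error and cross-validated error for a PLS model as the number of latent variables grows. Everything in it is an assumption made for illustration and is not part of the quiz: the synthetic three-band "spectra", the noise levels, the helper names (`pls1_fit`, `pls1_predict`, `error_curves`, `make_synthetic_spectra`), and the use of a textbook NIPALS PLS1 routine written in plain NumPy rather than any particular chemometrics package.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Textbook NIPALS PLS1 (single response); returns (x_mean, y_mean, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean      # centered data, deflated at each step
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                    # weight: direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                       # scores for this latent variable
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        c = yr @ t / tt                  # y loading
        Xr = Xr - np.outer(t, p)         # remove the explained part of X
        yr = yr - c * t                  # remove the explained part of y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # regression vector in wavelength space
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def error_curves(X, y, max_components, n_folds=5, seed=0):
    """Return (training RMSE, k-fold cross-validated RMSE) for 1..max_components."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    train_rmse, cv_rmse = [], []
    for k in range(1, max_components + 1):
        # Training error: fit and evaluate on the same samples.
        train_rmse.append(rmse(pls1_predict(pls1_fit(X, y, k), X), y))
        # Cross-validated error: each sample is predicted by a model that never saw it.
        held_out = np.empty_like(y)
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            model = pls1_fit(X[train_idx], y[train_idx], k)
            held_out[test_idx] = pls1_predict(model, X[test_idx])
        cv_rmse.append(rmse(held_out, y))
    return np.array(train_rmse), np.array(cv_rmse)

def make_synthetic_spectra(n_samples=40, n_wavelengths=120, seed=0):
    """Hypothetical data: an analyte band, a strong overlapping interferent, a third band."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_wavelengths)
    band = lambda center, width: np.exp(-0.5 * ((grid - center) / width) ** 2)
    pure = np.vstack([band(0.45, 0.08), 5.0 * band(0.55, 0.10), band(0.20, 0.05)])
    conc = rng.uniform(0.0, 1.0, size=(n_samples, 3))
    X = conc @ pure + 0.02 * rng.standard_normal((n_samples, n_wavelengths))
    y = conc[:, 0] + 0.02 * rng.standard_normal(n_samples)   # noisy reference values
    return X, y

X, y = make_synthetic_spectra()
train_rmse, cv_rmse = error_curves(X, y, max_components=12)
best_k = int(np.argmin(cv_rmse)) + 1     # component count with the lowest CV error

for k, (tr, cv) in enumerate(zip(train_rmse, cv_rmse), start=1):
    print(f"{k:2d} components  train RMSE = {tr:.4f}  CV RMSE = {cv:.4f}")
print("lowest cross-validated error at", best_k, "components")
```

Comparing the two printed columns row by row, and changing the noise level or sample count in `make_synthetic_spectra`, is one way to check your written answer against actual numbers. The exact values depend on the random seed and the synthetic setup, so treat them as illustrative only.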