Questions: Outlier Detection and Statistical Methods
5 questions to test your understanding
Question 1 (Multiple Choice)
An analyst runs six replicates and notices one result doesn't match her expectations for the sample. She calculates a Dixon's Q statistic after seeing the data and finds it exceeds the critical value at 95% confidence. Is she justified in rejecting the outlier?
A. Yes: the statistical test confirms the value is anomalous, which is sufficient justification
B. No: rejection criteria must be established before data collection; post-hoc testing alone does not constitute a defensible procedure
C. Yes: any value exceeding the critical Q can always be removed, regardless of when the test is applied
D. No: Dixon's Q is not a recognized test for outlier rejection in analytical chemistry
Answer: B
The statistical test itself is necessary but not sufficient. The key principle is that rejection criteria — which test, which confidence level, what documentation is required — must be specified in the method SOP *before* data are collected. Selecting a test after seeing which value looks suspicious introduces unconscious bias. The test tells you the value is statistically improbable; pre-specified criteria ensure the decision to reject is not influenced by whether the outlier fits your hypothesis.
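For reference, the statistic in this scenario is the classic Dixon ratio Q = gap / range, where the gap is the distance from the suspect value to its nearest neighbor in the sorted data. The sketch below assumes the commonly tabulated 95% critical values for 3 to 10 observations and tests whichever end of the sorted data has the larger gap; the function names and replicate values are hypothetical, and a real SOP should cite its own critical-value table. The code only answers the statistical question; it cannot tell you whether the test was chosen before or after the data were seen.

```python
# Minimal sketch of the classic Dixon Q-test for a single suspect value.
# Assumption: commonly tabulated 95% critical values for n = 3..10;
# check these against the table your SOP actually specifies.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_q(values):
    """Return (suspect_value, Q) for the end of the sorted data with the larger gap."""
    if not 3 <= len(values) <= 10:
        raise ValueError("this critical-value table covers 3 to 10 values only")
    xs = sorted(values)
    spread = xs[-1] - xs[0]
    if spread == 0:
        raise ValueError("all values are identical; Q is undefined")
    q_low = (xs[1] - xs[0]) / spread    # gap between the two smallest values
    q_high = (xs[-1] - xs[-2]) / spread  # gap between the two largest values
    return (xs[0], q_low) if q_low > q_high else (xs[-1], q_high)


def exceeds_critical_q(values, table=Q_CRIT_95):
    """Return (suspect_value, Q, Q > Q_crit) using the supplied critical-value table."""
    suspect, q = dixon_q(values)
    return suspect, q, q > table[len(values)]


# Six hypothetical replicates with one high result.
replicates = [10.1, 10.2, 10.0, 10.1, 10.3, 11.0]
suspect, q, flagged = exceeds_critical_q(replicates)
print(suspect, round(q, 3), flagged)  # 11.0 0.7 True
```

Here Q = (11.0 - 10.3) / (11.0 - 10.0) = 0.70, which exceeds the assumed critical value of 0.625 for n = 6. That is exactly the situation in the question: the arithmetic is correct, yet it says nothing about whether the rejection criteria were fixed in advance.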
Question 2 (Multiple Choice)
A laboratory is performing interlaboratory proficiency testing with 30+ participants and suspects that several labs may have produced anomalous results. Which outlier detection approach is most appropriate?
A. Dixon's Q-test, because it is the simplest to calculate
B. Grubbs' test, because it works best for any dataset regardless of contamination
C. Robust methods (e.g., median absolute deviation), because they resist the influence of multiple outliers on the reference statistics
D. z-score analysis using the dataset mean and standard deviation
Answer: C
When multiple outliers may be present, standard methods such as Grubbs' test and z-scores are compromised because they rely on the mean and standard deviation, statistics that are themselves pulled and inflated by the very outliers you are trying to detect. As a result, one outlier can hide another (masking). Robust methods such as the median absolute deviation (MAD) replace these with statistics that resist extreme values. Dixon's Q is intended only for small datasets (the classic Q ratio is tabulated for roughly 3 to 10 values, and extended Dixon ratios reach about n = 25) with a single suspect value.
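Masking is easy to see on a small example. The sketch compares mean/SD z-scores (flag if |z| > 3) with the modified z-score 0.6745 (x - median) / MAD (flag if its magnitude exceeds 3.5), a widely used MAD-based rule due to Iglewicz and Hoaglin. The 20 "laboratory results" are invented for illustration, with four of them deliberately sharing a large bias; the thresholds are conventional choices, not requirements of any particular proficiency-testing scheme.

```python
import statistics

# Hypothetical results from 20 laboratories: 16 agree closely, 4 share a bias.
results = [9.8, 9.9, 10.0, 10.1, 10.2, 9.9, 10.0, 10.1,
           9.8, 10.2, 10.0, 9.9, 10.1, 10.0, 10.0, 9.9,
           14.0, 14.2, 13.8, 14.1]


def zscore_flags(values, threshold=3.0):
    """Non-robust: indices with |x - mean| / SD above threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, inflated by the outliers themselves
    return [i for i, x in enumerate(values) if abs(x - mean) / sd > threshold]


def robust_flags(values, threshold=3.5):
    """Robust: indices whose modified z-score 0.6745 * (x - median) / MAD exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [i for i, x in enumerate(values) if abs(0.6745 * (x - med) / mad) > threshold]


print(zscore_flags(results))  # []: the four biased labs mask one another
print(robust_flags(results))  # [16, 17, 18, 19]
```

The four biased results drag the mean up to 10.8 and inflate the SD to about 1.66, so even the most extreme value scores only about |z| = 2.05. The median (10.0) and MAD (about 0.1) barely move, so the same four results stand out clearly.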
Question 3 (True / False)
A statistically identified outlier should generally be excluded from the reported result, since its improbability under the assumed distribution proves it is erroneous.
T. True
F. False
Answer: False
Statistical improbability is not the same as being erroneous. An outlier may reflect genuine extreme variation in the sample, an unknown interference, or a real phenomenon worth investigating. A test applied under pre-specified criteria can provide grounds for excluding the value from the *reported* result (with documentation), but the cause must still be investigated. A value arising from a genuine rare event should be noted, not silently discarded. The test can support removal; only a laboratory investigation can establish the cause and whether it represents a systemic problem.
Question 4 (True / False)
Pre-specifying outlier rejection criteria in a method SOP before any data are collected is a requirement of defensible practice, not just a procedural formality.
T. True
F. False
Answer: True
This is the central principle of defensible outlier treatment. Specifying criteria in advance prevents a common form of inadvertent data manipulation: choosing a test, confidence level, or threshold after seeing which value would be eliminated. Regulatory and quality frameworks (GLP, FDA, ISO/IEC 17025) expect documented, pre-specified criteria precisely because post-hoc decisions, even well-intentioned ones, cannot be distinguished from selective data removal.
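One way to make "pre-specified" checkable rather than aspirational is to store the criteria as an immutable, time-stamped record and compare its approval time with the first measurement. The sketch below is hypothetical: the `RejectionCriteria` fields, the dates, and the simple timestamp comparison are illustrative assumptions, not a format required by GLP, FDA, or ISO/IEC 17025.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime


@dataclass(frozen=True)
class RejectionCriteria:
    """Hypothetical SOP entry; frozen so fields cannot be edited after creation."""
    test: str
    confidence: float
    documentation: str
    approved_at: datetime


def criteria_are_prespecified(criteria, first_measurement_at):
    """True only if the criteria were approved before the first measurement."""
    return criteria.approved_at < first_measurement_at


sop = RejectionCriteria(
    test="Dixon Q (single suspect value)",
    confidence=0.95,
    documentation="record value, Q, critical Q, and investigation outcome",
    approved_at=datetime(2024, 1, 10, 9, 0),
)
print(criteria_are_prespecified(sop, datetime(2024, 1, 15, 8, 30)))  # True

try:
    sop.confidence = 0.99  # a post-hoc change to the criteria is refused
except FrozenInstanceError:
    print("criteria are locked")
```

Freezing the object only blocks edits in memory. The defensibility still comes from the approved, version-controlled SOP that the record mirrors.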
Question 5 (Short Answer)
Why is it insufficient to simply run a statistical outlier test when a suspicious measurement appears? What additional step is required, and why does it matter?
Model answer: A statistical test establishes that the value is improbable under the assumed distribution, but it cannot reveal the cause. The required additional step is a laboratory investigation to determine whether the outlier resulted from an identifiable error (spill, air bubble, calculation mistake, instrument malfunction) or genuine sample variation. This matters because identifying the cause prevents recurrence of systemic problems. Rejecting an outlier without investigation treats a symptom while leaving the underlying problem intact.
The distinction between 'statistically anomalous' and 'causally explained' is critical. If you find that air bubbles in the pipette explain the outlier, you can fix the technique. If no cause is found, the value may need to be retained or flagged rather than removed. The goal of outlier detection is data integrity, not data convenience, and investigation is what separates legitimate rejection from rationalized exclusion.
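The decision logic in the model answer can be sketched as a small disposition rule applied after a value has failed the pre-specified test. This is one conservative policy, assumed for illustration: a value is excluded only when the investigation finds an assignable cause, and is otherwise retained and flagged. Some SOPs permit documented exclusion on statistical grounds alone, so treat the rule, the `Disposition` names, and the record fields as hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disposition(Enum):
    EXCLUDE_WITH_CAUSE = "excluded from reported result; cause documented"
    RETAIN_AND_FLAG = "retained and flagged; no assignable cause found"


@dataclass
class OutlierRecord:
    """Audit entry for a value that already failed the SOP's outlier test."""
    value: float
    test_statistic: float
    cause: Optional[str]  # investigation finding, e.g. "air bubble in pipette"; None if none


def dispose(record: OutlierRecord) -> Disposition:
    """Exclude only with an identified cause; otherwise keep the value visible."""
    if record.cause:
        return Disposition.EXCLUDE_WITH_CAUSE
    return Disposition.RETAIN_AND_FLAG


bubble = OutlierRecord(value=11.0, test_statistic=0.70, cause="air bubble in pipette")
unexplained = OutlierRecord(value=11.0, test_statistic=0.70, cause=None)
print(dispose(bubble).value)       # excluded from reported result; cause documented
print(dispose(unexplained).value)  # retained and flagged; no assignable cause found
```

In both branches the `OutlierRecord` itself is kept, so even an excluded value remains in the audit trail rather than being silently discarded.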