Measures of spread quantify variability in data. Range = max - min (sensitive to outliers). Interquartile range (IQR) = Q3 - Q1 (robust to outliers). Variance = average squared deviation from mean. Standard deviation = √variance (in original units). These measures answer 'how spread out is the data?' and are essential for understanding data distributions, comparing datasets, and assessing consistency.
Compute all measures for the same dataset. Add outliers and observe which measures change. Plot data showing how visual spread matches numerical measures.
Confusing variance and standard deviation. Thinking IQR includes all data. Believing range alone describes spread adequately.
You already know how to find the center of a dataset — the mean, median, and mode each describe a typical value. But two datasets can have the same mean and look completely different. Imagine one class where every student scores between 70 and 80, and another where scores range from 10 to 100. Both might have a mean of 75. Measures of spread exist to capture this difference.
The simplest measure is the range: maximum minus minimum. It is easy to compute but fragile — one extreme outlier can make a dataset look far more variable than it really is. A single student who scores 0 on an exam inflates the range for the whole class, even if everyone else clustered tightly.
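To make that fragility concrete, here is a minimal Python sketch. The scores are invented for illustration, and `value_range` is a hypothetical helper name, not a library function.

```python
def value_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical exam scores, made up for this example.
scores = [72, 75, 78, 80, 81, 83, 85]
scores_with_zero = scores + [0]  # one student scores 0

range_clean = value_range(scores)
range_with_zero = value_range(scores_with_zero)

print(range_clean)      # 13
print(range_with_zero)  # 85
```

Seven of the eight scores are identical in both lists, yet the range grows from 13 to 85.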
The interquartile range (IQR) solves this by ignoring the extremes. It measures the span of the middle 50% of the data: Q3 (the 75th percentile) minus Q1 (the 25th percentile). IQR is robust to outliers because extreme values fall outside the interval it measures; they can shift the quartiles slightly but cannot stretch them the way they stretch the range. This is why box plots are built around it: the box runs from Q1 to Q3, so its length is the IQR and it shows where the middle half of the data lives.
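The sketch below computes the IQR for a small set of invented scores, with and without a score of 0. It assumes Python 3.8 or later and uses `statistics.quantiles` with the "inclusive" method. Quartile conventions differ between tools, so other software may report slightly different values for the same data. The helper name `iqr` is hypothetical.

```python
from statistics import quantiles


def iqr(values):
    """Return Q3 - Q1, using the 'inclusive' quartile convention."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical exam scores, made up for this example.
scores = [72, 75, 78, 80, 81, 83, 85]
scores_with_zero = scores + [0]  # one student scores 0

iqr_clean = iqr(scores)
iqr_with_zero = iqr(scores_with_zero)

print(iqr_clean)      # 5.5
print(iqr_with_zero)  # 7.25
```

The extra score of 0 moves the IQR only from 5.5 to 7.25, because it lands below Q1 and merely nudges the quartile positions.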
Variance takes a different approach: compute each value's deviation from the mean, square those deviations, and average them. Squaring serves two purposes: it makes all deviations positive, and it penalizes large deviations more heavily than small ones. (When the data is a sample from a larger population, the sum of squared deviations is usually divided by n - 1 rather than n; the idea is the same.) The downside is that variance is in squared units, which are hard to interpret directly. Standard deviation fixes this by taking the square root of variance, returning the result to the original units. This is why standard deviation is the spread measure you will see most often in practice.
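The steps for variance and standard deviation can be written out by hand and checked against Python's standard library. This is a sketch on invented values; the variable names are illustrative only.

```python
from math import sqrt
from statistics import pstdev, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

# Step 1: the mean.
mean = sum(data) / len(data)

# Step 2: squared deviation of each value from the mean.
squared_devs = [(x - mean) ** 2 for x in data]

# Step 3: average the squared deviations (dividing by n).
manual_variance = sum(squared_devs) / len(data)

# Step 4: the square root returns to the original units.
manual_sd = sqrt(manual_variance)

print(manual_variance, manual_sd)      # 4.0 2.0
print(pvariance(data), pstdev(data))   # same values from the stdlib
print(variance(data))                  # sample variance: divides by n - 1
```

Here the variance is 4 in squared units, while the standard deviation of 2 is on the same scale as the data itself.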
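The key judgment call is which measure to use. If your data has outliers or is skewed, IQR is more informative than standard deviation, because a few extreme values can dominate the squared deviations. If your data is roughly symmetric and outlier-free, standard deviation is preferred because it uses every data point and connects cleanly to probability theory: for roughly normal data, about 68% of values fall within one standard deviation of the mean and about 95% within two. Range is useful as a quick sanity check, but rarely sufficient on its own.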
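One way to see why outliers tip the choice toward IQR is to measure how much each statistic grows when a single extreme value is added. The sketch below uses invented scores, the sample standard deviation from `statistics.stdev`, and a hypothetical `iqr` helper based on the "inclusive" quartile method (Python 3.8 or later assumed).

```python
from statistics import quantiles, stdev


def iqr(values):
    """Return Q3 - Q1, using the 'inclusive' quartile convention."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical exam scores, made up for this example.
scores = [72, 75, 78, 80, 81, 83, 85]
scores_with_zero = scores + [0]  # one extreme value added

# Growth factor of each measure after adding the outlier.
sd_growth = stdev(scores_with_zero) / stdev(scores)
iqr_growth = iqr(scores_with_zero) / iqr(scores)

print(f"standard deviation grows by a factor of {sd_growth:.1f}")
print(f"IQR grows by a factor of {iqr_growth:.1f}")
```

On this data the standard deviation grows more than sixfold while the IQR grows by about a third, so the IQR gives the more stable description once an outlier is present.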