Two classes both have a mean test score of 70. Without any other information, a teacher concludes they 'performed identically.' What critical information is the teacher ignoring?
A. The mode and number of students in each class
B. The spread of scores: two classes can share the same mean but differ drastically in variation, skewness, and outlier behavior
C. The median, which must always be reported alongside the mean
D. The sample sizes, which would affect whether the mean is statistically valid
Answer: B
Two datasets can share the same mean while being completely different in character. Class A might have all students scoring between 65 and 75 (tight, small standard deviation), while Class B has half the class above 90 and half below 50 (wide spread, large standard deviation). Reporting only the mean treats these as identical when they call for entirely different responses. Center and spread must always be reported together.
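The contrast between the two classes can be checked numerically. The sketch below uses made-up scores (the lists `class_a` and `class_b` are illustrative assumptions, not data from the question), chosen so that both classes average exactly 70.

```python
from statistics import mean, pstdev

# Hypothetical scores, chosen so that both classes average exactly 70.
class_a = [65, 68, 70, 70, 72, 75]   # every score between 65 and 75
class_b = [45, 47, 48, 92, 93, 95]   # half below 50, half above 90

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    # pstdev is the population standard deviation (divides by n).
    print(f"{name}: mean={mean(scores):.1f}  SD={pstdev(scores):.1f}  "
          f"min={min(scores)}  max={max(scores)}")
```

The means agree, but the standard deviation of the second class is several times larger, which is exactly the information the mean alone hides.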
Question 2 Multiple Choice
A dataset of household incomes in a city has a few extremely wealthy households pulling the mean well above the median. Which statistics should you use to describe the center and spread?
A. Mean and standard deviation: they make full use of all numerical values in the dataset
B. Median and IQR: they are robust to the influence of extreme values in skewed distributions
C. Mode and range: these are entirely unaffected by outliers
D. Mean and IQR: the mean for center and IQR for spread to balance the two approaches
Answer: B
When a distribution is right-skewed (a few unusually high values pull the mean above the median), the mean and standard deviation are distorted by the tail. The median and IQR describe the middle of the distribution without that distortion. For skewed household incomes, the median tells you what the 'typical' household earns, while the mean is inflated by the few extremely wealthy households. The rule: use mean and SD for roughly symmetric distributions; use median and IQR for skewed data or data with extreme outliers.
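To make the distortion concrete, here is a minimal sketch with invented incomes (the `incomes` list is a hypothetical example, in thousands of dollars). It relies on `statistics.quantiles`, available in Python 3.8 and later, whose default quartile method is only one of several conventions, so the exact IQR may differ slightly from other tools.

```python
from statistics import mean, median, stdev, quantiles

# Hypothetical household incomes in thousands of dollars:
# ten ordinary households plus one extremely wealthy one.
incomes = [32, 38, 41, 45, 48, 52, 55, 60, 67, 75, 2500]

q1, _, q3 = quantiles(incomes, n=4)  # quartile cut points
iqr = q3 - q1

print(f"mean   = {mean(incomes):8.1f}   SD  = {stdev(incomes):8.1f}")
print(f"median = {median(incomes):8.1f}   IQR = {iqr:8.1f}")
```

With these numbers the mean lands far above what any of the ten ordinary households earns, while the median and IQR stay within the range of the bulk of the data.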
Question 3 True / False
The median and IQR are preferred over the mean and standard deviation for summarizing a strongly right-skewed distribution, because the mean and SD are disproportionately influenced by extreme values.
T. True
F. False
Answer: True
True. In a right-skewed distribution, the mean is pulled toward the long tail by extreme high values, making it unrepresentative of where most data falls. The standard deviation is likewise inflated by those extremes. The median and IQR, by contrast, depend only on rank ordering and the middle 50% of the data — they are not sensitive to how extreme the extremes are. Choosing statistics that match the shape of your data is one of the most practically important skills in descriptive statistics.
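The claim that rank-based statistics ignore how extreme the extremes are can be tested directly. In this sketch (the data and the helper `summarize` are illustrative assumptions), only the largest observation changes between runs.

```python
from statistics import mean, median, stdev, quantiles

def summarize(data):
    """Return (mean, SD, median, IQR) for a list of numbers."""
    q1, _, q3 = quantiles(data, n=4)
    return mean(data), stdev(data), median(data), q3 - q1

# Hypothetical values; only the largest observation changes between runs.
base = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9]
results = {top: summarize(base + [top]) for top in (10, 100, 1000)}

for top, (m, sd, med, iqr) in results.items():
    print(f"max={top:5d}  mean={m:7.2f}  SD={sd:7.2f}  median={med}  IQR={iqr}")
```

As the maximum grows, the mean and SD climb with it, while the median and IQR print the same values on every line.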
Question 4 True / False
Two datasets that share the same mean, median, and standard deviation must have distributions with the same shape.
T. True
F. False
Answer: False
False. Matching summary statistics do not pin down a distribution's shape. Anscombe's quartet famously demonstrates this for paired (x, y) data: four datasets share nearly identical summary statistics (means, variances, correlation) while looking radically different when plotted: a linear relationship, a curve, a near-perfect line with one outlier, and a cluster with a single extreme point. The same holds for a single variable: one dataset can be symmetric and another skewed while both have the same mean, median, and standard deviation. Summary statistics can hide the real structure of data. This is why graphs are not optional extras. They are essential complements to numerical summaries, capable of revealing patterns that numbers alone conceal.
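Anscombe's quartet concerns paired data, but a single-variable counterexample to the statement is easy to build. The two score lists below are hypothetical, constructed by hand so that mean, median, and population standard deviation coincide; `skewness` is a small helper written for this sketch, not a standard-library function.

```python
from statistics import mean, median, pstdev

# Hypothetical test scores, constructed so the three summaries match exactly.
symmetric = [55, 60, 65, 70, 75, 80, 85]   # evenly spaced around 70
skewed    = [55, 65, 65, 70, 70, 75, 90]   # longer tail on the high side

def skewness(data):
    """Population skewness: the mean cubed deviation divided by SD cubed."""
    m, sd = mean(data), pstdev(data)
    return sum((x - m) ** 3 for x in data) / (len(data) * sd ** 3)

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(f"{name:9s}  mean={mean(data):.1f}  median={median(data):.1f}  "
          f"SD={pstdev(data):.1f}  skewness={skewness(data):+.2f}")
```

Both lists report a mean of 70, a median of 70, and an SD of 10, yet one has zero skewness and the other is clearly right-skewed. A dot plot or histogram of the two lists would show the difference at a glance.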
Question 5 Short Answer
Why is it insufficient to report only the mean when describing a dataset? What does the spread tell you that the mean cannot?
Model answer: The mean tells you where the center of the data is, but nothing about how the data is distributed around that center. Two datasets with the same mean can have completely different amounts of variation — one tightly clustered, one widely dispersed. The spread (whether standard deviation or IQR) tells you how much individual values typically deviate from the center: a small spread means values are consistent, a large spread means values are highly variable. Without spread, you cannot assess reliability, risk, or the practical significance of the center value.
This is the core synthesis insight: a single statistic never tells the whole story. A mean of 70 on a test is compatible with every student scoring between 68 and 72, or with scores ranging from 20 to 100. These situations call for different responses, but the mean alone cannot distinguish them. Effective statistical description always pairs a measure of center with a measure of spread — and backs both up with a graph.