Questions: Descriptive Statistics and Data Visualization
5 questions to test your understanding
Question 1 Multiple Choice
A researcher collects salary data from 500 participants. The distribution is strongly right-skewed — most people earn $40,000–60,000/year, but a small number of executives earn over $500,000. Which summary statistic best represents a 'typical' salary in this dataset?
A. The mean, because it uses all data points and is mathematically optimal
B. The median, because it is robust to the extreme high values that pull the mean upward
C. The standard deviation, because it captures how spread out the salaries are
D. The mode, because the most common salary is the best representation of the typical employee
For right-skewed distributions, the mean is pulled toward the tail by extreme values, so it overestimates the 'typical' value. The median (the middle value) is not influenced by how extreme the outliers are, only by their position in the ordered list. In income data, the mean salary might be $85,000 while the median is $52,000; the median is a far better description of what a typical worker earns. Standard deviation measures spread, not center. The mode is most useful for categorical data; in continuous data such as salaries, few values repeat exactly, so it is often unstable.
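A minimal sketch of this effect, using Python's standard `statistics` module. The salary figures are made up for illustration and are not the researcher's data; the point is only that inflating the top salary moves the mean but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical salaries in USD/year (invented for illustration):
# eight typical earners plus two executives above $500,000.
salaries = [42_000, 45_000, 48_000, 50_000, 52_000,
            55_000, 58_000, 60_000, 520_000, 750_000]

# Same data, but the top earner now makes ten times as much.
inflated = salaries[:-1] + [7_500_000]

for label, data in [("original", salaries), ("inflated", inflated)]:
    print(f"{label}: mean = {mean(data):,.0f}, median = {median(data):,.0f}")
```

In this toy sample the mean is 168,000 and jumps to 843,000 after the change, while the median stays at 53,500 in both cases.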
Question 2 Multiple Choice
A researcher wants to compare the distributions of anxiety scores across three therapy groups and quickly identify outliers in each group. Which visualization is most appropriate?
A. A histogram for each group: shows the full shape of each distribution
B. Side-by-side boxplots: each compresses its distribution into five summary statistics and highlights outliers
C. A scatterplot: reveals the relationship between anxiety and group membership
D. A bar chart: shows the mean for each group with error bars
Side-by-side boxplots are designed exactly for this use case: comparing multiple group distributions simultaneously while making outliers visible as individual plotted points. Each boxplot shows the median, IQR, and whiskers at a glance. Histograms (option A) are excellent for a single distribution but become cluttered when comparing three groups. Scatterplots (option C) show relationships between two continuous variables, not group comparisons. Bar charts (option D) show only means and do not reveal distribution shape, skew, or outliers.
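The numbers behind a boxplot can be sketched in a few lines. The example below assumes the common 1.5 × IQR convention for flagging outliers and the "inclusive" quartile method from Python's `statistics` module; plotting libraries may use slightly different quartile definitions. The group names and anxiety scores are hypothetical.

```python
from statistics import quantiles

def boxplot_summary(scores):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR convention."""
    data = sorted(scores)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme points that are still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Hypothetical anxiety scores for three therapy groups (made-up numbers).
groups = {
    "CBT": [10, 12, 13, 14, 15, 16, 17, 18, 40],
    "Medication": [11, 13, 14, 15, 16, 17, 19, 20, 21],
    "Waitlist": [2, 18, 19, 20, 21, 22, 23, 24, 25],
}

for name, scores in groups.items():
    print(name, boxplot_summary(scores))
```

With these invented scores, the 40 in the first group and the 2 in the third fall outside the fences, which is exactly what a boxplot would draw as individual points.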
Question 3 True / False
For a strongly right-skewed distribution, the mean is greater than the median.
True
False
Answer: True
In a right-skewed distribution, the tail extends to the right: there are a small number of very high values. The mean is the balance point and is pulled toward those high values, while the median (the middle value) is not affected by their magnitude, only their count. The result is typically mean > median in right-skewed data and mean < median in left-skewed data. For a symmetric distribution (like the normal), mean and median coincide. This relationship between skew and the mean-median gap is a useful quick diagnostic.
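As a rough illustration of the diagnostic, the sketch below computes mean minus median for three small made-up samples. The helper name `mean_median_gap` is hypothetical, and the sign of the gap should be read as a rule of thumb rather than a guarantee, since it can fail for some discrete or multimodal data.

```python
from statistics import mean, median

def mean_median_gap(values):
    """Mean minus median: positive hints at right skew, negative at left skew."""
    return mean(values) - median(values)

# Small invented samples, one per shape.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail to the right
left_skewed = [0, 16, 17, 17, 17, 18, 18, 19]  # long tail to the left
symmetric = [1, 2, 3, 4, 5, 6, 7]              # balanced around 4

for name, sample in [("right-skewed", right_skewed),
                     ("left-skewed", left_skewed),
                     ("symmetric", symmetric)]:
    print(f"{name}: mean - median = {mean_median_gap(sample):+.2f}")
```

For these samples the gap is +1.75, -1.75, and 0 respectively.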
Question 4 True / False
Descriptive statistics are primarily needed during the exploratory phase of analysis; once inferential tests are run, summary statistics become irrelevant.
True
False
Answer: False
Descriptive statistics matter at every stage of research, including when reporting results. Inferential statistics (p-values, confidence intervals) quantify uncertainty: how surprising the data would be under a given hypothesis, or how precisely a quantity has been estimated. They do not communicate what the data actually look like. A reader needs descriptive statistics (means or medians, standard deviations or IQRs, sample sizes) to evaluate the practical significance and context of any inferential result. Reporting inferential tests without descriptive summaries is incomplete and potentially misleading.
Question 5 Short Answer
Why does the choice between mean and standard deviation versus median and interquartile range depend on the shape of the data's distribution?
Think about your answer before reading the model answer below.
Model answer: Mean and standard deviation are sensitive to outliers and extreme values. They are appropriate when the distribution is roughly symmetric, because the mean then accurately represents the center. In skewed distributions or when outliers are present, the mean is pulled away from the bulk of the data, making it a poor 'typical value,' and the standard deviation may be inflated. Median and IQR are resistant to these extremes: the median marks the middle of the ordered data, and the IQR is the range spanned by the middle 50%. Matching the statistic to the distribution shape ensures the summary is actually informative.
This pairing principle (mean/SD for symmetric data, median/IQR for skewed or outlier-prone data) follows from understanding what each statistic measures. The mean is the arithmetic average: every data point gets equal weight, so a single extreme value can shift it substantially. The standard deviation measures the typical (root-mean-square) deviation from that mean. Both assume the mean is a meaningful center. When it isn't (due to skew), these statistics describe a value that few actual data points are near, misleading anyone who uses them to form expectations about typical cases.
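The pairing principle can be checked on a toy example. The sketch below summarizes a small symmetric sample and the same sample with one value replaced by an extreme outlier. The data and the `summarize` helper are hypothetical, and the IQR uses the "inclusive" quartile method from Python's `statistics` module, which is one of several conventions.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Return both summary pairs: mean/SD and median/IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "sd": stdev(values),      # sample standard deviation
        "median": median(values),
        "iqr": q3 - q1,
    }

# Invented data: a symmetric sample, then the same sample with one extreme value.
symmetric = [44, 46, 48, 50, 52, 54, 56]
with_outlier = [44, 46, 48, 50, 52, 54, 500]

for label, data in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    s = summarize(data)
    print(f"{label}: mean = {s['mean']:.1f}, SD = {s['sd']:.1f}, "
          f"median = {s['median']}, IQR = {s['iqr']}")
```

In this example the single outlier more than doubles the mean and inflates the standard deviation many times over, while the median (50) and IQR (6) do not change.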