Questions: Histograms and Frequency Visualizations
5 questions to test your understanding
Question 1 Multiple Choice
A researcher records the political party affiliation (Democrat, Republican, Independent, Other) of 200 survey respondents and creates a graph with adjacent bars (no gaps) of different heights for each category. What is wrong with this visualization?
A. Nothing: adjacent bars correctly show that the categories sum to 100%
B. The bars should show relative frequency as proportions rather than raw counts
C. Adjacent bars without gaps are appropriate only for quantitative continuous data; categorical data requires a bar chart with gaps between bars to signal that the categories are discrete and unordered
D. The graph needs a title and axis labels before any judgment about its correctness can be made
Answer: C
A histogram uses adjacent bars to signal continuous quantitative data: the bars touch because adjacent bins represent adjacent ranges on a number line. Political party is categorical. Its categories have no numerical ordering and no values between them, so bars that touch falsely imply continuity. A bar chart, with gaps between the bars, is the correct display because it signals that the categories are discrete and separate. Using histogram format for categorical data is a fundamental misuse of the visualization type.
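To make the distinction concrete, here is a small Python sketch with made-up data (the survey responses and ages are hypothetical, not the researcher's). Categorical values are simply tallied per category, whereas quantitative values are sorted into adjacent intervals that share their edges, which is exactly why histogram bars touch.

```python
from collections import Counter

# Hypothetical survey responses (illustrative, not the researcher's 200 respondents).
responses = ["Democrat", "Republican", "Independent", "Democrat",
             "Other", "Republican", "Democrat"]

# Categorical data: one count per category. There is no numeric range and no
# value "between" two parties, so bars should be separated by gaps.
party_counts = Counter(responses)

# Quantitative data: hypothetical ages sorted into adjacent ranges
# [20, 30), [30, 40), [40, 50), [50, 60).
ages = [23, 27, 31, 35, 38, 42, 47, 51, 58]
edges = [20, 30, 40, 50, 60]
bins = list(zip(edges, edges[1:]))
age_counts = [sum(lo <= a < hi for a in ages) for lo, hi in bins]

# Each bin's right edge is the next bin's left edge, which is why histogram bars touch.
for party, n in party_counts.items():
    print(f"{party:<12} {n}")
for (lo, hi), n in zip(bins, age_counts):
    print(f"[{lo}, {hi})  {n}")
```

The order in which the party bars appear is arbitrary; reordering them changes nothing. Reordering the age bins would destroy the meaning of the display.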
Question 2 Multiple Choice
A data analyst creates two histograms of the same 500 exam scores: one with 4 bins, one with 100 bins. The 4-bin histogram shows a roughly symmetric hump; the 100-bin histogram looks jagged and irregular. What is the most useful interpretation?
A. The 100-bin histogram is more accurate because it preserves more information by not aggregating
B. The 4-bin histogram is correct and the 100-bin histogram contains construction errors
C. The 4-bin histogram likely hides real structure by over-aggregating; the 100-bin histogram likely overfits to sampling noise; the true distribution shape requires a bin width that reveals structure without reflecting random variation
D. Both histograms are equally valid; bin width choice is purely aesthetic
Answer: C
Bin width is a genuine analytical choice. Too few bins (4) over-smooth the data, hiding real structure inside wide intervals. Too many bins (100) amplify sampling noise: with only about 5 scores per bin on average, every random fluctuation shows up as a spike or gap. The goal is to reveal the distribution's underlying shape, which requires a bin width that balances resolution against noise. Rules of thumb such as Sturges' rule or the Freedman-Diaconis rule give principled starting points. Neither extreme is automatically 'more accurate.'
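The two rules of thumb can be sketched in a few lines of Python. Sturges' rule sets the number of bins to k = ceil(log2 n) + 1; the Freedman-Diaconis rule sets the bin width to h = 2 × IQR × n^(-1/3). The 500 scores below are an assumption (a simulated normal sample with mean 70 and standard deviation 10), standing in for the analyst's real data.

```python
import math
import random
import statistics

def sturges_bins(n: int) -> int:
    """Sturges' rule: k = ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(n)) + 1

def freedman_diaconis_width(data: list[float]) -> float:
    """Freedman-Diaconis rule: bin width h = 2 * IQR * n^(-1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) * len(data) ** (-1 / 3)

# Hypothetical stand-in for the 500 exam scores (normal, mean 70, sd 10).
rng = random.Random(0)
scores = [rng.gauss(70, 10) for _ in range(500)]

k_sturges = sturges_bins(len(scores))
h_fd = freedman_diaconis_width(scores)
k_fd = math.ceil((max(scores) - min(scores)) / h_fd)  # bins needed to span the data

print(f"Sturges: {k_sturges} bins")
print(f"Freedman-Diaconis: width {h_fd:.2f}, about {k_fd} bins")
```

For n = 500, Sturges gives 10 bins, and Freedman-Diaconis lands in the same moderate range; both sit well between the 4-bin and 100-bin extremes. If you use NumPy, `numpy.histogram_bin_edges` accepts `bins="sturges"` and `bins="fd"` for the same estimators.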
Question 3 True / False
A histogram with two distinct peaks (a bimodal distribution) suggests the data may come from two different subpopulations, and this pattern would be completely hidden if only the mean were reported.
True
False
Answer: True
The mean collapses a distribution to a single number and cannot reveal multimodality. If test scores cluster around 60 and again around 90, with the two groups similar in size, the mean lands near 75, a value that describes nobody in either group. A histogram reveals the two-group structure immediately. This is why visualizing the full distribution, not just summary statistics, is essential before drawing conclusions from data.
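A minimal Python sketch of the 60/90 example, using invented scores for two equal-sized groups: the mean comes out at exactly 75, while a text histogram with 10-point bins shows two peaks and an empty bin where the mean sits.

```python
import statistics
from collections import Counter

# Hypothetical scores: two equal-sized groups, one near 60 and one near 90.
low_group = [56, 58, 59, 60, 60, 61, 62, 64]
high_group = [86, 88, 89, 90, 90, 91, 92, 94]
scores = low_group + high_group

mean = statistics.mean(scores)  # 75 for these numbers

# Histogram counts with 10-point bins: the key is the bin's lower edge.
bin_width = 10
counts = Counter((s // bin_width) * bin_width for s in scores)

print(f"mean = {mean}")
for lo in range(50, 100, bin_width):
    print(f"[{lo}, {lo + bin_width})  {'#' * counts[lo]}")
```

The [70, 80) bin, which contains the mean, is empty, and no individual score lies within 10 points of it.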
Question 4 True / False
The height of a histogram bar typically equals the number of observations in that bin, regardless of how the histogram is constructed.
True
False
Answer: False
Histogram bars can represent frequency (raw count), relative frequency (proportion), or frequency density (proportion divided by bin width). When all bins have equal width, these three scales differ only by a constant factor, so the histogram's shape is the same whichever one is used. When bin widths vary, however, raw count or relative frequency is misleading as bar height: a wider bin tends to collect more observations simply because it spans more of the number line, so its bar looks taller even if the data are less concentrated there. In that case frequency density is the correct representation, because each bar's area (height times width) then equals the proportion of observations in the bin. Always check the y-axis label before interpreting histogram bar heights.
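The effect of unequal widths is easy to see numerically. The sketch below assumes three hypothetical bins of width 10, 10, and 30 with made-up counts; it computes relative frequency and frequency density for each and checks that density heights give areas summing to 1.

```python
# Hypothetical bins with unequal widths (edges) and made-up raw counts.
edges = [0, 10, 20, 50]
counts = [20, 15, 25]

n = sum(counts)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
rel_freq = [c / n for c in counts]                      # proportion in each bin
density = [p / w for p, w in zip(rel_freq, widths)]     # proportion per unit width

for (lo, hi), c, d in zip(zip(edges, edges[1:]), counts, density):
    print(f"[{lo:>2}, {hi:>2})  count={c:>2}  density={d:.4f}")

# With density heights, bar area (height * width) equals the bin's proportion,
# so the areas sum to 1.
total_area = sum(d * w for d, w in zip(density, widths))
print(f"total area = {total_area:.6f}")
```

By raw count the wide [20, 50) bin is the tallest bar, but by density it is the shortest: its 25 observations are spread over three times as much of the number line.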
Question 5 Short Answer
Why does bin width choice matter when constructing a histogram, and how can a poor choice mislead the reader about the data's distribution?
Model answer: Bin width determines the resolution at which you view the data. Too-wide bins over-aggregate, hiding structure: a bimodal distribution might collapse into a single hump, and meaningful skewness might disappear into a symmetric-looking shape. Too-narrow bins amplify sampling noise: every random fluctuation becomes a visible spike or gap, suggesting structure that isn't present in the underlying population. A poor choice can therefore make a bimodal distribution look unimodal, a skewed distribution look symmetric, or a smooth distribution look jagged — all misleading impressions that could lead to wrong conclusions about the data.
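The first failure mode, a bimodal distribution collapsing under wide bins, can be reproduced with a short Python sketch. The data are an invented bimodal sample, and `bin_counts` and `count_peaks` are hypothetical helpers written for this illustration; the peak counter is a crude local-maximum check, not a formal test of multimodality.

```python
from collections import Counter

def bin_counts(data, width, start):
    """Counts for bins [start + i*width, start + (i+1)*width); assumes start <= min(data)."""
    idx = Counter((x - start) // width for x in data)
    return [idx[i] for i in range(max(idx) + 1)]

def count_peaks(counts):
    """Bins strictly taller than both neighbours (missing neighbours count as 0).
    Plateaus of equal bars are not counted, so this is only a rough indicator."""
    padded = [0] + counts + [0]
    return sum(padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
               for i in range(1, len(padded) - 1))

# Hypothetical bimodal data: one cluster centred at 62, another at 78.
data = ([60] * 3 + [61] * 5 + [62] * 8 + [63] * 5 + [64] * 3
        + [76] * 3 + [77] * 5 + [78] * 8 + [79] * 5 + [80] * 3)

narrow = bin_counts(data, width=2, start=60)
wide = bin_counts(data, width=20, start=60)

print("width 2: ", narrow, "peaks:", count_peaks(narrow))
print("width 20:", wide, "peaks:", count_peaks(wide))
```

With width-2 bins the two clusters appear as separate peaks. With width-20 bins starting at 60, almost all of both clusters fall into one bin, and the histogram shows a single dominant bar. Shifting `start` changes the wide-bin picture again, which is one more reason to compare several bin choices before trusting a shape.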
The tension between over-smoothing (too few bins) and under-smoothing (too many bins) is a fundamental theme in statistics, often framed as the bias-variance tradeoff. It reappears in kernel density estimation (bandwidth choice), regression smoothing, and model selection. The histogram is where this tension appears in its most concrete, visual form.