A histogram displays the distribution of numerical data by dividing the range into bins and showing frequency (or density) on the y-axis. Histograms reveal the shape of a distribution: whether it is symmetric, skewed, unimodal, or multimodal. Frequency distributions can be represented as raw counts, relative frequencies (proportions), or densities (relative frequency divided by bin width, which keeps bins of unequal width comparable). Histograms are fundamental exploratory data analysis tools that precede and inform statistical inference.
Create histograms from data using different bin widths. Observe how bin choice affects apparent shape. Connect histogram shapes to distributions discussed (normal, uniform, bimodal, skewed).
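Confusing area with height in density histograms. Thinking bin width choice doesn't matter. Misreading gaps: an empty bin means no observations fell in that range in this sample, not that such values are impossible, and not that the bars are merely spaced apart as in a categorical bar chart.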
You already know how to compute the mean, median, and mode of a dataset — single-number summaries of a distribution. A histogram reveals something those summaries cannot: the shape of the distribution. Where are values concentrated? Are they spread evenly or clumped? Does the data have one peak or two? A histogram makes these patterns visible at a glance.
To build a histogram, you divide the range of the data into intervals called bins, usually of equal width, and count how many observations fall in each bin. You then draw a bar for each bin, with height proportional to the count (the frequency). The result shows where data is dense and where it is sparse. Unlike the bars in a bar chart for categorical data, the bars in a histogram touch, because the bins cover a continuous range with no space between them. A gap in a histogram therefore means that no observations fell in that range.
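The counting step can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the helper name histogram_counts and the exam scores are made up, and the convention that each bin includes its left edge but not its right (with the maximum placed in the last bin) is one common choice among several.

```python
def histogram_counts(data, num_bins):
    """Count observations in num_bins equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Bins are half-open [left, right); the maximum goes in the last bin.
        index = min(int((x - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical exam scores, invented for illustration.
scores = [52, 55, 61, 63, 64, 68, 70, 71, 72, 75, 78, 79, 81, 85, 92]
edges, counts = histogram_counts(scores, 4)

# Print a text histogram: one row per bin, one '#' per observation.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:5.1f}, {right:5.1f})  {'#' * count}")
```

Each row of the printout corresponds to one bar, so the row lengths give the same picture as the bar heights in a drawn histogram.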
Bin width is a critical design choice. With too few bins, the histogram is a featureless blob that hides structure; with too many, it becomes jagged noise in which most bins hold 0 or 1 observations. A good bin width shows meaningful patterns without exaggerating random variation. As a rule of thumb, start with a number of bins near the square root of the sample size (about 10 bins for 100 observations), then adjust visually. Different bin widths can make the same data look symmetric or skewed, which is why analysts often try several before settling on one.
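To see the effect of bin choice, you can count the same data with several bin numbers. The sketch below uses made-up waiting times and a hypothetical helper, bin_counts; it compares 2 bins, the square-root rule, and one bin per unit of range.

```python
import math


def bin_counts(data, num_bins):
    """Counts for num_bins equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The maximum value is placed in the last bin.
        counts[min(int((x - lo) / width), num_bins - 1)] += 1
    return counts


# Hypothetical waiting times in minutes, invented for illustration.
times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 10, 13, 17]

# Square-root rule of thumb: number of bins near sqrt(sample size).
sqrt_bins = math.ceil(math.sqrt(len(times)))

for k in (2, sqrt_bins, 16):
    print(f"{k:2d} bins: {bin_counts(times, k)}")
```

With 2 bins almost everything lands in one bar and the long right tail is hidden; with 16 bins many bars are empty or hold a single value; the square-root choice sits between the two.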
Reading histogram shapes connects to the distributions you'll encounter in statistical inference. A symmetric, bell-shaped histogram is consistent with a normal distribution; the mean, median, and mode are all close together. A right-skewed histogram has a long tail to the right, with the mean typically pulled higher than the median (a few large values drag the mean up). A left-skewed histogram mirrors this: the long tail is on the left and the mean typically falls below the median. A bimodal histogram has two peaks, suggesting two subpopulations in the data. Finally, density histograms rescale the y-axis so that bar areas (not heights) sum to 1: each bar's height is its relative frequency divided by its bin width. This makes them directly comparable to probability density functions, a key bridge between exploratory data analysis and theoretical distributions.
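The difference between area and height is easiest to see with bins of unequal width. The following is a small sketch under stated assumptions: the data and bin edges are made up, the helper name density_heights is hypothetical, and the edges are assumed to cover every observation.

```python
def density_heights(data, edges):
    """Bar heights for a density histogram: relative frequency / bin width.

    Assumes the edges cover all of the data; values outside are ignored.
    """
    n = len(data)
    heights = []
    last_bin = len(edges) - 2
    for i, (left, right) in enumerate(zip(edges, edges[1:])):
        # Bins are [left, right); the last bin also includes its right edge.
        count = sum(
            1 for x in data
            if left <= x < right or (i == last_bin and x == right)
        )
        heights.append(count / n / (right - left))
    return heights


# Hypothetical data and unequal-width bins, invented for illustration.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges = [0, 2, 4, 6, 10]

heights = density_heights(values, edges)
areas = [h * (right - left) for h, left, right in zip(heights, edges, edges[1:])]

for left, right, h, a in zip(edges, edges[1:], heights, areas):
    print(f"[{left}, {right}): height = {h:.3f}, area = {a:.2f}")
print("total area:", round(sum(areas), 10))
```

The first and last bins each hold one observation, so their areas are equal, but the last bin is twice as wide and its bar is therefore half as tall. Reading the heights alone would understate how much data sits in the wide bin.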