CLT: For an i.i.d. sample of size n from any population with finite mean μ and finite variance σ², the sample mean X̄ is approximately N(μ, σ²/n) for large n. This holds regardless of the population's shape. That explains the ubiquity of normal distributions in statistics and makes approximately valid inference possible without knowing the population distribution.
From your study of the distribution of the sample mean, you know that X̄ = (X₁ + … + Xₙ)/n has mean μ and variance σ²/n regardless of the population distribution. The Central Limit Theorem adds a far more striking result: the *shape* of the distribution of X̄ approaches a normal distribution as n grows. This happens even if the population is skewed, discrete, bimodal, or almost any other shape you can imagine. All that is required is that the observations are independent draws from a single population with finite mean and variance. This is why the normal distribution appears so often: it is not that real data are normally distributed, but that averages of data tend to be.
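To see this change in shape concretely, here is a minimal Python simulation sketch. It assumes an Exponential population with rate 1, an illustrative choice not taken from the text. That population has μ = 1, σ² = 1, and skewness 2. The mean of n such draws has exact skewness 2/√n, so the printed skewness should shrink toward 0, the normal value, while the variance tracks σ²/n. The function names, seed, and simulation sizes are hypothetical.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Return `reps` sample means, each the average of n draws from Exponential(rate=1).

    Assumed population: strongly right-skewed (skewness 2), mean mu = 1, variance sigma^2 = 1.
    """
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(xs):
    """Sample skewness: the average cubed standardized deviation (0 for a normal)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def summarize(n, reps=10_000, seed=0):
    """Empirical mean, variance, and skewness of the simulated sampling distribution of X-bar."""
    means = sample_means(n, reps, random.Random(seed))  # fixed seed for reproducibility
    return statistics.fmean(means), statistics.pvariance(means), skewness(means)

for n in (1, 5, 30, 200):
    mean, var, skew = summarize(n)
    print(f"n={n:3d}  mean={mean:.3f}  var={var:.4f} (sigma^2/n={1 / n:.4f})  "
          f"skew={skew:.3f} (theory 2/sqrt(n)={2 / n ** 0.5:.3f})")
```

Exact numbers depend on the seed. The pattern to look for is that the mean stays near 1, the variance falls like 1/n, and the skewness fades even though every single draw comes from a lopsided population.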
The intuition builds on what you know about adding random variables. Each new observation you average in is an independent perturbation, so the sum Sₙ = X₁ + … + Xₙ is a superposition of n independent shocks, with mean nμ and standard deviation √(nσ²) = σ√n. When you standardize (subtract the mean nμ and divide by the standard deviation σ√n), the resulting quantity Zₙ = (Sₙ - nμ)/(σ√n) = (X̄ - μ)/(σ/√n) has mean 0 and variance 1. The CLT says Zₙ converges in distribution to N(0,1). The standard proof uses characteristic functions φ(t) = E[e^{itX}], which are Fourier transforms of the distribution. The characteristic function of Zₙ converges pointwise to e^{-t²/2}, the characteristic function of the standard normal. By Lévy's continuity theorem, pointwise convergence of characteristic functions to the characteristic function of a distribution implies convergence in distribution, which is the statement you actually use.
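The characteristic-function argument can be checked numerically for a discrete population, where φ is an exact finite sum. The sketch below assumes a Bernoulli(0.1) population, chosen only because it is strongly skewed. It uses the fact that Zₙ is a sum of n i.i.d. copies of Y = (X - μ)/σ scaled by 1/√n, so φ_{Zₙ}(t) = φ_Y(t/√n)ⁿ. The helper names and the grid of t values are illustrative. This is a numerical check, not a proof.

```python
import cmath
import math

def standardized_cf(values, probs):
    """Return s -> E[exp(i*s*Y)] for Y = (X - mu) / sigma, with X discrete on `values`."""
    mu = sum(p * x for x, p in zip(values, probs))
    sigma = math.sqrt(sum(p * (x - mu) ** 2 for x, p in zip(values, probs)))

    def phi(s):
        return sum(p * cmath.exp(1j * s * (x - mu) / sigma)
                   for x, p in zip(values, probs))
    return phi

def cf_of_zn(phi_y, n, t):
    """Z_n = (Y_1 + ... + Y_n) / sqrt(n) with i.i.d. Y_k, so phi_Zn(t) = phi_Y(t / sqrt(n)) ** n."""
    return phi_y(t / math.sqrt(n)) ** n

def max_cf_error(phi_y, n, t_grid):
    """Largest gap on t_grid between phi_Zn(t) and the N(0, 1) characteristic function exp(-t^2/2)."""
    return max(abs(cf_of_zn(phi_y, n, t) - math.exp(-t * t / 2)) for t in t_grid)

phi_y = standardized_cf([0, 1], [0.9, 0.1])  # assumed population: Bernoulli(0.1), strongly right-skewed
t_grid = [0.25 * k for k in range(17)]       # t = 0, 0.25, ..., 4.0
for n in (1, 10, 100, 10_000):
    print(f"n={n:6d}  max |phi_Zn(t) - exp(-t^2/2)| on [0, 4]: {max_cf_error(phi_y, n, t_grid):.4f}")
```

The gap shrinks roughly like 1/√n, consistent with the leading correction being driven by the population's skewness.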
The CLT is most useful precisely when you do not know the population distribution. In practice, you measure n i.i.d. observations from some unknown distribution, compute X̄, and need to make an inference. The CLT tells you that X̄ is approximately normal with mean μ and standard deviation σ/√n, the standard error of the mean. Because σ is usually unknown as well, you replace it with the sample standard deviation s. The standard error shrinks in proportion to 1/√n, which formalizes the intuition that larger samples give more precise estimates: quadrupling the sample size halves it. A common rule of thumb is that n ≥ 30 gives a good approximation for moderately skewed populations; heavy-tailed or strongly skewed populations need larger n.
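As a hedged sketch of how this is used, the code below builds the approximate 95% interval X̄ ± 1.96·s/√n. It then uses simulation to estimate how often that interval covers μ when the population is skewed. The Exponential(1) population, the seed, the repetition count, and the function names are illustrative assumptions. Expect coverage below the nominal 95% for small n and close to 95% for larger n.

```python
import math
import random
import statistics

Z_975 = statistics.NormalDist().inv_cdf(0.975)  # standard normal 97.5% quantile, about 1.96

def clt_interval(data, z=Z_975):
    """Approximate 95% CI for the population mean: x_bar +/- z * s / sqrt(n).

    The sample standard deviation s stands in for the unknown sigma.
    """
    n = len(data)
    x_bar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # estimated standard error of the mean
    return x_bar - z * se, x_bar + z * se

def coverage(n, reps=4_000, seed=1):
    """Fraction of simulated intervals containing mu = 1 under an assumed Exponential(rate=1) population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = clt_interval([rng.expovariate(1.0) for _ in range(n)])
        if lo <= 1.0 <= hi:
            hits += 1
    return hits / reps

for n in (5, 30, 200):
    print(f"n={n:3d}  empirical coverage of nominal 95% interval: {coverage(n):.3f}")
```

The shortfall at small n comes from the population's skewness. It is the practical face of the warning that skewed populations need larger n.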
Two extensions are worth knowing now. The multivariate CLT says that for i.i.d. random vectors with mean vector μ and finite covariance matrix Σ, the centered and scaled vector of sample means √n(X̄ - μ) converges jointly to the multivariate normal N(0, Σ). The Lindeberg-Lévy CLT (the standard version) assumes independent, identically distributed observations. The Lindeberg-Feller CLT relaxes this to independent but non-identically distributed observations. Roughly, it requires only that no single observation dominates the total variance, a requirement made precise by the Lindeberg condition. Together, these theorems explain why normal approximations pervade hypothesis testing, confidence intervals, and regression, the topics you will study next, and why the standard error is the universal currency of statistical uncertainty.
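The Lindeberg-Feller idea can be illustrated, though not proved, by simulation. The sketch below assumes independent Uniform(0, aᵢ) terms with varying widths aᵢ, so the observations are independent but not identically distributed. When no width dominates, the standardized sum should fall within ±1.96 about 95% of the time. Adding one term whose variance swamps the rest makes the sum essentially uniform, so the normal approximation fails. The widths, seeds, and helper names are hypothetical choices for illustration, and the Lindeberg condition itself is not computed here.

```python
import math
import random
import statistics

def standardized_sums(widths, reps, rng):
    """Simulate Z = (S - E[S]) / sd(S), where S is a sum of independent Uniform(0, a_i), a_i in widths.

    Uniform(0, a) has mean a/2 and variance a^2/12, so the terms differ in distribution.
    """
    mean = sum(a / 2 for a in widths)
    sd = math.sqrt(sum(a * a / 12 for a in widths))
    return [(sum(rng.uniform(0.0, a) for a in widths) - mean) / sd for _ in range(reps)]

def fraction_within(widths, z=1.96, reps=4_000, seed=2):
    """Fraction of simulated Z with |Z| <= z; about 0.95 if Z is close to N(0, 1)."""
    zs = standardized_sums(widths, reps, random.Random(seed))
    return sum(abs(v) <= z for v in zs) / reps

balanced = [1 + (i % 10) for i in range(200)]  # widths 1..10: no single term dominates
dominated = balanced + [1000]                   # one term carries about 99% of the variance

zs = standardized_sums(balanced, 4_000, random.Random(3))
print(f"balanced:  mean(Z) ~ {statistics.fmean(zs):.3f}, var(Z) ~ {statistics.pvariance(zs):.3f}")
print(f"balanced:  P(|Z| <= 1.96) ~ {fraction_within(balanced):.3f}")
print(f"dominated: P(|Z| <= 1.96) ~ {fraction_within(dominated):.3f}")
```

In the dominated case nearly every Z lands inside ±1.96, because a standardized uniform never leaves ±√3 ≈ ±1.73. The sum inherits the shape of its one dominant term instead of becoming normal.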