Xₙ converges to X in L^p if lim_{n→∞} E[|Xₙ − X|^p] = 0, equivalently ||Xₙ − X||_p → 0 in the L^p norm. For each p ≥ 1, the space L^p of random variables with finite p-th absolute moment (identifying variables that agree almost surely) is a Banach space. Convergence in L² (mean-square convergence) is particularly important because L² is a Hilbert space and its inner product E[XY] is preserved under mean-square limits.
From your study of measure-theoretic expectation, you know that E[|X|^p] is the p-th absolute moment of a random variable X, an integral of |X|^p with respect to the probability measure P. L^p convergence uses this integral to define a distance between random variables: Xₙ converges to X in L^p if the p-th absolute moment of their difference vanishes, that is, E[|Xₙ − X|^p] → 0 as n → ∞. Equivalently, ||Xₙ − X||_p → 0, where ||Y||_p = (E[|Y|^p])^{1/p} is the L^p norm. For p ≥ 1 (where Minkowski's inequality supplies the triangle inequality), this is a genuine norm on the space of random variables with finite p-th moment (identifying variables that agree almost surely), and that space is complete, making L^p a Banach space: a complete normed vector space.
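As a minimal sketch, assuming a finite probability space (finitely many outcomes with given probabilities, so that E is just a weighted sum), the L^p norm and the L^p distance can be computed directly. The helper names `lp_norm` and `lp_distance`, and the three-outcome example, are hypothetical and chosen only for illustration.

```python
def lp_norm(values, probs, p):
    """||Y||_p = (E|Y|^p)^(1/p) for Y taking values[i] with probability probs[i].

    Hypothetical helper; assumes a finite sample space and p >= 1.
    """
    if p < 1:
        raise ValueError("||.||_p is a norm only for p >= 1")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # On a finite space, E|Y|^p is a probability-weighted sum.
    moment = sum(q * abs(y) ** p for y, q in zip(values, probs))
    return moment ** (1.0 / p)


def lp_distance(xs, ys, probs, p):
    """||X - Y||_p for two random variables defined on the same finite space."""
    return lp_norm([x - y for x, y in zip(xs, ys)], probs, p)


if __name__ == "__main__":
    probs = [0.5, 0.25, 0.25]  # P({w1}), P({w2}), P({w3})
    x = [1.0, -2.0, 3.0]       # X(w1), X(w2), X(w3)
    z = [2.0, 0.0, 0.0]        # illustrative perturbation direction
    for n in (1, 10, 100, 1000):
        x_n = [xi + zi / n for xi, zi in zip(x, z)]
        # E|X_n - X|^2 = 0.5 * (2/n)^2 = 2/n^2, so ||X_n - X||_2 = sqrt(2)/n -> 0.
        print(n, lp_distance(x_n, x, probs, 2))
```

Because X_n differs from X only on an outcome of probability 1/2, the distance shrinks like 1/n, which is exactly the statement E[|Xₙ − X|²] → 0 in this toy setting.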
The case p = 2 is special because L² is not just a Banach space but a Hilbert space, equipped with the inner product ⟨X, Y⟩ = E[XY]. This inner product gives L² a geometric structure (orthogonality, projections, the Cauchy-Schwarz inequality) that the other L^p spaces lack. Convergence in L² (mean-square convergence) preserves inner products: if Xₙ → X and Yₙ → Y in L², then E[XₙYₙ] → E[XY], because by Cauchy-Schwarz |E[XₙYₙ] − E[XY]| ≤ ||Xₙ − X||₂ ||Yₙ||₂ + ||X||₂ ||Yₙ − Y||₂ and ||Yₙ||₂ stays bounded. This is why L² is the natural setting for defining conditional expectation as an orthogonal projection, for least-squares estimation, and for the spectral analysis of stationary processes. The geometry of L² turns probabilistic questions into problems of projecting onto subspaces.
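To make the projection picture concrete, here is a sketch on a finite sample space, assuming the sub-σ-algebra G is generated by a partition of the outcomes (the hypothetical `blocks` argument, with every block of positive probability). On each block, E[X | G] is the probability-weighted average of X, so it is constant on blocks, i.e. G-measurable. The residual X − E[X | G] should then be orthogonal, in the inner product E[XY], to every G-measurable variable, and no other G-measurable candidate should have a smaller mean-square error. The names `inner`, `cond_exp`, `mean_square_error` and the numbers are illustrative assumptions.

```python
def inner(x, y, probs):
    """<X, Y> = E[XY] on a finite probability space."""
    return sum(q * a * b for a, b, q in zip(x, y, probs))


def cond_exp(x, blocks, probs):
    """E[X | G] for G generated by a partition of outcome indices.

    `blocks` is a list of index lists covering every outcome exactly once;
    each block is assumed to have positive probability.
    """
    out = [0.0] * len(x)
    for block in blocks:
        mass = sum(probs[i] for i in block)
        avg = sum(probs[i] * x[i] for i in block) / mass
        for i in block:
            out[i] = avg  # constant on each block, hence G-measurable
    return out


def mean_square_error(x, z, probs):
    """E[(X - Z)^2] = ||X - Z||_2^2."""
    diff = [a - b for a, b in zip(x, z)]
    return inner(diff, diff, probs)


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.3, 0.4]
    x = [1.0, 3.0, -2.0, 5.0]
    blocks = [[0, 1], [2, 3]]
    proj = cond_exp(x, blocks, probs)
    residual = [a - b for a, b in zip(x, proj)]
    # The residual is (numerically) orthogonal to each block indicator.
    for z in ([1, 1, 0, 0], [0, 0, 1, 1]):
        print("<X - E[X|G], Z> =", inner(residual, z, probs))
    # Shifting the projection by a G-measurable amount increases the error.
    other = [v + 0.1 for v in proj]
    print(mean_square_error(x, proj, probs), "<", mean_square_error(x, other, probs))
```

The orthogonality check is the finite-dimensional shadow of the defining property E[(X − E[X | G]) Z] = 0 for bounded G-measurable Z; the error comparison shows the same projection is the least-squares estimate of X given G.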
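L^p convergence implies convergence in probability: for every ε > 0, Markov's inequality gives P(|Xₙ − X| > ε) ≤ E[|Xₙ − X|^p] / ε^p, which tends to 0. Its relationship with almost sure convergence is subtler, because neither mode implies the other. The standard counterexample in one direction is the typewriter sequence on [0, 1] with Lebesgue measure: indicator functions of subintervals that sweep across the interval again and again with shrinking width ([0, 1]; then [0, 1/2] and [1/2, 1]; then the four quarters; and so on). This sequence converges to 0 in every L^p, since E[|Xₙ|^p] equals the length of the current subinterval, which tends to 0. It does not converge almost surely: at every point ω, the sequence equals 1 infinitely often and 0 infinitely often. In the other direction, Xₙ = n·1_{(0, 1/n)} converges to 0 at every ω, yet E[|Xₙ|] = 1 for every n, so it does not converge to 0 in L¹. The same example shows that convergence in probability does not imply L^p convergence without an additional condition: the family {|Xₙ|^p} must be uniformly integrable. Without this, the tails of the distribution can carry enough mass to prevent L^p convergence even when the random variables are converging in probability.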
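Both counterexamples can be checked with exact rational arithmetic. This is a sketch assuming [0, 1] with Lebesgue measure and the indexing n = 2^k + j (0 ≤ j < 2^k) for the typewriter sequence, so that generation k consists of the closed dyadic intervals [j/2^k, (j+1)/2^k]; that indexing, and the helper names, are illustrative choices rather than a fixed convention. The spike Xₙ = n·1_{(0, 1/n)} is evaluated alongside it.

```python
from fractions import Fraction


def typewriter_interval(n):
    """Support [left, right] of the n-th typewriter indicator (n >= 1).

    Writing n = 2**k + j with 0 <= j < 2**k, X_n is the indicator of
    [j / 2**k, (j + 1) / 2**k], so generation k sweeps [0, 1] once.
    """
    k = n.bit_length() - 1
    j = n - (1 << k)
    width = Fraction(1, 1 << k)
    return j * width, (j + 1) * width


def typewriter_value(n, omega):
    """X_n(omega) for the typewriter sequence."""
    left, right = typewriter_interval(n)
    return 1 if left <= omega <= right else 0


def typewriter_moment(n, p):
    """E|X_n|^p under Lebesgue measure.

    An indicator raised to any power p > 0 is itself, so this is the
    interval length 2**-k regardless of p.
    """
    left, right = typewriter_interval(n)
    return right - left


def spike_value(n, omega):
    """X_n(omega) = n * 1_{(0, 1/n)}(omega); zero once n >= 1/omega."""
    return n if 0 < omega < Fraction(1, n) else 0


def spike_moment(n, p):
    """E|X_n|^p = n**p * P((0, 1/n)) = n**(p - 1); equal to 1 when p = 1."""
    return Fraction(n) ** (p - 1)


if __name__ == "__main__":
    omega = Fraction(1, 3)
    for k in range(6):
        generation = range(1 << k, 1 << (k + 1))
        hits = sum(typewriter_value(n, omega) for n in generation)
        # The moment shrinks like 2**-k, yet omega is hit in every generation.
        print(k, typewriter_moment(1 << k, 2), hits)
    for n in (1, 10, 100):
        # Pointwise value dies out while the L^1 norm stays at 1.
        print(n, spike_value(n, omega), spike_moment(n, 1))
```

Since 1/3 is not a dyadic rational, it lies in exactly one interval of each generation, so X_n(1/3) = 1 once per generation forever even though every moment tends to 0. The spike behaves the opposite way: its pointwise values vanish, but its first moment never moves.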
The hierarchy of L^p spaces is governed by Lyapunov's inequality: on a probability space (where the total measure is 1), ||X||_q ≤ ||X||_p whenever 1 ≤ q ≤ p. It follows from Jensen's inequality applied to the convex function t ↦ t^{p/q}, which gives (E[|X|^q])^{p/q} ≤ E[|X|^p]. Consequently L^p ⊆ L^q, and convergence in a higher L^p implies convergence in every lower L^q: if Xₙ → X in L², then Xₙ → X in L¹ as well. The converse fails: on [0, 1] with Lebesgue measure, Xₙ = √n·1_{(0, 1/n)} satisfies E[|Xₙ|] = 1/√n → 0 but E[Xₙ²] = 1 for every n, so it converges to 0 in L¹ but not in L². Understanding these relationships (which modes of convergence imply which, and what additional conditions bridge the gaps) is essential for the rigorous study of limit theorems, estimator properties, and stochastic processes.
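A quick numerical sanity check of Lyapunov's inequality follows, as a sketch assuming randomly generated finite distributions from Python's `random` module; such a check can only fail to find a counterexample, not prove the inequality. It also evaluates the norms of Xₙ = √n·1_{(0, 1/n)} on [0, 1] with Lebesgue measure, using ||c·1_A||_p = c·P(A)^{1/p}. The helper names and the tolerance are illustrative assumptions.

```python
import math
import random


def lp_norm(values, probs, p):
    """||X||_p = (E|X|^p)^(1/p) on a finite probability space."""
    return sum(q * abs(v) ** p for v, q in zip(values, probs)) ** (1.0 / p)


def random_distribution(rng, size):
    """A random finite distribution: `size` outcomes with positive probabilities."""
    weights = [rng.random() + 1e-9 for _ in range(size)]
    total = sum(weights)
    probs = [w / total for w in weights]
    values = [rng.uniform(-10.0, 10.0) for _ in range(size)]
    return values, probs


def lyapunov_holds(trials=1000, exponents=(1, 1.5, 2, 3, 4), seed=0):
    """Check ||X||_q <= ||X||_p for q <= p on random finite distributions."""
    rng = random.Random(seed)
    for _ in range(trials):
        values, probs = random_distribution(rng, rng.randint(1, 8))
        norms = [lp_norm(values, probs, p) for p in exponents]
        # Total mass 1 makes the norms non-decreasing in p (up to rounding).
        for lower, higher in zip(norms, norms[1:]):
            if lower > higher * (1 + 1e-12) + 1e-15:
                return False
    return True


def scaled_indicator_norm(c, prob, p):
    """||c * 1_A||_p = c * P(A)**(1/p) for a constant c >= 0."""
    return c * prob ** (1.0 / p)


if __name__ == "__main__":
    print("Lyapunov check passed:", lyapunov_holds())
    # X_n = sqrt(n) * 1_{(0, 1/n)}: ||X_n||_1 = 1/sqrt(n) -> 0 but ||X_n||_2 = 1.
    for n in (1, 100, 10_000):
        c, prob = math.sqrt(n), 1.0 / n
        print(n, scaled_indicator_norm(c, prob, 1), scaled_indicator_norm(c, prob, 2))
```

The second loop shows the gap between L¹ and L² directly: the L¹ column decays like 1/√n while the L² column stays at 1.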