Rate-distortion theory characterizes the fundamental limits of lossy compression. The rate-distortion function R(D) = min_{p(x-hat|x): E[d(x,x-hat)]<=D} I(X; X-hat) gives the minimum number of bits per symbol needed to represent a source with average distortion at most D. At D=0, R(0) = H(X) (lossless compression). As tolerable distortion increases, fewer bits are needed. For a Gaussian source with mean-squared error distortion, R(D) = (1/2) log(sigma^2/D) for D <= sigma^2. Rate-distortion theory provides the information-theoretic foundation for JPEG, MP3, and all lossy codecs.
The source coding theorem handles lossless compression: H(X) bits per symbol, minimum. But what if you can tolerate some error? A photograph compressed to 1/20th its size looks nearly identical to the human eye. Rate-distortion theory asks: given a distortion budget D, what is the minimum bit rate?
The answer is R(D) = min I(X; X-hat), where the minimization is over all conditional distributions p(x-hat|x) (reconstruction rules) satisfying E[d(X, X-hat)] <= D. The distortion function d(x, x-hat) measures the cost of representing x as x-hat — common choices include mean squared error (for continuous sources), Hamming distance (for discrete sources), and perceptual metrics. The mutual information I(X; X-hat) quantifies how much information the encoder must send about X for the decoder to produce X-hat. The minimization finds the reconstruction strategy that requires the least information while meeting the distortion constraint.
For a Gaussian source X ~ N(0, sigma^2) with MSE distortion, the rate-distortion function is R(D) = (1/2) log2(sigma^2/D) for D <= sigma^2, and R(D) = 0 for D > sigma^2. This is elegant: each additional bit halves the distortion (each bit doubles the signal-to-distortion ratio by 6 dB). The optimal strategy is to quantize the source with a Gaussian codebook. For a Bernoulli(1/2) source with Hamming distortion, R(D) = 1 - H(D) for D in [0, 0.5], showing that tolerating even small bit error rates substantially reduces the required rate.
Rate-distortion theory provides the theoretical foundation for all lossy compression. JPEG, MP3, H.264, and neural codecs all operate in the space between their actual performance and the R(D) curve. The theory also connects to machine learning through the information bottleneck method (which trades off compression of input against prediction of output) and to communication through joint source-channel coding (where the source's R(D) and the channel's capacity interact). Understanding rate-distortion theory is essential for anyone designing systems where quality and bit rate must be traded off.