Huffman coding is an optimal prefix-free variable-length coding scheme for lossless compression of a known discrete source. It assigns shorter codewords to more probable symbols and longer codewords to less probable ones, achieving the minimum expected codeword length among all prefix codes. The algorithm builds a binary tree bottom-up by repeatedly merging the two least probable symbols. The resulting code satisfies H(X) <= L < H(X) + 1, where L is the average code length and H(X) is the source entropy. Huffman coding is optimal per-symbol; arithmetic coding can outperform it by encoding sequences as a whole.
The source coding theorem says you can compress a source to its entropy rate, but it doesn't tell you how. Huffman coding, invented by David Huffman in 1952 as a student at MIT, provides a concrete, optimal algorithm for constructing variable-length binary codes for known discrete sources.
The algorithm is simple and elegant. Start with each symbol as a leaf node weighted by its probability. Repeatedly find the two nodes with the smallest weights, merge them into a new internal node whose weight is the sum, and assign the left branch a 0 and the right branch a 1 (or vice versa). Continue until only one root node remains. Each symbol's codeword is the sequence of bits on the path from the root to its leaf. More probable symbols end up near the root (short codewords); less probable symbols end up deeper in the tree (long codewords).
The resulting code is prefix-free — no codeword is a prefix of any other — which means the decoder can identify each symbol immediately without needing lookahead or delimiters. It is optimal among all prefix-free codes for that source: no other assignment of variable-length codewords achieves a lower expected length. The average code length L satisfies H(X) <= L < H(X) + 1. The gap comes from the restriction to integer-length codewords. When all symbol probabilities are powers of 2, the gap is zero and Huffman achieves entropy exactly.
Despite its elegance, Huffman coding has limitations. It works on one symbol at a time, so it cannot exploit correlations between symbols. It requires knowing the source statistics in advance (or using a two-pass approach). And the integer-length constraint means it can waste nearly 1 bit per symbol for highly skewed distributions. Arithmetic coding addresses all three limitations: it encodes entire sequences as a single number in [0,1), achieving rates arbitrarily close to entropy for any distribution. In practice, Huffman coding remains widely used (JPEG, DEFLATE/gzip) because it is fast, simple to implement, and the per-symbol overhead is acceptable for many applications.