CCodewords: 00, 01, 10, 11; average length = 2 bits
DCodewords: 0, 1, 00, 01; average length = 1.25 bits
Huffman's algorithm merges the two smallest (0.125 + 0.125 = 0.25), then the two smallest remaining (0.25 + 0.25 = 0.5), then the final two (0.5 + 0.5 = 1.0). This produces codewords of length 1, 2, 3, 3. The average length is 1.75 bits, which exactly equals the entropy H = -0.5*log2(0.5) - 0.25*log2(0.25) - 2*0.125*log2(0.125) = 1.75 bits. When probabilities are all powers of 2, Huffman coding achieves entropy exactly.
Question 2 True / False
Huffman codes are guaranteed to achieve the exact entropy rate H(X) for any source distribution.
TTrue
FFalse
Answer: False
Huffman codes achieve average length L satisfying H(X) <= L < H(X) + 1 bits per symbol. They reach H(X) exactly only when all probabilities are powers of 2 (like 1/2, 1/4, 1/8). For other distributions, the constraint of integer-length codewords forces L above H(X). For example, a source with two equally likely symbols has H = 1 bit, and Huffman achieves L = 1. But a source with p(A) = 0.9, p(B) = 0.1 has H ≈ 0.469, while Huffman gives both symbols 1-bit codes (L = 1), wasting 0.531 bits/symbol. Arithmetic coding can approach H more closely.
Question 3 Short Answer
Why must a practical compression code be prefix-free (no codeword is a prefix of another), and how does Huffman coding guarantee this property?
Think about your answer, then reveal below.
Model answer: A prefix-free code allows instantaneous, unambiguous decoding: as bits arrive, the decoder can identify the end of each codeword without looking ahead or needing delimiters. If a codeword were a prefix of another, the decoder would face ambiguity when it encounters the shorter pattern — it wouldn't know whether the symbol ends here or continues. Huffman coding guarantees prefix-freeness by construction: symbols are assigned to LEAVES of a binary tree. Since no leaf is an ancestor of another leaf, no codeword is a prefix of another. The tree structure means each codeword corresponds to a unique path from root to leaf.
The Kraft inequality, sum 2^(-l_i) <= 1 where l_i are codeword lengths, is the mathematical condition for the existence of a prefix-free code with those lengths. Huffman's algorithm finds the assignment that minimizes expected length subject to this constraint.
Question 4 Short Answer
A colleague proposes using Huffman coding on individual bytes of a file for compression. Why might this perform poorly compared to more sophisticated methods?
Think about your answer, then reveal below.
Model answer: Per-byte Huffman coding only exploits the frequency distribution of individual bytes, ignoring correlations between consecutive bytes. Natural data has substantial sequential structure — in English text, 'q' is almost always followed by 'u'; in code, opening braces predict closing braces. Per-byte Huffman treats each byte independently, missing all this context. Better approaches include: block Huffman (encoding multi-byte sequences), adaptive Huffman (updating the code as statistics change), or using Huffman as a backend after a context model (as LZ77+Huffman in gzip does). Arithmetic coding on a context model can approach the true conditional entropy H(X_n | X_{n-1}, ..., X_1), which is much lower than the marginal byte entropy.
This is why practical compressors combine modeling and coding. The model captures statistical structure (LZ77 captures repeated patterns, PPM captures character-level context), and the coder (Huffman or arithmetic) converts the model's predictions into a compressed bitstream. Huffman coding alone is a coder without a model.