Questions: Huffman Coding: Optimal Prefix Codes via Greedy
5 questions to test your understanding
Score: 0 / 5
Question 1 Multiple Choice
Symbols A, B, C, D have frequencies 50%, 25%, 15%, and 10% respectively. After Huffman coding, how many bits is A's codeword?
A1 bit
B2 bits
C3 bits
D4 bits — the algorithm distributes bits equally across symbols
The Huffman algorithm first merges the two least-frequent symbols (C+D → 25%), then merges that node with B (→ 50%), then finally merges with A. A ends up as an immediate child of the root — depth 1, so 1 bit. This is the key insight: the most frequent symbol earns the shortest code because it is always the last to be merged. A fixed 2-bit scheme would give A the same length as D, wasting bits on the most common symbol.
Question 2 Multiple Choice
A student applies Huffman coding to a file where all 8 symbols appear with identical frequency. What happens?
AThe algorithm fails — Huffman requires distinct frequencies to determine merge order
BAll symbols receive codewords of equal length (3 bits each), identical to a fixed-length code
COne symbol is arbitrarily assigned a 1-bit code and the rest receive longer codes
DCompression is maximized because identical frequencies are the ideal case
When all frequencies are equal, every merge is a tie, and the resulting tree is perfectly balanced. Each of 8 symbols gets a 3-bit code — identical to a fixed-length encoding. Huffman coding offers no compression benefit in this case, which shows that the gain comes entirely from exploiting frequency *differences*. The algorithm does not fail; it just produces a balanced tree.
Question 3 True / False
A symbol with frequency greater than 50% will always receive a 1-bit Huffman codeword.
TTrue
FFalse
Answer: True
If one symbol has frequency > 50%, the sum of all other symbols' frequencies is < 50%, which means all other symbols will be merged together before they can compete with the dominant symbol. The dominant symbol survives as the last node to be merged and becomes an immediate child of the root — depth 1, so exactly 1 bit. This is a direct consequence of the greedy merge-by-minimum-frequency strategy.
Question 4 True / False
Huffman coding guarantees optimal compression for any input, even when the actual symbol frequencies in the data differ from those used to build the code tree.
TTrue
FFalse
Answer: False
Huffman coding is optimal for the specific frequency distribution it was built on. If the actual frequencies in the data differ — for example, if the tree was built on English text frequencies but used to compress binary data — the resulting code lengths no longer match the actual information content, and compression can actually be worse than a fixed-length code. This is why adaptive Huffman coding or periodic tree rebuilding is needed for dynamic data.
Question 5 Short Answer
Why must the Huffman tree structure be transmitted alongside the compressed data, and how does this overhead affect the practical use of Huffman coding?
Think about your answer, then reveal below.
Model answer: The Huffman tree is specific to the frequency distribution of the data being compressed. Without the tree, the decoder cannot reverse the variable-length codewords back into the original symbols. This tree overhead means that Huffman coding provides diminishing returns on small files — the tree description can consume more space than the compression saves. In practice, Huffman coding is combined with other techniques (e.g., used within DEFLATE/gzip) where the overhead is amortized over large data blocks, and where frequency tables are efficiently transmitted using canonical Huffman codes.
This is the 'code tree must be transmitted' misconception flagged in the common misconceptions. The key practical implication is that Huffman's theoretical optimality applies to the compressed payload only; the bookkeeping cost of the code itself is a real-world constraint that limits its use on short inputs.