For a Bernoulli(0.5) source with Hamming distortion, R(D) = 1 - H(D) for D in [0, 0.5]. At D = 0.5, R = 0. What does this mean?
AYou can perfectly represent the source with 0 bits, which is clearly impossible
BAt distortion 0.5, a decoder that outputs random fair coin flips (ignoring the source) achieves average Hamming distortion 0.5 — no bits need to be transmitted because the decoder needs no information from the encoder
CThe formula breaks down at D = 0.5 and is not valid there
DLossy compression always requires at least 1 bit per symbol
For a fair coin source, if you tolerate 50% bit errors (D = 0.5), the decoder can achieve this by outputting random bits independently of the source. No communication is needed — 0 bits suffice. This makes intuitive sense: asking for 50% accuracy on fair coin flips is asking for nothing beyond random guessing. As the distortion tolerance decreases below 0.5, the decoder needs actual information about the source, and R(D) increases from 0 toward H = 1 bit at D = 0.
Question 2 True / False
Rate-distortion theory applies only to Gaussian sources and mean-squared error distortion.
TTrue
FFalse
Answer: False
Rate-distortion theory is completely general — it applies to any source distribution and any distortion measure that satisfies basic measurability conditions. The Gaussian source with MSE is a special case that happens to have a clean closed-form solution. Other important cases include: discrete sources with Hamming distortion, sources with perceptual distortion measures, and vector sources with various norms. The general formula R(D) = min I(X; X-hat) subject to E[d(X, X-hat)] <= D applies universally.
Question 3 Short Answer
Explain the operational meaning of the rate-distortion function: what does R(D) tell an engineer designing a lossy codec?
Think about your answer, then reveal below.
Model answer: R(D) tells the engineer the absolute minimum bit rate needed to represent the source at distortion level D. If they are compressing at rate R bits/symbol, the best achievable distortion is D(R) = R^(-1)(R) — the inverse of the rate-distortion function. Any codec operating above the R(D) curve is suboptimal and could potentially be improved. A codec operating at the R(D) curve is information-theoretically optimal — no codec, however sophisticated, can achieve lower distortion at the same rate. The gap between a practical codec's performance and R(D) measures its inefficiency and motivates the search for better compression algorithms.
Shannon proved both achievability (R(D) is achievable with long block codes and joint typicality encoding) and the converse (rates below R(D) cannot achieve distortion D). As with channel coding, the achievability is an existence proof — practical lossy codecs use transform coding, vector quantization, or neural compression to approach R(D).