A continuous random variable X ~ Uniform(0, 1/4) has differential entropy h(X) = log2(1/4) = -2 bits. How should this negative value be interpreted?
A. The source generates negative information, which is physically impossible, so the formula is wrong
B. Differential entropy is negative because the density exceeds 1: the source is highly concentrated, and the negative value reflects that it takes fewer bits to describe X than a reference Uniform(0,1) variable
C. Negative entropy means the random variable is deterministic
D. The logarithm should use natural log to avoid negative values
Answer: B
Differential entropy measures information content relative to a unit-width uniform reference, not in absolute terms. When the density exceeds 1 (for Uniform(0, 1/4), f(x) = 4 on the support), log f(x) > 0, so the integrand -f(x) log f(x) is negative, and h(X) = -log2(4) = -2 bits. The interpretation: X is more concentrated than a unit-width uniform distribution, so its differential entropy is negative. Crucially, the mutual information I(X;Y) = h(X) - h(X|Y) is still non-negative and operationally meaningful. Differential entropy by itself is not the number of bits needed to represent X (that number is infinite for a continuous variable); only entropy differences have operational meaning.
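As a quick check of the -2 bits figure, the following minimal sketch compares the closed form log2(b - a) with a midpoint-rule approximation of the integral of -f(x) log2 f(x). The function names and the number of integration steps are illustrative choices, not part of the original quiz.

```python
import math

def uniform_differential_entropy_bits(a: float, b: float) -> float:
    """Closed form h(X) = log2(b - a) for X ~ Uniform(a, b)."""
    return math.log2(b - a)

def numeric_differential_entropy_bits(pdf, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint Riemann sum of -f(x) * log2 f(x) over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        fx = pdf(x)
        if fx > 0.0:  # the integrand is taken as 0 where the density vanishes
            total -= fx * math.log2(fx) * dx
    return total

# Uniform(0, 1/4) has constant density 4 on its support.
h_closed = uniform_differential_entropy_bits(0.0, 0.25)
h_numeric = numeric_differential_entropy_bits(lambda x: 4.0, 0.0, 0.25)
print(h_closed, h_numeric)
```

Both values should come out at about -2, and replacing the interval with (0, 1) gives 0 bits, which is the unit-width reference the explanation refers to.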
Question 2 True / False
Among all continuous distributions with a fixed variance sigma^2, the Gaussian distribution maximizes differential entropy.
T. True
F. False
Answer: True
This is a fundamental result. Among all distributions on the real line with variance sigma^2, a Gaussian with that variance (the mean does not matter) has the maximum differential entropy: h(X) = (1/2) log2(2*pi*e*sigma^2). This can be proved using the non-negativity of KL divergence. Let f be any density with variance sigma^2 and let phi be the Gaussian with the same mean and variance. Because log phi(x) is quadratic in x, its expectation depends only on the first two moments, so it is the same under f as under phi and equals -h(phi). Hence 0 <= D_KL(f || phi) = h(phi) - h(f), which gives h(f) <= h(phi). This result underlies the clean capacity formula of the Gaussian channel: for a given noise variance, Gaussian noise is the worst case from an information-theoretic perspective.
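To make the maximum-entropy claim concrete, here is a small sketch that evaluates standard closed-form differential entropies at equal variance. The uniform and Laplace distributions are comparison cases chosen for illustration (they do not appear in the quiz), and the function names are hypothetical.

```python
import math

def gaussian_entropy_bits(sigma: float) -> float:
    """h = (1/2) log2(2*pi*e*sigma^2) for N(mu, sigma^2)."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2)

def uniform_entropy_bits(sigma: float) -> float:
    """Uniform of width w has variance w^2 / 12, so w = sigma * sqrt(12) and h = log2(w)."""
    return math.log2(sigma * math.sqrt(12.0))

def laplace_entropy_bits(sigma: float) -> float:
    """Laplace with scale b has variance 2*b^2, so b = sigma / sqrt(2) and h = log2(2*e*b)."""
    b = sigma / math.sqrt(2.0)
    return math.log2(2.0 * math.e * b)

sigma = 1.0
h_gauss = gaussian_entropy_bits(sigma)
h_laplace = laplace_entropy_bits(sigma)
h_uniform = uniform_entropy_bits(sigma)
print(f"Gaussian {h_gauss:.4f}  Laplace {h_laplace:.4f}  Uniform {h_uniform:.4f}")
```

With sigma = 1 the Gaussian value is roughly 2.05 bits, above both comparison distributions. Two examples do not prove the theorem, but they show the ordering the KL argument guarantees.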
Question 3 Short Answer
Explain why differential entropy is not simply the limit of discrete entropy as quantization becomes infinitely fine, and what this implies about the relationship between discrete and continuous information theory.
Model answer: If you quantize a continuous variable X into bins of width delta, the discrete entropy of the quantized version is approximately h(X) + log(1/delta). As delta -> 0, log(1/delta) -> infinity, so the discrete entropy diverges. The finite quantity h(X) is what remains after subtracting this divergent term. In that sense differential entropy is an entropy difference, not an absolute entropy, and it depends on the coordinate system (the units of measurement). Changing variables from X to Y = aX gives h(Y) = h(X) + log|a|, whereas discrete entropy is invariant under relabeling of outcomes. The practical implication: only differences of differential entropies (such as mutual information) are physically meaningful, because the divergent term and the coordinate dependence cancel. Absolute differential entropy is a useful computational tool but lacks the direct operational interpretation of discrete entropy.
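The quantization relation can be checked numerically. The sketch below assumes a standard Gaussian as the test distribution (an illustrative choice), truncates the real line to plus or minus 8 standard deviations, and computes bin probabilities from the normal CDF via `math.erf`. All function names are hypothetical.

```python
import math

def normal_cdf(x: float, sigma: float = 1.0) -> float:
    """CDF of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def quantized_entropy_bits(delta: float, sigma: float = 1.0, span: float = 8.0) -> float:
    """Discrete entropy of N(0, sigma^2) quantized into bins of width delta.

    Only the range [-span*sigma, span*sigma] is covered; the mass outside is negligible.
    """
    half = span * sigma
    n_bins = round(2.0 * half / delta)
    entropy = 0.0
    for i in range(n_bins):
        left = -half + i * delta
        p = normal_cdf(left + delta, sigma) - normal_cdf(left, sigma)
        if p > 0.0:  # skip empty bins (and any rounding noise in the far tails)
            entropy -= p * math.log2(p)
    return entropy

def gaussian_h_bits(sigma: float = 1.0) -> float:
    """Differential entropy of N(0, sigma^2) in bits."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2)

for delta in (0.5, 0.1, 0.01):
    H = quantized_entropy_bits(delta)
    # H grows without bound as delta shrinks; H - log2(1/delta) settles near h(X).
    print(delta, H, H - math.log2(1.0 / delta))

residual = quantized_entropy_bits(0.01) - math.log2(1.0 / 0.01)
scaling_shift = gaussian_h_bits(2.0) - gaussian_h_bits(1.0)  # Y = 2X shifts h by log2(2)
```

The discrete entropy keeps increasing as delta decreases, while the residual after subtracting log2(1/delta) stays close to `gaussian_h_bits()`. The last line illustrates the coordinate dependence: doubling the scale adds one bit.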
This subtlety trips up many students: they expect h(X) to represent 'the number of bits to describe X,' but describing a continuous variable to infinite precision requires infinitely many bits. What h(X) captures is the information content relative to a unit-width uniform density, a quantity that is useful for computing mutual information and capacity but is not meaningful in isolation.