A binary channel flips each bit independently with probability 0. What is the channel capacity, and what does this mean operationally?
A. C = 0 bits per use, because a noiseless channel carries no information
B. C = 1 bit per use, because each input bit arrives perfectly at the output, so every channel use conveys one full bit of information
C. C = 2 bits per use, because you can encode two bits per transmission in a noiseless channel
D. C = infinity, because there is no noise to limit transmission
With no noise, Y = X always, so H(Y|X) = 0 and I(X;Y) = H(Y). Maximizing H(Y) over binary inputs gives H(Y) = 1 bit (achieved by uniform input). So C = 1 bit per channel use. A noiseless binary channel transmits exactly 1 bit per use — each symbol perfectly distinguishes between the two possibilities. Capacity is limited by the alphabet size, not just the noise level.
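As a numerical check on this reasoning, the sketch below uses the standard closed form for a binary symmetric channel, C = 1 - H_b(p), where H_b is the binary entropy function. The function names `binary_entropy` and `bsc_capacity` are illustrative choices, not taken from the quiz.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits, using the convention 0 * log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(flip_prob: float) -> float:
    """Capacity of a binary symmetric channel: C = 1 - H_b(flip_prob)."""
    return 1.0 - binary_entropy(flip_prob)


# Flip probability 0 is the noiseless case from the question.
print(bsc_capacity(0.0))
# Flip probability 0.5 makes the output independent of the input.
print(bsc_capacity(0.5))
```

With flip probability 0 the entropy term vanishes, so the capacity equals the full 1 bit that a binary alphabet can carry.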
Question 2 Multiple Choice
Why does finding channel capacity require maximizing mutual information over the input distribution p(x), rather than simply computing I(X;Y) for any particular input?
A. Different input distributions change the channel's noise characteristics
B. The channel transition probabilities p(y|x) are fixed by the physical channel, but the input distribution p(x) determines how much of the channel's capacity is actually utilized; a poor input distribution wastes capacity
C. Maximization is required for mathematical convenience but has no operational significance
D. The input distribution must match the output distribution for reliable communication
The channel p(y|x) is given — it describes the physics of the medium. But the communicator chooses what to send. Different input distributions lead to different amounts of mutual information. For example, on a binary symmetric channel, using only the symbol '0' gives I(X;Y) = 0 (no information). Using uniform input maximizes I(X;Y). Capacity is the best you can do given the channel — it is a property of the channel itself, obtained by optimizing over the only degree of freedom available: what you choose to send.
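To see the effect of the input distribution concretely, here is a minimal sketch that evaluates I(X;Y) on a binary symmetric channel for several choices of p(x). The flip probability 0.1, the helper name `mutual_information`, and the list-of-rows channel representation are assumptions made for illustration.

```python
import math


def mutual_information(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and channel[x][y] = p(y|x)."""
    n_out = len(channel[0])
    # Output distribution induced by this input distribution.
    p_y = [
        sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
        for y in range(n_out)
    ]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_out):
            joint = px * channel[x][y]
            if joint > 0:  # zero-probability pairs contribute nothing
                total += joint * math.log2(channel[x][y] / p_y[y])
    return total


flip = 0.1  # assumed flip probability, chosen only for illustration
bsc = [[1 - flip, flip], [flip, 1 - flip]]

# Always sending '0', a skewed input, and the uniform input.
for p0 in (1.0, 0.9, 0.5):
    print(p0, mutual_information([p0, 1 - p0], bsc))
```

The channel matrix is identical in all three runs; only the input distribution changes. Always sending '0' yields zero mutual information, and the uniform input yields the largest of the three values.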
Question 3 True / False
If a channel has capacity C = 0 bits per use, reliable communication is impossible at any positive rate.
T. True
F. False
Answer: True
C = 0 means the output Y provides zero information about the input X for every possible input distribution. The channel is completely useless — the output is statistically independent of the input. No coding scheme, no matter how sophisticated, can transmit any information reliably. This happens, for example, when the channel replaces every input with a fixed output regardless of what was sent, or when the noise completely overwhelms the signal.
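The fixed-output example can be checked in two lines. Assume, as notation introduced here, that every input produces the same output distribution q(y), so that p(y|x) = q(y) for all x. Then for any input distribution p(x):

```latex
\begin{align*}
p(y) &= \sum_x p(x)\,p(y \mid x) = q(y) \sum_x p(x) = q(y), \\
I(X;Y) &= \sum_{x,y} p(x)\,p(y \mid x)\,\log_2 \frac{p(y \mid x)}{p(y)}
        = \sum_{x,y} p(x)\,q(y)\,\log_2 1 = 0.
\end{align*}
```

Since this holds for every p(x), the maximum over input distributions is also zero, which gives C = 0.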
Question 4 Short Answer
Explain why channel capacity is a single number that characterizes the channel, even though mutual information depends on the input distribution.
Model answer: Capacity C = max_{p(x)} I(X;Y) is a maximum over all possible input distributions, so the only thing left to determine its value is the channel's transition probabilities p(y|x). Once you optimize over the input, the result depends only on the channel itself. This is why capacity is a property of the channel, not of any particular communication scheme. It answers: given this channel's noise structure, what is the absolute best any communicator could achieve? The optimal input distribution that achieves C depends on the channel but need not be known by the receiver.
For many important channels, the capacity-achieving distribution has a known form. For the binary symmetric channel, it is uniform. For the Gaussian channel with an average power constraint, it is Gaussian. For general discrete memoryless channels, the Blahut-Arimoto algorithm computes the capacity and a capacity-achieving distribution iteratively.
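The iteration can be sketched in a few lines. What follows is a minimal, unoptimized version of the Blahut-Arimoto update for a discrete memoryless channel, assuming the channel is given as a list of rows with `channel[x][y] = p(y|x)`. The function name, tolerance, and iteration cap are illustrative choices, and the Z-channel at the end is an assumed test case rather than something from the quiz.

```python
import math


def blahut_arimoto(channel, tol=1e-9, max_iter=10_000):
    """Approximate capacity (bits) and an optimizing input distribution.

    channel[x][y] = p(y|x); each row is assumed to sum to 1.
    """
    n_in, n_out = len(channel), len(channel[0])
    r = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(r[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        # c[x] = exp(D(p(.|x) || q)), with the divergence measured in nats.
        c = []
        for x in range(n_in):
            d = sum(
                channel[x][y] * math.log(channel[x][y] / q[y])
                for y in range(n_out)
                if channel[x][y] > 0
            )
            c.append(math.exp(d))
        z = sum(r[x] * c[x] for x in range(n_in))
        # log(z) bounds the capacity from below, log(max c) from above.
        lower, upper = math.log(z), math.log(max(c))
        # Reweight inputs toward those whose outputs are most distinctive.
        r = [r[x] * c[x] / z for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower / math.log(2), r  # convert nats to bits


# Assumed example: a Z-channel where input 1 is received as 0 half the time.
z_channel = [[1.0, 0.0], [0.5, 0.5]]
capacity, best_input = blahut_arimoto(z_channel)
print(capacity, best_input)
```

For a binary symmetric channel the uniform starting point is already optimal, so the loop exits after one pass. The Z-channel is a case where the optimizing input is not uniform, which is where the iteration does real work.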