Shannon's channel coding theorem (noisy coding theorem) proves that for any discrete memoryless channel with capacity C, reliable communication is possible at any rate R < C and impossible at any rate R > C. The achievability proof shows that randomly generated codebooks, with maximum likelihood decoding, achieve vanishing error probability as the block length n grows. The converse uses Fano's inequality to show that rates above C force non-vanishing error. This theorem separated the problem of communication into source coding (compression) and channel coding (error correction), enabling the modular design of modern communication systems.
Channel capacity C = max I(X;Y) tells you the speed limit for a noisy channel. The channel coding theorem tells you that this speed limit is achievable. Any rate below C can be attained with arbitrarily low error probability, and no rate above C can be attained reliably. This is the most important theorem in information theory, and its proof reveals deep ideas about the structure of reliable communication.
The achievability proof uses random coding. Generate 2^(nR) codewords of length n by drawing each symbol independently from the capacity-achieving input distribution. To send message m, transmit the m-th codeword. The decoder uses maximum likelihood: it finds the codeword most likely to have produced the received output. The key insight is that for R < C, the probability that any incorrect codeword looks like the received output decreases exponentially in n. The total error probability (over all possible messages and noise realizations) vanishes as n grows. This works because when R < C, there are "few enough" codewords that the channel's noise cannot confuse them — the mutual information is sufficient to distinguish between the messages.
The converse proves that R > C is impossible. Using Fano's inequality — which bounds the probability of error in terms of the conditional entropy H(M|Y^n) — the proof shows that if the error probability is small, then the rate R must satisfy R <= C + epsilon for vanishing epsilon. In other words, trying to transmit faster than capacity forces the decoder to make errors at a rate bounded away from zero.
The theorem's practical impact comes from the separation principle: source coding and channel coding can be designed independently without loss of optimality. The source coder compresses the message to its entropy rate, producing a bit stream. The channel coder adds redundancy to this bit stream to protect against noise. As long as the source rate is below channel capacity, the combined system achieves reliable communication at the source's entropy rate. This modular architecture — compress then protect — is the foundation of every modern digital communication system, from cell phones to deep-space probes.