The binary symmetric channel (BSC) is the simplest model of a noisy digital communication channel. It transmits binary symbols (0 or 1), and each bit is independently flipped with crossover probability p. Its capacity is C = 1 - H(p) bits per use, where H(p) = -p log p - (1-p) log(1-p) is the binary entropy function. The BSC is symmetric in that both symbols suffer the same error probability, making the capacity-achieving input distribution uniform. It serves as the fundamental testbed for channel coding theory and illustrates how noise reduces capacity from the ideal of 1 bit per use.
The binary symmetric channel is the "hydrogen atom" of information theory — the simplest non-trivial channel model. It takes a binary input (0 or 1) and independently flips each bit with probability p. With probability 1-p the bit passes through correctly. "Symmetric" means both 0 and 1 suffer the same error probability; there is no bias toward one symbol.
The capacity calculation is elegant. Since the channel is symmetric, the capacity-achieving input distribution is uniform: p(X=0) = p(X=1) = 1/2. This maximizes H(Y) = 1 bit. The noise entropy is H(Y|X) = H(p) — given the input, the output is a Bernoulli(p) flip, whose entropy is the binary entropy H(p). So C = H(Y) - H(Y|X) = 1 - H(p). When p = 0 (no noise), C = 1 bit. When p = 1/2 (maximum noise), C = 0 bits. The capacity decreases smoothly as noise increases from 0 to 0.5, then increases back to 1 as p approaches 1 (because a perfect inverter is a perfect channel in disguise).
The BSC makes the channel coding theorem concrete. At p = 0.1, the capacity is about 0.531 bits per use. This means you can reliably transmit 531 bits of information per 1000 channel uses — but only with error-correcting codes that spread each information bit across many channel uses. Without coding, the 10% error rate is irrecoverable. With coding at rate R < C, the decoder can use the redundancy to identify and correct errors, achieving arbitrarily low error probability. The coding theorem guarantees this is possible; practical codes like LDPC and turbo codes come remarkably close.
The BSC also illustrates a general principle: channel symmetry simplifies capacity computation. For symmetric channels, the uniform input distribution is always optimal, and the capacity is the log of the output alphabet size minus the entropy of each row of the transition matrix. This shortcut avoids the need for numerical optimization (like the Blahut-Arimoto algorithm) that general channels require.