The asymptotic equipartition property (AEP) states that for a long sequence of n i.i.d. random variables, -(1/n) log p(X_1, ..., X_n) converges to H(X) in probability. This means almost all observed sequences have probability approximately 2^(-nH), forming the "typical set" of about 2^(nH) sequences. Although the total number of possible sequences is |alphabet|^n, the typical set is exponentially smaller (when H < log|alphabet|) yet carries nearly all the probability. The AEP is the law of large numbers applied to information and is the foundational tool for proving both source coding and channel coding theorems.
The AEP is the information-theoretic manifestation of the law of large numbers. For i.i.d. random variables X_1, ..., X_n, the log-probability of the sequence is a sum: log p(X_1, ..., X_n) = sum log p(X_i). By the law of large numbers, the average (1/n) sum log p(X_i) converges to E[log p(X)] = -H(X). So -(1/n) log p(X^n) converges to H(X). This means the probability of almost every observed sequence is approximately 2^(-nH).
The typical set A_epsilon^(n) consists of all sequences x^n satisfying |-(1/n) log p(x^n) - H(X)| <= epsilon. The AEP guarantees three properties: (1) Pr(X^n in A_epsilon^(n)) > 1 - delta for large enough n. (2) |A_epsilon^(n)| <= 2^(n(H+epsilon)) — the typical set is small. (3) Each typical sequence has probability between 2^(-n(H+epsilon)) and 2^(-n(H-epsilon)) — they are all approximately equiprobable at the exponential scale. Property 3 explains the name "equipartition": probability is roughly equally distributed among typical sequences.
This structure is the engine behind Shannon's theorems. For source coding: since there are about 2^(nH) typical sequences carrying nearly all the probability, assigning nH-bit indices to them achieves near-entropy compression. For channel coding: at the receiver, the output sequence is typical given the true codeword. If the code rate R < C, the number of codewords (2^(nR)) is small enough that no other codeword's typical output set overlaps significantly. The decoder can identify the correct codeword with high probability by checking which codeword's conditional typical set contains the received sequence.
The AEP extends beyond i.i.d. sources. For stationary ergodic sources, the Shannon-McMillan-Breiman theorem provides an analogous result: -(1/n) log p(X_1, ..., X_n) converges to the entropy rate almost surely. This generalization underpins information theory's applicability to structured sources like natural language and time series, where successive symbols are far from independent.