← Graph View All Domains

A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Typical Sequences and the AEP

Research Depth 92 in the knowledge graph ☐ I know this ☆ Set as goal

11topics build on this

414prerequisites beneath it

See this on the map →

Expected Value Shannon Entropy +1 more→→Channel Coding Theorem Rate-Distortion Theory

Core Idea

The asymptotic equipartition property (AEP) states that for a long sequence of n i.i.d. random variables, -(1/n) log p(X_1, ..., X_n) converges to H(X) in probability. This means almost all observed sequences have probability approximately 2^-nH, forming the "typical set" of about 2^nH sequences. Although the total number of possible sequences is |alphabet|^n, the typical set is exponentially smaller (when H < log|alphabet|) yet carries nearly all the probability. The AEP is the law of large numbers applied to information and is the foundational tool for proving both source coding and channel coding theorems.

Explainer

The AEP is the information-theoretic manifestation of the law of large numbers. For i.i.d. random variables X_1, ..., X_n, the log-probability of the sequence is a sum: log p(X_1, ..., X_n) = sum log p(X_i). By the law of large numbers, the average (1/n) sum log p(X_i) converges to E[log p(X)] = -H(X). So -(1/n) log p(Xⁿ) converges to H(X). This means the probability of almost every observed sequence is approximately 2^-nH.

The typical set A_epsilonⁿ consists of all sequences xⁿ satisfying |-(1/n) log p(xⁿ) - H(X)| <= epsilon. The AEP guarantees three properties: (1) Pr(Xⁿ in A_epsilonⁿ) > 1 - delta for large enough n. (2) |A_epsilonⁿ| <= 2^(n(H+epsilon)) — the typical set is small. (3) Each typical sequence has probability between 2^(-n(H+epsilon)) and 2^(-n(H-epsilon)) — they are all approximately equiprobable at the exponential scale. Property 3 explains the name "equipartition": probability is roughly equally distributed among typical sequences.

This structure is the engine behind Shannon's theorems. For source coding: since there are about 2^nH typical sequences carrying nearly all the probability, assigning nH-bit indices to them achieves near-entropy compression. For channel coding: at the receiver, the output sequence is typical given the true codeword. If the code rate R < C, the number of codewords (2^nR) is small enough that no other codeword's typical output set overlaps significantly. The decoder can identify the correct codeword with high probability by checking which codeword's conditional typical set contains the received sequence.

The AEP extends beyond i.i.d. sources. For stationary ergodic sources, the Shannon-McMillan-Breiman theorem provides an analogous result: -(1/n) log p(X_1, ..., X_n) converges to the entropy rate almost surely. This generalization underpins information theory's applicability to structured sources like natural language and time series, where successive symbols are far from independent.

Practice Questions 3 questions