A source has alphabet {A, B, C} with probabilities {0.7, 0.2, 0.1} and entropy H ≈ 1.157 bits. For sequences of length n = 1000, approximately how many typical sequences are there, and how does this compare to the total number of sequences?
AAbout 2^1157 typical sequences out of 3^1000 ≈ 2^1585 total — the typical set is a vanishingly small fraction of all sequences but contains nearly all the probability
BAbout 3^1000 typical sequences — all sequences are typical for large n
CAbout 1000 typical sequences — one for each position
DAbout 2^1000 typical sequences — one bit per symbol
The typical set has approximately 2^(nH) = 2^(1000 * 1.157) ≈ 2^1157 sequences. The total number of sequences is 3^1000 ≈ 2^1585. The ratio is 2^1157 / 2^1585 = 2^(-428), an astronomically small fraction. Yet these ~2^1157 sequences account for probability approaching 1. The other 2^1585 - 2^1157 ≈ 2^1585 sequences are individually very improbable and collectively carry negligible total probability. This is the AEP: probability concentrates on a 'thin' typical set.
Question 2 True / False
The AEP guarantees that every high-probability sequence belongs to the typical set.
TTrue
FFalse
Answer: False
The typical set is defined by approximate probability: sequences x^n where 2^(-n(H+epsilon)) <= p(x^n) <= 2^(-n(H-epsilon)). The most probable individual sequence (e.g., all-A for a biased source) has probability much higher than 2^(-nH) and is NOT in the typical set. But its probability is just one sequence — while it may be the single most likely outcome, the typical set contains exponentially many sequences whose collective probability approaches 1. The distinction between the most probable sequence and the set of collectively probable sequences is crucial.
Question 3 Short Answer
Explain how the AEP enables the source coding theorem's achievability: why can a source with entropy H be compressed to approximately nH bits for long sequences?
Think about your answer, then reveal below.
Model answer: The AEP shows that with high probability, the source output falls in the typical set, which contains approximately 2^(nH) sequences. To encode a typical sequence, assign each one a unique binary index — this requires about nH bits (log2 of 2^(nH)). Atypical sequences (which have negligible total probability) can be encoded separately with a flag bit and their raw representation, adding negligible overhead. So the expected code length is approximately nH bits, or H bits per symbol. The AEP provides the bridge between entropy as an abstract quantity and entropy as an achievable compression rate.
This argument also reveals why the source coding theorem is an asymptotic result: the AEP relies on the law of large numbers, which requires large n for the concentration to be tight. For small n, the typical set is not well-separated from the atypical set, and compression cannot achieve H bits per symbol.