Source Coding Theorem

Graduate Depth 75 in the knowledge graph I know this Set as goal
Unlocks 20 downstream topics
source coding lossless compression noiseless coding Shannon first theorem

Core Idea

Shannon's source coding theorem (noiseless coding theorem) states that the entropy H(X) of a source is the fundamental limit of lossless compression. No lossless code can achieve an average rate below H(X) bits per symbol, and there exist codes that get arbitrarily close to H(X). For a sequence of n i.i.d. symbols, the compressed length approaches nH(X) bits as n grows. This theorem established entropy as the operationally meaningful measure of information content and launched the field of information theory.

Explainer

Shannon entropy tells you the average uncertainty per symbol. The source coding theorem gives this number operational teeth: entropy is the compression limit. If a source has entropy H bits per symbol, you need at least H bits per symbol on average to represent the output losslessly, and you can get arbitrarily close to H with a sufficiently clever code.

The achievability proof relies on the concept of typical sequences. For a long sequence of n i.i.d. symbols from a source with entropy H, the law of large numbers implies that the empirical frequency of each symbol concentrates around its true probability. The "typical set" — sequences whose empirical statistics match the source distribution — contains approximately 2^(nH) sequences, even though the total number of possible sequences is |alphabet|^n = 2^(n log |alphabet|). Since 2^(nH) << 2^(n log |alphabet|) when H < log|alphabet|, we only need nH bits to index the typical sequences. Atypical sequences have negligible total probability and can be handled with a small overhead.

The converse — that you cannot beat H — follows from the non-negativity of KL divergence. Any lossless code induces a probability distribution over codewords, and the expected codeword length is at least the entropy of the source. Attempting to assign shorter codewords to all symbols violates the Kraft inequality (the constraint that ensures unique decodability). Shorter codes for some symbols necessarily mean longer codes for others, and the probability-weighted average cannot go below H.

The practical impact is enormous. Before Shannon, engineers designed compression schemes by intuition and heuristics. After Shannon, there was a target: entropy. Huffman coding (1952) achieves within 1 bit of entropy per symbol. Arithmetic coding achieves within a fraction of a bit. Modern compressors (gzip, zstd, lossless PNG) all operate in the shadow of this theorem, exploiting statistical redundancy to approach the entropy rate of their input. The theorem also explains why some data cannot be compressed: truly random data has maximum entropy, and no algorithm can shrink it.

Practice Questions 3 questions

Prerequisite Chain

Counting to 10Counting to 20Understanding ZeroThe Number ZeroCounting to FiveOne-to-One CorrespondenceCombining Small Groups Within 5Addition Within 10Addition Within 20Two-Digit Addition Without RegroupingTwo-Digit Addition with RegroupingAddition Within 100Repeated Addition as MultiplicationMultiplication Facts Within 100Division as Equal SharingDivision as Grouping (Measurement Division)Division: Grouping (Repeated Subtraction) ModelDivision: Fair Sharing ModelDivision as Equal SharingDivision as GroupingBasic Division FactsDivision Facts Within 100Two-Digit by One-Digit DivisionDivision with RemaindersRemainders and Quotients in DivisionDivision Word ProblemsIntroduction to Long DivisionFactors and MultiplesPrime and Composite NumbersEquivalent FractionsRelating Fractions and DecimalsDecimal Place ValueReading and Writing DecimalsComparing and Ordering DecimalsAdding and Subtracting DecimalsMultiplying DecimalsDividing DecimalsDividing FractionsMixed Number ArithmeticOrder of OperationsInteger Order of OperationsVariable ExpressionsCombining Like TermsOne-Step EquationsTwo-Step EquationsSolving Multi-Step EquationsEquations with Variables on Both SidesAngle Pairs: Complementary, Supplementary, and VerticalParallel Lines and TransversalsCorresponding AnglesAlternate Interior AnglesTriangle Angle Sum TheoremExterior Angle TheoremTriangle Inequality TheoremSimilar Triangles: AA SimilaritySimilar Triangles: SSS and SAS SimilarityProportions in Similar TrianglesRight Triangle Trigonometry IntroductionTrigonometric Ratios ReviewRadian MeasureConverting Between Degrees and RadiansThe Unit CircleGraphing Sine and CosineGraphing Tangent and Reciprocal Trigonometric FunctionsDerivatives of Trigonometric FunctionsAntiderivativesIndefinite IntegralsBasic Integration RulesRiemann SumsDefinite Integral DefinitionProbability Density Functions and Continuous DistributionsCumulative Distribution FunctionsContinuous Random VariablesProbability Density FunctionsShannon EntropySource Coding Theorem

Longest path: 76 steps · 322 total prerequisite topics

Prerequisites (2)

Leads To (8)