A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Arithmetic Coding

Graduate · Depth 93 in the knowledge graph

3 topics build on this

415 prerequisites beneath it

Core Idea

Arithmetic coding represents an entire message as a single number in the interval [0, 1), achieving compression rates arbitrarily close to the source entropy. Unlike Huffman coding, which assigns a discrete codeword to each symbol, arithmetic coding encodes sequences by successively narrowing a subinterval: each symbol shrinks the current interval in proportion to its probability. The final interval width is the product of all symbol probabilities, and specifying a point within it requires approximately -log2 of that product = sum of -log2(p_i) bits, which is exactly the information content of the message. Arithmetic coding is optimal to within a couple of bits per message, and modern entropy coders such as ANS (asymmetric numeral systems) pursue the same near-entropy rates by a different mechanism.

Explainer

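Huffman coding assigns integer-length codewords to individual symbols, so its expected code length can exceed the entropy by up to (but always less than) 1 bit per symbol. The gap is worst for skewed sources: a symbol with probability 0.9 carries only about 0.15 bits of information, yet still costs a full 1-bit codeword. Arithmetic coding removes this limitation by encoding entire messages as single numbers, effectively achieving fractional bit lengths per symbol.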
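To make the Huffman gap concrete, here is a minimal sketch that compares the entropy of a skewed two-symbol source with the average Huffman codeword length. The function names (entropy, huffman_lengths) and the example probabilities are assumptions chosen for illustration, not part of the original topic.

```python
import heapq
from math import log2

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Codeword length of each symbol under a Huffman code (illustrative helper)."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol inside them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.9, 0.1]  # assumed example source
lengths = huffman_lengths(probs)
average = sum(p * n for p, n in zip(probs, lengths))
print(f"entropy = {entropy(probs):.3f} bits/symbol")   # about 0.469
print(f"Huffman = {average:.3f} bits/symbol")          # 1.000
```

With only two symbols, any prefix code must spend a whole bit on each, so Huffman coding pays roughly twice the entropy here; this is the gap that fractional bit lengths close.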

The idea is beautifully simple. The unit interval [0, 1) is partitioned among the alphabet symbols proportionally to their probabilities. The first symbol in the message selects the corresponding subinterval. That subinterval is then partitioned again in the same proportions, and the second symbol selects a sub-subinterval. This continues for every symbol. After processing the entire message, you have a tiny interval whose width equals the probability of the specific message (the product of all symbol probabilities for i.i.d. sources). To transmit the message, you send enough bits to uniquely identify a point inside that interval — approximately -log2(width) bits.
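The narrowing procedure can be sketched with exact rational arithmetic. This is an idealized illustration, not a practical coder: the names build_ranges, narrow and decode, the three-symbol alphabet, and the assumption that the decoder is told the message length are all choices made for this example.

```python
from fractions import Fraction

def build_ranges(probs):
    """Give each symbol a slice [lo, hi) of the unit interval, sized by its probability."""
    ranges, start = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p
    return ranges

def narrow(message, probs):
    """Return the final interval [low, high) after processing every symbol."""
    ranges = build_ranges(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        lo, hi = ranges[sym]
        low = low + width * lo       # move to the chosen slice
        width = width * (hi - lo)    # shrink by the symbol's probability
    return low, low + width

def decode(point, length, probs):
    """Recover `length` symbols from any point inside the final interval."""
    ranges = build_ranges(probs)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        scaled = (point - low) / width   # position within the current interval
        for sym, (lo, hi) in ranges.items():
            if lo <= scaled < hi:
                out.append(sym)
                low, width = low + width * lo, width * (hi - lo)
                break
    return "".join(out)

# Assumed example model: P(a) = 1/2, P(b) = 1/4, P(c) = 1/4.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = narrow("abac", probs)
print(low, high, high - low)    # 19/64 5/16 1/64
print(decode(low, 4, probs))    # abac
```

The final width 1/64 equals (1/2)(1/4)(1/2)(1/4), the probability of the message "abac" under this model.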

The key identity is that -log2(product of probabilities) = sum of -log2(p_i), which is exactly the information content of the message. More probable messages produce wider intervals requiring fewer bits; improbable messages produce narrow intervals requiring more bits. Over many symbols, the average rate converges to the entropy H(X). In the idealized, exact-arithmetic scheme the overhead is at most 2 bits for the entire message (to guarantee that every continuation of the transmitted bits stays inside the interval), making the per-symbol overhead negligible for long sequences. The decoder also needs to know where the message ends, through either a transmitted length or a dedicated end-of-message symbol.
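The 2-bit bound can be checked on a small case. The sketch below picks a binary string whose entire dyadic interval fits inside a given final interval, then compares its length with -log2(width). The function name code_for_interval and the example interval [19/64, 5/16) are assumptions for illustration; the construction (truncating the midpoint) is one standard way to choose the point, not the only one.

```python
from fractions import Fraction
from math import floor, log2

def code_for_interval(low, high):
    """Return a bit string b such that every number starting 0.b... lies in [low, high)."""
    width = high - low
    # Smallest k with 2^-k <= width / 2, found exactly (no floating point).
    k = 0
    while Fraction(1, 2 ** k) > width / 2:
        k += 1
    # Truncate the midpoint to k bits; the k-bit cell [c, c + 2^-k) then fits inside.
    midpoint = low + width / 2
    numerator = floor(midpoint * 2 ** k)
    return format(numerator, f"0{k}b")

# Assumed example: an interval of width 1/64, i.e. a message of probability 1/64.
low, high = Fraction(19, 64), Fraction(5, 16)
bits = code_for_interval(low, high)
info = -log2(float(high - low))
print(bits, len(bits), info)   # 0100111 7 6.0
```

Here the message carries 6 bits of information and is sent in 7 bits, within the bound of 2 extra bits. A simpler scheme that only needs some point in the interval (with the length known separately) can often use fewer.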

In practice, arithmetic coding operates on integers using finite-precision arithmetic, with a "renormalization" step that outputs bits as soon as the leading bits of the interval endpoints agree and then rescales the interval, keeping the working precision bounded. Modern alternatives like ANS (asymmetric numeral systems), used in Facebook's Zstandard and Apple's LZFSE, reach comparable near-entropy rates with higher throughput by encoding the state as a single integer rather than maintaining interval endpoints. Context-adaptive arithmetic coding (as in CABAC, the entropy coder of H.265/HEVC video compression) pairs the arithmetic coder with a sophisticated context model, achieving compression rates that track the conditional entropy given recent context, well beyond what per-symbol Huffman coding with a fixed model can achieve.
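As a rough illustration of integer renormalization, here is a sketch in the style of the classic textbook (Witten, Neal and Cleary) coder. It is not how CABAC or ANS work internally. The 32-bit precision, the static frequency table, the function names, and passing the symbol count to the decoder are all assumptions made to keep the example short.

```python
PRECISION = 32                 # assumed working precision in bits
FULL = 1 << PRECISION
HALF = FULL >> 1
QUARTER = HALF >> 1

def cumulative(freqs):
    cum = [0]
    for f in freqs:
        cum.append(cum[-1] + f)
    return cum

def encode(symbols, freqs):
    """Encode symbol indices using integer frequencies (total must stay below QUARTER)."""
    cum, low, high, pending, bits = cumulative(freqs), 0, FULL - 1, 0, []
    total = cum[-1]

    def emit(bit):
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)   # flush deferred "straddle" bits
        pending = 0

    for s in symbols:
        span = high - low + 1
        high = low + span * cum[s + 1] // total - 1
        low = low + span * cum[s] // total
        while True:                         # renormalization
            if high < HALF:                 # both endpoints start with 0
                emit(0)
            elif low >= HALF:               # both endpoints start with 1
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                pending += 1                # interval straddles the middle
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
    pending += 1                            # final bits pin down a quarter inside the interval
    emit(0 if low < QUARTER else 1)
    return bits

def decode(bits, count, freqs):
    """Decode `count` symbols; bits past the end of the stream are read as 0."""
    cum, low, high, code = cumulative(freqs), 0, FULL - 1, 0
    total = cum[-1]
    stream = iter(bits)
    for _ in range(PRECISION):
        code = 2 * code + next(stream, 0)
    out = []
    for _ in range(count):
        span = high - low + 1
        value = ((code - low + 1) * total - 1) // span
        s = next(i for i in range(len(freqs)) if cum[i] <= value < cum[i + 1])
        out.append(s)
        high = low + span * cum[s + 1] // total - 1
        low = low + span * cum[s] // total
        while True:                         # mirror the encoder's rescaling
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, code = low - HALF, high - HALF, code - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                low, high, code = low - QUARTER, high - QUARTER, code - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            code = 2 * code + next(stream, 0)
    return out

freqs = [8, 1, 1]                           # assumed static model: symbol 0 is common
message = [0, 0, 1, 0, 2, 0, 0, 0, 1, 0] * 5
bits = encode(message, freqs)
assert decode(bits, len(message), freqs) == message
print(len(message), "symbols ->", len(bits), "bits")
```

The working state never grows beyond PRECISION bits regardless of message length, which is the point of renormalization. An adaptive coder would update freqs after each symbol on both sides.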

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Shannon Entropy → Source Coding Theorem → Huffman Coding → Arithmetic Coding

Longest path: 94 steps · 415 total prerequisite topics

Prerequisites (3)

Source Coding Theorem (hard) · Shannon Entropy (hard) · Huffman Coding (soft)

Leads To (1)

Data Compression Basics (soft)