
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Joint and Conditional Entropy

Graduate · Depth 95 in the knowledge graph

30 topics build on this

439 prerequisites beneath it


Core Idea

Joint entropy H(X,Y) measures the total uncertainty in a pair of random variables considered together. Conditional entropy H(Y|X) measures the uncertainty remaining in Y after observing X, that is, how much new information Y provides beyond what X already told you. The chain rule H(X,Y) = H(X) + H(Y|X) decomposes joint uncertainty into the uncertainty in X plus what remains in Y once X is known. Conditioning never increases entropy on average: H(Y|X) <= H(Y), with equality if and only if X and Y are independent. These quantities form the algebraic backbone of information theory.

Explainer

Shannon entropy measures the uncertainty in a single random variable. When you have two variables X and Y, you often want to know: how much total uncertainty is there, and how does knowing one reduce your uncertainty about the other? Joint and conditional entropy answer these questions precisely.

Joint entropy H(X,Y) = -sum over all (x,y) of p(x,y) log p(x,y) is simply Shannon entropy applied to the pair (X,Y) treated as a single random variable over the product space. With the logarithm taken base 2, it is the average number of bits needed to describe both variables together. If X and Y are independent, H(X,Y) = H(X) + H(Y): the total uncertainty is the sum of the individual uncertainties. If they are dependent, H(X,Y) < H(X) + H(Y), because some information is shared between them.
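The definition can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the helper names (entropy, joint_entropy, marginals) and the two example distributions are illustrative assumptions, and logarithms are taken base 2 so results are in bits.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def joint_entropy(joint):
    """H(X,Y) for a joint pmf given as {(x, y): p(x, y)}."""
    return entropy(joint.values())

def marginals(joint):
    """Return the marginal pmfs of X and Y as two dicts."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

# Assumed example 1: two independent fair bits, p(x, y) = 1/4 for every pair.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Assumed example 2: two dependent bits, where Y equals X with probability 0.9.
dependent = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

for name, joint in (("independent", independent), ("dependent", dependent)):
    px, py = marginals(joint)
    h_xy = joint_entropy(joint)
    h_sum = entropy(px.values()) + entropy(py.values())
    print(f"{name}: H(X,Y) = {h_xy:.4f} bits, H(X) + H(Y) = {h_sum:.4f} bits")
```

For the independent pair both values come out to 2 bits. For the dependent pair the marginals are still fair bits, so H(X) + H(Y) is 2 bits, while H(X,Y) is roughly 1.469 bits, which illustrates the strict inequality.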

Conditional entropy H(Y|X) = sum over x of p(x) * H(Y|X=x) is the average remaining uncertainty in Y after learning X. For each specific value x, H(Y|X=x) measures the entropy of Y's conditional distribution given X=x; the conditional entropy averages this over all values of X. If X completely determines Y (like knowing a student's exam answers determines their score), then H(Y|X) = 0. If X tells you nothing about Y (independence), then H(Y|X) = H(Y).
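The averaging in this definition can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name conditional_entropy, the dictionary representation of the joint pmf, and both example distributions are choices made here, not part of the original text.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(Y|X) = sum over x of p(x) * H(Y|X=x), for a joint pmf {(x, y): p(x, y)}."""
    # Group the joint probabilities p(x, y) by the value of x.
    rows = defaultdict(list)
    for (x, _y), p in joint.items():
        rows[x].append(p)
    total = 0.0
    for probs in rows.values():
        p_x = sum(probs)  # marginal probability p(x)
        if p_x > 0:
            # Conditional distribution p(y | x) = p(x, y) / p(x).
            total += p_x * entropy(p / p_x for p in probs)
    return total

# Assumed example 1: X is a fair die roll and Y is its parity, so X determines Y.
determined = {(x, x % 2): 1 / 6 for x in range(1, 7)}

# Assumed example 2: X and Y are independent fair bits, so X says nothing about Y.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

print("X determines Y:", conditional_entropy(determined))
print("X independent of Y:", conditional_entropy(independent))
```

The first case gives 0 bits, matching H(Y|X) = 0 when X determines Y. The second gives 1 bit, which is H(Y) for a fair bit, matching the independence case.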

The chain rule connects these: H(X,Y) = H(X) + H(Y|X). The total uncertainty in (X,Y) equals the uncertainty in X plus whatever uncertainty remains in Y after X is known. The rule extends to more variables: H(X,Y,Z) = H(X) + H(Y|X) + H(Z|X,Y). A fundamental inequality, often called "information never hurts", states that H(Y|X) <= H(Y): on average, knowing more cannot increase your uncertainty. The qualifier matters, because a particular observation X=x can leave Y more uncertain than before, that is, H(Y|X=x) can exceed H(Y); only the average over x is guaranteed not to. The gap H(Y) - H(Y|X) is the mutual information I(X;Y), which measures how much X tells you about Y. Joint entropy, conditional entropy, and the chain rule linking them form the algebraic foundation on which the rest of information theory is built.
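The chain rule, the inequality H(Y|X) <= H(Y), and the mutual information can all be checked on one small example. The sketch below uses a hypothetical joint distribution chosen for illustration (X = 0 forces Y = 0, while X = 1 makes Y a fair coin); the variable names are assumptions, and entropies are in bits.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint pmf: X = 0 forces Y = 0; X = 1 makes Y a fair coin.
joint = {(0, 0): 0.75, (1, 0): 0.125, (1, 1): 0.125}

# Marginal pmfs of X and Y.
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

h_x = entropy(px.values())
h_y = entropy(py.values())
h_xy = entropy(joint.values())

# Entropy of Y's conditional distribution for each specific value x.
h_y_given_each_x = {
    x: entropy(p / px[x] for (xx, _y), p in joint.items() if xx == x)
    for x in px
}

# Conditional entropy: the p(x)-weighted average of the per-value entropies.
h_y_given_x = sum(px[x] * h for x, h in h_y_given_each_x.items())

# Mutual information as the gap H(Y) - H(Y|X).
mutual_information = h_y - h_y_given_x

print(f"H(X,Y)        = {h_xy:.4f}")
print(f"H(X) + H(Y|X) = {h_x + h_y_given_x:.4f}")
print(f"H(Y)          = {h_y:.4f}")
print(f"H(Y|X)        = {h_y_given_x:.4f}")
print(f"H(Y|X=1)      = {h_y_given_each_x[1]:.4f}")
print(f"I(X;Y)        = {mutual_information:.4f}")
```

In this example the two sides of the chain rule agree, and H(Y|X) is 0.25 bits against roughly 0.544 bits for H(Y). The single observation X = 1 gives H(Y|X=1) = 1 bit, which is larger than H(Y), so the inequality holds only for the average over x.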


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Dividing Integers → Unit Rates → Proportions → Percent Concept → Converting Between Fractions, Decimals, and Percents → Operations with Rational Numbers → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → 
Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Sine, Cosine, and Tangent Ratios → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Expected Value → Weak Law of Large Numbers → Probability Axioms and Rules → Conditional Probability → Law of Total Probability → Bayes' Theorem → Joint and Conditional Entropy

Longest path: 96 steps · 439 total prerequisite topics

Prerequisites (3)

Shannon Entropy (hard), Probability Density Functions (hard), Bayes' Theorem (soft)

Leads To (4)

Entropy Rate of Stochastic Processes (hard), Fano's Inequality (hard), Mutual Information (hard), Slepian-Wolf Coding (hard)