Joint and Conditional Entropy

Graduate · Depth 75 in the knowledge graph
Unlocks 30 downstream topics
joint entropy, conditional entropy, chain rule, uncertainty

Core Idea

Joint entropy H(X,Y) measures the total uncertainty in a pair of random variables considered together. Conditional entropy H(Y|X) measures the uncertainty that remains in Y after observing X, that is, how much new information Y provides beyond what X already told you. The chain rule H(X,Y) = H(X) + H(Y|X) decomposes joint uncertainty into the uncertainty in X plus what remains in Y once X is known. Conditioning never increases entropy on average: H(Y|X) <= H(Y), with equality if and only if X and Y are independent. These quantities form the algebraic backbone of information theory.

Explainer

Shannon entropy measures the uncertainty in a single random variable. When you have two variables X and Y, you often want to know: how much total uncertainty is there, and how does knowing one reduce your uncertainty about the other? Joint and conditional entropy answer these questions precisely.

Joint entropy H(X,Y) = -sum over all (x,y) of p(x,y) log p(x,y) is simply Shannon entropy applied to the pair (X,Y) treated as a single random variable over the product space. With the logarithm taken base 2, it measures the average number of bits needed to describe both variables together. If X and Y are independent, H(X,Y) = H(X) + H(Y): the total uncertainty is the sum of the individual uncertainties. If they are dependent, H(X,Y) < H(X) + H(Y) because some information is shared.
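The definition above can be checked numerically. The following is a minimal sketch, assuming the joint distribution is stored as a Python dict mapping (x, y) pairs to probabilities and that entropy is measured in bits (log base 2). The function names and the two example tables are illustrative assumptions, not part of the original text.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def joint_entropy(joint):
    """H(X,Y): entropy of the pair (X,Y) treated as one variable."""
    return entropy(joint.values())

def marginals(joint):
    """Return the marginal distributions p(x) and p(y) as dicts."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

# Hypothetical example tables (assumed for illustration).
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
dependent = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.25, (1, 1): 0.25}

for name, joint in [("independent", independent), ("dependent", dependent)]:
    px, py = marginals(joint)
    h_xy = joint_entropy(joint)
    h_sum = entropy(px.values()) + entropy(py.values())
    print(f"{name}: H(X,Y) = {h_xy:.4f}, H(X) + H(Y) = {h_sum:.4f}")
```

For the independent table the two printed values agree; for the dependent table the joint entropy is strictly smaller than the sum of the marginal entropies.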

Conditional entropy H(Y|X) = sum over x of p(x) * H(Y|X=x) is the average remaining uncertainty in Y after learning X. For each specific value x, H(Y|X=x) measures the entropy of Y's conditional distribution given X=x; the conditional entropy averages this over all values of X. If X completely determines Y (like knowing a student's exam answers determines their score), then H(Y|X) = 0. If X tells you nothing about Y (independence), then H(Y|X) = H(Y).
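The averaging in this definition can be written out directly. This is a rough sketch under the same assumptions as before: the joint distribution is a dict from (x, y) to probability, entropy is in bits, and the function name and example tables are hypothetical.

```python
import math

def conditional_entropy(joint):
    """H(Y|X) in bits, computed as the sum over x of p(x) * H(Y|X=x)."""
    # Marginal p(x), obtained by summing the joint table over y.
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p

    total = 0.0
    for x, p_x in px.items():
        if p_x == 0:
            continue  # values of X that never occur contribute nothing
        # Conditional distribution of Y given X = x.
        cond = [p / p_x for (xx, _), p in joint.items() if xx == x]
        h_y_given_x = -sum(q * math.log2(q) for q in cond if q > 0)
        total += p_x * h_y_given_x
    return total

# Hypothetical examples: Y is a copy of X, and X, Y independent fair bits.
determined = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

print(conditional_entropy(determined))   # X determines Y, so nothing remains
print(conditional_entropy(independent))  # X says nothing, so H(Y|X) = H(Y) = 1 bit
```

The two cases correspond to the extremes described in the text: zero remaining uncertainty when X determines Y, and the full H(Y) when the variables are independent.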

The chain rule connects these: H(X,Y) = H(X) + H(Y|X). The total uncertainty in (X,Y) equals the uncertainty in X plus whatever uncertainty remains in Y after X is known. This can be chained: H(X,Y,Z) = H(X) + H(Y|X) + H(Z|X,Y). A fundamental inequality, often called "information never hurts", states that H(Y|X) <= H(Y): on average, knowing more cannot increase your uncertainty. The qualifier "on average" matters, because for a particular outcome x the value H(Y|X=x) can exceed H(Y); only the average over x is guaranteed not to. The gap H(Y) - H(Y|X) is the mutual information I(X;Y), which measures how much X tells you about Y. Joint entropy, conditional entropy, and the chain rule that links them form the algebraic foundation on which the rest of information theory is built.
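The chain rule and the inequality H(Y|X) <= H(Y) can be verified on a small example. The sketch below assumes a hypothetical 2x2 joint table chosen for illustration, with entropy in bits. It also shows that the inequality holds only for the average over x, not necessarily for each individual H(Y|X=x).

```python
import math

def H(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint table p(x, y), assumed for illustration.
joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.25, (1, 1): 0.25}

# Marginals p(x) and p(y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

h_xy = H(joint.values())
h_x = H(px.values())
h_y = H(py.values())

# H(Y|X=x) for each x, then the average weighted by p(x).
h_y_given = {
    x: H([p / px[x] for (xx, _), p in joint.items() if xx == x]) for x in px
}
h_y_given_x = sum(px[x] * h_y_given[x] for x in px)

mutual_info = h_y - h_y_given_x

print(f"H(X,Y)        = {h_xy:.4f}")
print(f"H(X) + H(Y|X) = {h_x + h_y_given_x:.4f}")   # chain rule: same value
print(f"H(Y)          = {h_y:.4f}")
print(f"H(Y|X)        = {h_y_given_x:.4f}")         # never larger than H(Y)
print(f"H(Y|X=1)      = {h_y_given[1]:.4f}")        # can be larger than H(Y)
print(f"I(X;Y)        = {mutual_info:.4f}")
```

In this table, observing X=1 leaves Y uniform, so H(Y|X=1) is 1 bit, which is more than H(Y). Observing X=0 removes all uncertainty, and the weighted average still falls below H(Y).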

Prerequisite Chain

Counting to 10 → Counting to 20 → Understanding Zero → The Number Zero → Counting to Five → One-to-One Correspondence → Combining Small Groups Within 5 → Addition Within 10 → Addition Within 20 → Two-Digit Addition Without Regrouping → Two-Digit Addition with Regrouping → Addition Within 100 → Repeated Addition as Multiplication → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Two-Digit by One-Digit Division → Division with Remainders → Remainders and Quotients in Division → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Reading and Writing Decimals → Comparing and Ordering Decimals → Adding and Subtracting Decimals → Multiplying Decimals → Dividing Decimals → Dividing Fractions → Mixed Number Arithmetic → Order of Operations → Integer Order of Operations → Variable Expressions → Combining Like Terms → One-Step Equations → Two-Step Equations → Solving Multi-Step Equations → Equations with Variables on Both Sides → Angle Pairs: Complementary, Supplementary, and Vertical → Parallel Lines and Transversals → Corresponding Angles → Alternate Interior Angles → Triangle Angle Sum Theorem → Exterior Angle Theorem → Triangle Inequality Theorem → Similar Triangles: AA Similarity → Similar Triangles: SSS and SAS Similarity → Proportions in Similar Triangles → Right Triangle Trigonometry Introduction → Trigonometric Ratios Review → Radian Measure → Converting Between Degrees and Radians → The Unit Circle → Graphing Sine and Cosine → Graphing Tangent and Reciprocal Trigonometric Functions → Derivatives of Trigonometric Functions → Antiderivatives → Indefinite Integrals → Basic Integration Rules → Riemann Sums → Definite Integral Definition → Probability Density Functions and Continuous Distributions → Cumulative Distribution Functions → Continuous Random Variables → Probability Density Functions → Shannon Entropy → Joint and Conditional Entropy

Longest path: 76 steps · 325 total prerequisite topics
