A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Joint and Marginal Distributions

College · Depth 58 in the knowledge graph

2,709 topics build on this

267 prerequisites beneath it


joint-distribution marginal

Core Idea

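The joint PMF p(x,y) gives the probability of each pair of values; for continuous variables, the joint PDF f(x,y) gives the probability density at each pair. Marginal distributions sum or integrate out the other variable: p_X(x)=∑_y p(x,y), or f_X(x)=∫ f(x,y) dy. Two variables are independent iff the joint factors into the marginals for every pair: p(x,y)=p_X(x)p_Y(y).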

Explainer

When you studied random variables, each variable described the uncertainty about a single quantity — the outcome of one die roll, one measurement, one coin flip. But most real situations involve multiple uncertain quantities at once: the height and weight of a randomly chosen person, the price and volume of a stock, the test scores of two students. Joint distributions are the framework for handling multiple random variables simultaneously.

The joint PMF (for discrete variables) p(x, y) = P(X = x and Y = y) assigns a probability to every pair of values. It's a complete description of the relationship between X and Y — not just what each variable does on its own, but how they interact. Think of it as a table (for finite discrete variables): each cell (x, y) holds the probability of that particular combination. All cells must be non-negative, and they must sum to 1. From this table, you can answer any probability question about X and Y together.
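The table view can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the probabilities are made up, and the names `joint`, `is_valid_pmf`, and `prob` are hypothetical helpers introduced here. Exact fractions are used so that the "sums to 1" check is not affected by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical joint PMF for X in {0, 1} and Y in {0, 1, 2}.
# The numbers are made up for illustration; keys are (x, y) pairs.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}

def is_valid_pmf(pmf):
    """Every cell is non-negative and the cells sum to exactly 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

def prob(pmf, event):
    """P(event), where event(x, y) is True for the pairs of interest."""
    return sum((p for (x, y), p in pmf.items() if event(x, y)), Fraction(0))

print(is_valid_pmf(joint))                      # True
print(prob(joint, lambda x, y: x + y >= 2))     # 2/5
```

Any question about X and Y together reduces to adding up the cells where the event holds; here the event X + Y ≥ 2 covers the cells (0, 2), (1, 1), and (1, 2).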

Marginal distributions recover the individual behavior of each variable from the joint. To find P(X = x), sum p(x, y) over all possible values of y: p_X(x) = ∑_y p(x, y). You are "summing out" Y, which is equivalent to asking what X is doing regardless of Y's value. Geometrically, if you imagine the joint distribution as a surface over a grid, the marginal of X collapses that surface onto the x-axis, accumulating all the probability that sits above each x (a sum, not just the outline a literal shadow would give). For continuous variables, summation becomes integration: f_X(x) = ∫ f(x, y) dy. The marginals tell you each variable's individual distribution, but they don't tell you the relationship *between* them: many different joint distributions share the same pair of marginals.
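As a rough sketch of "summing out" in the discrete case, the snippet below computes both marginals from a small joint table. The table values and the function name `marginals` are assumptions chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF (made-up numbers); keys are (x, y) pairs.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}

def marginals(pmf):
    """Return (p_X, p_Y) by summing the joint over the other variable."""
    p_x = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at 0
    p_y = defaultdict(Fraction)
    for (x, y), p in pmf.items():
        p_x[x] += p  # sum over y for fixed x
        p_y[y] += p  # sum over x for fixed y
    return dict(p_x), dict(p_y)

p_x, p_y = marginals(joint)
print(p_x[0], p_x[1])          # 2/5 3/5
print(p_y[0], p_y[1], p_y[2])  # 2/5 3/10 3/10
```

Each marginal is itself a valid PMF: its values are non-negative and sum to 1, because every cell of the joint is counted exactly once.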

Independence is the key structural condition. X and Y are independent if and only if the joint distribution factors: p(x, y) = p_X(x) · p_Y(y) for all pairs (x, y). In words: knowing X gives you no information about Y, and vice versa. Equivalently, the joint table is the "outer product" of the two marginals, so every row is proportional to p_Y and every column is proportional to p_X. Independence is a very strong condition; most interesting pairs of variables are *not* independent, because the value of one changes what you should expect of the other (height and weight, income and education, etc.).
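The factorization condition can be checked cell by cell. The sketch below is one way to do it for a finite table, under the assumption that the PMF is stored as a dictionary keyed by (x, y); the table values and the names `marginals` and `is_independent` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def marginals(pmf):
    """Return (p_X, p_Y) by summing the joint over the other variable."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in pmf.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)

def is_independent(pmf):
    """True iff p(x, y) == p_X(x) * p_Y(y) for every pair (x, y)."""
    p_x, p_y = marginals(pmf)
    return all(
        pmf.get((x, y), Fraction(0)) == p_x[x] * p_y[y]  # absent cells count as 0
        for x, y in product(p_x, p_y)
    )

# Hypothetical joint PMF (made-up numbers) in which X and Y are dependent.
dependent = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}

# Outer product of the same marginals: independent by construction.
p_x, p_y = marginals(dependent)
independent = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

print(is_independent(dependent))    # False
print(is_independent(independent))  # True
```

The two tables have identical marginals but different joints, which illustrates that the marginals alone do not determine how X and Y relate. Exact fractions are used so the equality test is not disturbed by rounding; with floating-point probabilities a tolerance-based comparison would be needed instead.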

The payoff of understanding joint and marginal distributions is that they enable everything downstream: conditional distributions (what's the distribution of Y given that X = x?), covariance and correlation (how much do X and Y move together?), and the joint behavior of sums and transformations. When you encounter bivariate Normal distributions, regression models, or multivariate statistics, the joint distribution is always the starting point. The marginals describe what each variable does alone; the joint describes what they do together; the gap between those two descriptions is exactly the information carried by their statistical relationship.


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Making 10 as an Addition Strategy → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts Through 10 → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Opposites and Additive Inverses → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → Function Notation Review → Random Variables: Definition and Classification → Joint and Marginal Distributions

Longest path: 59 steps · 267 total prerequisite topics

Prerequisites (1)

Random Variables: Definition and Classification (hard)

Leads To (2)

Conditional Distributions of Random Variables (hard)

Covariance and Correlation Coefficients (hard)