A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Covariance and Correlation Coefficients

College · Depth 60 in the knowledge graph

1,668 topics build on this

270 prerequisites beneath it

covariance correlation

Core Idea

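Covariance Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] measures linear association between two random variables. It equals 0 if X and Y are independent, but zero covariance does not imply independence. Correlation ρ = Cov(X, Y)/(σ_X σ_Y) always lies in [−1, 1] and is unaffected by changes of units (positive rescaling and shifting). Zero correlation means no linear association, not necessarily no association.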

Explainer

From expected value theory, you know E[X] = ∫ x f(x) dx and you've computed expectations of functions of a single random variable. From joint distributions, you know how to describe the behavior of two random variables together through their joint density or PMF, and how to recover marginal distributions. Covariance combines these ideas: it is the expected value of the product (X − μ_X)(Y − μ_Y), which measures whether X and Y tend to deviate from their means in the same direction at the same time.

The intuition is concrete. If X tends to be above its mean when Y is above its mean, and below when Y is below, then the product (X − μ_X)(Y − μ_Y) is typically positive, and Cov(X, Y) > 0. Height and weight are a typical example of positive covariance. If X tends to be high when Y is low (like temperature and heating bills), the product is typically negative, giving Cov(X, Y) < 0. If X and Y have no systematic linear relationship, the positive and negative products balance out in expectation and Cov(X, Y) = 0. The computational shortcut Cov(X, Y) = E[XY] − E[X]E[Y] follows directly from the definition: expanding the product and using linearity of expectation gives E[XY] − μ_X E[Y] − μ_Y E[X] + μ_X μ_Y, and since E[X] = μ_X and E[Y] = μ_Y, this collapses to E[XY] − μ_X μ_Y.
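As a minimal sketch of both formulas, the Python below computes covariance from a small joint PMF twice, once from the definition and once from the shortcut. The PMF values and the helper name `expect` are hypothetical, chosen only for illustration; exact `Fraction` arithmetic is used so the two results can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical joint PMF p(x, y) for illustration only.
# Keys are (x, y) pairs; values are probabilities that sum to 1.
joint_pmf = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(5, 10),
}

def expect(g, pmf):
    """E[g(X, Y)]: sum of g(x, y) * p(x, y) over the support."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

mu_x = expect(lambda x, y: x, joint_pmf)
mu_y = expect(lambda x, y: y, joint_pmf)

# Definition: E[(X - mu_X)(Y - mu_Y)]
cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint_pmf)

# Shortcut: E[XY] - E[X]E[Y]
cov_shortcut = expect(lambda x, y: x * y, joint_pmf) - mu_x * mu_y

print(mu_x, mu_y, cov_definition, cov_shortcut)  # 3/5 3/5 7/50 7/50
```

Both routes give 7/50 here. The covariance is positive because most of the mass sits on (0, 0) and (1, 1), where X and Y fall on the same side of their means.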

The flaw with raw covariance as a measure of association is that it depends on scale: if you measure X in centimeters instead of meters, the covariance is multiplied by 100. This makes comparing covariances across different pairs of variables meaningless. Correlation ρ = Cov(X, Y)/(σ_X σ_Y) fixes this by normalizing: dividing by the product of the standard deviations cancels the units of X and Y. The result always lies in [−1, 1]. This follows from the Cauchy-Schwarz inequality applied to the centered variables X − μ_X and Y − μ_Y under the inner product ⟨U, V⟩ = E[UV] on square-integrable random variables, which gives |Cov(X, Y)| ≤ σ_X σ_Y. The extreme values ρ = ±1 occur precisely when Y = aX + b almost surely for some constants a ≠ 0 and b, that is, a perfect linear relationship; ρ = 1 when a > 0 and ρ = −1 when a < 0.
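A quick numerical check of the scale argument, sketched in Python with hypothetical height and weight values: converting heights from meters to centimeters multiplies the covariance by 100 but leaves the correlation unchanged, and an exactly linear Y = aX + b gives a correlation of +1 or −1 depending on the sign of a. The helpers `cov` and `corr` are illustrative and use the population (divide-by-n) convention; the correlation would be the same with n − 1, since that factor cancels.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Population covariance of paired samples (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """Pearson correlation: cov(X, Y) / (sd_X * sd_Y)."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

# Hypothetical heights (meters) and weights (kg), for illustration only.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 65.0, 68.0, 75.0, 85.0]
height_cm = [100 * h for h in height_m]

# Covariance scales with the units; correlation does not.
print(cov(height_cm, weight_kg) / cov(height_m, weight_kg))  # about 100
print(corr(height_m, weight_kg), corr(height_cm, weight_kg))  # equal up to rounding

# Perfect linear relationships: rho is +1 for a > 0 and -1 for a < 0.
print(corr(height_m, [3 * h + 2 for h in height_m]))   # about 1.0
print(corr(height_m, [-3 * h + 2 for h in height_m]))  # about -1.0
```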

The most important conceptual trap is the relationship between independence and correlation. If X and Y are independent, the joint distribution factors, so E[XY] = E[X]E[Y]; hence Cov(X, Y) = 0 and ρ = 0. The converse fails: zero correlation does not imply independence. The classic example is X ~ Uniform(−1, 1) and Y = X². Then E[XY] = E[X³] = 0, because the distribution of X is symmetric about zero and x³ is an odd function, and E[X]E[Y] = 0 · E[X²] = 0, so Cov(X, Y) = 0. Yet X and Y are strongly dependent: Y is a deterministic function of X, so knowing X determines Y exactly. Correlation captures only linear dependence; the full dependence structure requires the joint distribution.
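The same phenomenon can be checked exactly with a discrete stand-in for the Uniform(−1, 1) example. The sketch below assumes X is uniform on {−2, −1, 0, 1, 2} (an illustrative substitution, not the continuous case in the text) and Y = X². The covariance comes out exactly 0, yet P(X = 2, Y = 4) differs from P(X = 2)P(Y = 4), so X and Y are not independent.

```python
from fractions import Fraction

# Assumed discrete stand-in for X ~ Uniform(-1, 1): X uniform on {-2, -1, 0, 1, 2}.
support = [-2, -1, 0, 1, 2]
p = Fraction(1, len(support))

# Joint PMF of (X, Y) with Y = X**2: all the mass lies on the parabola.
joint_pmf = {(x, x * x): p for x in support}

E_X = sum(x * q for (x, y), q in joint_pmf.items())       # 0 by symmetry
E_Y = sum(y * q for (x, y), q in joint_pmf.items())       # E[X^2]
E_XY = sum(x * y * q for (x, y), q in joint_pmf.items())  # E[X^3] = 0 by symmetry
cov_xy = E_XY - E_X * E_Y
print(cov_xy)  # 0

# Independence would require P(X=x, Y=y) = P(X=x) P(Y=y) for every pair.
p_y4 = sum(q for (x, y), q in joint_pmf.items() if y == 4)  # P(Y = 4) = 2/5
p_x2 = p                                                     # P(X = 2) = 1/5
p_joint = joint_pmf.get((2, 4), Fraction(0))                 # P(X = 2, Y = 4) = 1/5
print(p_joint, p_x2 * p_y4)  # 1/5 2/25, so X and Y are dependent
```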

Covariance and correlation are foundational to much of what builds on joint distributions. In simple linear regression of Y on X with an intercept, the slope is β = Cov(X, Y)/Var(X), and R² equals ρ², so the squared correlation measures the fraction of the variance of Y explained by the best linear function of X. In the bivariate normal distribution, which you'll see next, ρ is the only parameter governing the dependence between the two jointly normal components once their means and variances are fixed. More broadly, the covariance matrix Σ with entries Σᵢⱼ = Cov(Xᵢ, Xⱼ) is the fundamental object describing the second-order geometry of multivariate distributions, and core techniques of multivariate statistics, such as PCA, factor analysis, and the multivariate normal distribution, are built on it.
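To see the regression connection numerically, the sketch below fits a least-squares line to a handful of hypothetical points, computes the slope as Cov(X, Y)/Var(X), and checks that R² matches the squared sample correlation. It uses plain Python with illustrative helper names; for sample data, the identity R² = r² holds for ordinary least squares with an intercept.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Population covariance (divides by n); the n vs n - 1 choice cancels in beta and rho."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares line y = alpha + beta * x, with beta = Cov(X, Y) / Var(X).
beta = cov(xs, ys) / cov(xs, xs)
alpha = mean(ys) - beta * mean(xs)

# R^2 = 1 - SS_res / SS_tot
fitted = [alpha + beta * x for x in xs]
ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
ss_tot = sum((y - mean(ys)) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

# Sample correlation; its square should match R^2.
rho = cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))
print(beta, r_squared, rho ** 2)  # last two agree up to floating-point rounding
```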

Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Making 10 as an Addition Strategy → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts Through 10 → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Opposites and Additive Inverses → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → Function Notation Review → Random Variables: Definition and Classification → Probability Mass Functions and Discrete Distributions → Expected Value: Theory and Properties → Covariance and Correlation Coefficients

Longest path: 61 steps · 270 total prerequisite topics

Prerequisites (2)

Expected Value: Theory and Properties (hard)
Joint and Marginal Distributions (hard)

Leads To (4)

Bivariate Normal Distribution (soft)
Capital Asset Pricing Model (CAPM) (hard)
Principal Component Analysis (soft)
Simple Linear Regression: Theory and Estimation (hard)