A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Correlation and Causation Distinction

College · Depth 88 in the knowledge graph

438 prerequisites beneath it

Core Idea

Two variables can be correlated (move together) without one causing the other. Confounding variables, reverse causation, or coincidence can explain correlations. Valid causal reasoning requires ruling out alternative explanations. Example: ice cream sales and drowning deaths correlate because both increase in summer, not because ice cream causes drowning.

Common Misconceptions

A strong correlation implies causation (spurious correlations are common). Temporal order proves causation (even if X precedes Y, a third variable Z might cause both). Controlling for one variable rules out confounding (other confounders may still be at work). A correlation of zero means no relationship (a nonlinear relationship can produce zero linear correlation).

Explainer

From statistical reasoning, you know that correlation measures the degree to which two variables move together — when one goes up, does the other tend to go up (positive correlation) or down (negative correlation)? Correlation is a purely statistical relationship between observed values. Causation is a different kind of claim: it says that changes in one variable *produce* changes in another, not merely that they co-vary. Understanding why these come apart is one of the most practically important skills in reasoning from data.
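This sketch assumes Pearson's r, the most common correlation measure. It divides the covariance of two variables by the product of their standard deviations, which yields a value between -1 and +1. The function name `pearson_r` and the toy data are hypothetical and serve only to show the arithmetic.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-deviations are positive when x and y sit on the same side of their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: hours studied vs. test score (positive correlation).
hours = [1, 2, 3, 4, 5]
score = [52, 58, 61, 70, 74]
print(round(pearson_r(hours, score), 3))

# A perfectly decreasing straight line gives r = -1 (negative correlation).
print(pearson_r(hours, [80, 70, 60, 50, 40]))
```

A value near +1 or -1 says only that the points lie close to a straight line. Nothing in the calculation refers to cause and effect.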

The classic illustration: ice cream sales and drowning deaths correlate strongly across months of the year. Both rise in summer, and both fall in winter. Does eating ice cream cause drowning? Obviously not. The real explanation is a confounding variable: summer. Hot weather causes both more ice cream consumption and more swimming (which leads to more drowning). The correlation is genuine; the causal inference is wrong. A confounder is any third variable that influences both of the variables you're studying, creating an association between them that does not reflect a causal link. Confounders are pervasive: wealth correlates with health, but socioeconomic status influences both; shoe size correlates with reading ability in children, but both are caused by age.

Reverse causation is another way a correlation can mislead: the causal arrow runs in the opposite direction from the one you assumed. Hospitals are full of sick people. Does going to the hospital cause sickness? No: the sickness came first and caused the hospital visit. Neighborhoods with more police often have higher crime rates. Does policing cause crime? Often it is the reverse: more crime attracts more police. Without an experiment or careful causal reasoning, observational data alone can't tell you which direction causation runs.

To establish causation rigorously, you need to rule out confounders and reverse causation. The gold standard is a randomized controlled experiment: you randomly assign subjects to treatment and control groups. Apart from chance, the groups then differ only in the treatment they receive, so confounders cannot systematically skew the results. Randomization is often impossible in fields such as economics, epidemiology, and history. There, researchers use natural experiments, instrumental variables, difference-in-differences, and other quasi-experimental designs that try to approximate random assignment. The point of all these methods is the same: isolate the effect of X on Y by holding everything else constant, or at least by balancing it across the groups being compared.
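A small simulation shows why randomization matters. The model is hypothetical. Baseline health (a confounder) raises the outcome, and the treatment adds an assumed true effect of 2.0. When healthier people select themselves into treatment, a naive difference in group means overstates the effect. When a coin flip assigns treatment, the same comparison recovers it. Only the randomized experiment is sketched here, not the quasi-experimental designs.

```python
import random
import statistics

random.seed(2)
TRUE_EFFECT = 2.0  # assumed causal effect of treatment on the outcome


def outcome(health, treated):
    """Outcome = effect of baseline health (the confounder) + treatment effect + noise."""
    return 10 * health + TRUE_EFFECT * treated + random.gauss(0, 1)


def diff_in_means(rows):
    """Mean outcome of treated rows minus mean outcome of untreated rows."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return statistics.mean(treated) - statistics.mean(control)


people = [random.random() for _ in range(20_000)]  # baseline health in [0, 1)

# Observational study: the healthier someone is, the likelier they choose treatment.
observational = []
for h in people:
    t = random.random() < h
    observational.append((t, outcome(h, t)))

# Randomized experiment: a fair coin assigns treatment, independent of health.
randomized = []
for h in people:
    t = random.random() < 0.5
    randomized.append((t, outcome(h, t)))

obs_est = diff_in_means(observational)
rct_est = diff_in_means(randomized)
print(f"observational estimate: {obs_est:.2f}")
print(f"randomized estimate:    {rct_est:.2f}  (true effect {TRUE_EFFECT})")
```

Under this model, the self-selected comparison lands around 5.3 instead of 2.0. The treated group's expected health is 2/3 and the control group's is 1/3, so roughly 10 × 1/3 ≈ 3.3 of extra outcome gets credited to the treatment.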

A useful mental habit: whenever you see a correlation reported in the media or cited in support of a policy claim, ask three questions. (1) Could a confounding variable explain the pattern? (2) Could causation run in the opposite direction? (3) Could this be pure coincidence, the kind of spurious correlation that turns up when many variables are compared? The bar for claiming causation is much higher than the bar for observing correlation, and much real-world reasoning in health, education, economics, and policy fails to clear it.


Prerequisite Chain

Understanding Zero → The Number Zero → Counting to Five → Counting to 10 → Counting to 20 → Counting a Set of Objects Up to 20 → Cardinality: The Last Number Counted → Matching Numerals to Quantities → Subitizing Small Quantities → Addition Within 10 → Number Bonds to 10 → Addition Within 20 → Doubles and Near Doubles → Doubles Facts Within 10 → Near Doubles Facts Within 20 → Mental Math Strategies for Addition → Mental Math: Adding and Subtracting Tens → Addition Within 100 → Repeated Addition as Multiplication → Multiplication as Equal Groups → Multiplication: Arrays → Basic Multiplication Facts (0s, 1s, 2s, 5s, 10s) → Multiplication Facts Within 100 → Division as Equal Sharing → Division as Grouping (Measurement Division) → Division: Grouping (Repeated Subtraction) Model → Division: Fair Sharing Model → Division as Equal Sharing → Division as Grouping → Basic Division Facts → Division Facts Within 100 → Multiplication and Division Fact Families → Relationship Between Multiplication and Division → Division Facts as Inverse of Multiplication → Remainders and Quotients in Division → Division Word Problems → Multi-Step Word Problems → Solving Multi-Step Word Problems → Multiplication Word Problems → Division Word Problems → Introduction to Long Division → Factors and Multiples → Prime and Composite Numbers → Equivalent Fractions → Relating Fractions and Decimals → Decimal Place Value → Integers and the Number Line → Comparing and Ordering Integers → Absolute Value → Adding Integers → Subtracting Integers → Multiplying Integers → Introduction to Exponents → Order of Operations → Integer Order of Operations → Variable Expressions → The Distributive Property → Variables and Expressions Review → Introduction to Polynomials → Adding and Subtracting Polynomials → Multiplying Polynomials → Factorial → Permutations → Combinations → Counting Principles: Addition and Multiplication Rules → Introduction to Graph Theory → Propositional Logic Foundations → Logical Equivalences → Boolean Algebra → Introduction to Propositional Logic → Propositional Connectives → Propositional Semantics and Valuations → Truth Functions and Interpretation → Formula Evaluation and Truth Tables → Logical Equivalence of Formulas → Logical Equivalence in Propositional Logic → Conjunctive and Disjunctive Normal Forms → Sequent Calculus → Soundness and Completeness of Propositional Logic → Validity and Soundness → Logical Form and Argument Patterns → Modus Ponens and Modus Tollens → Probabilistic Reasoning → Inductive Reasoning → Analogical Reasoning and Argument by Analogy → Analogical Arguments: Strength and Weakness → Strength of Inductive Arguments → Statistical Reasoning Basics → Correlation and Causation Distinction

Longest path: 89 steps · 438 total prerequisite topics

Prerequisites (1)

Statistical Reasoning Basics (hard)

Leads To (0)

No topics depend on this one yet.