A study finds that countries with more TVs per household have lower infant mortality rates. A journalist concludes that distributing TVs to poor countries would reduce infant mortality. What logical error is being made?
A. The correlation is probably too weak to be meaningful for policy
B. The journalist has confused correlation with causation: both TV ownership and low infant mortality are likely caused by a common factor (higher economic development), making this a spurious correlation
C. The journalist should first run an experiment to confirm whether the correlation is real
D. Reverse causation is the issue: lower infant mortality causes countries to buy more TVs
Answer: B
Economic development is a confounding variable that independently drives both higher TV ownership and better healthcare (leading to lower infant mortality). The correlation is real in the data; the causal inference drawn from it is wrong. Distributing TVs wouldn't change the underlying cause. This is the classic confounding pattern: two variables caused by a common third variable show an association even though neither causes the other (a spurious association). Option D is a possible but less plausible explanation here: reverse causation would mean that lower infant mortality leads countries to buy more TVs, whereas economic development accounts for both variables directly.
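The confounding pattern can be illustrated with a small simulation. This is a minimal sketch on made-up data, not an analysis of any real study: the variable names, the unit effect sizes, and the noise levels are all assumptions chosen for illustration. A hypothetical development index drives both TV ownership and infant mortality, and TVs have no effect on mortality by construction.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after removing a least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical standardized development index (assumed, not real data).
development = [rng.gauss(0, 1) for _ in range(n)]
# Development raises TV ownership and lowers mortality; TVs do nothing to mortality.
tvs = [d + rng.gauss(0, 1) for d in development]
mortality = [-d + rng.gauss(0, 1) for d in development]

raw_r = pearson(tvs, mortality)
# Correlation after removing the part of each variable explained by development.
adjusted_r = pearson(residuals(tvs, development), residuals(mortality, development))

print(f"raw correlation:      {raw_r:.2f}")
print(f"adjusted correlation: {adjusted_r:.2f}")
```

Under these assumptions the raw correlation is clearly negative (about -0.5 in expectation), while the correlation that remains after adjusting for development is close to zero. Handing out TVs in this toy world would change `tvs` but leave `mortality` untouched.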
Question 2 Multiple Choice
A researcher notices coffee shops with more customers tend to have longer wait times and concludes that long waits attract customers (signaling quality). What alternative causal explanation should she first consider?
A. The relationship is spurious: coffee quality is a confounder causing both
B. Causal direction may be reversed: popular shops generate long waits as a consequence of high demand, not the other way around
C. The correlation is too strong to be coincidental, so causation must run in the direction observed
D. Temporal ordering proves cause: customers arrive before the wait time is measured
Answer: B
This is reverse causation. The more plausible direction is that good coffee (or location, or reputation) causes high demand, and high demand in turn causes long wait times. The researcher treats the effect (wait time) as the cause of the condition that produced it (demand), when in fact demand came first. Temporal order doesn't resolve this: customers may observe the wait and decide to stay, but the wait was created by prior demand. Option D exemplifies the fallacy of confusing temporal precedence with causation.
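A toy simulation shows why the observed correlation cannot settle the direction. The sketch below assumes a world in which demand produces the wait and the wait has no effect on demand; the customer counts, the 0.1 minutes-per-customer factor, and the noise levels are hypothetical. It checks two things: the correlation is the same whichever variable is listed first, and when wait times are set independently of demand (a hypothetical experiment), the association disappears.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical customers per hour at each shop (driven by quality, location, ...).
customers = [rng.gauss(100, 20) for _ in range(n)]
# Observational world: the wait (minutes) is a consequence of demand.
wait_observed = [0.1 * c + rng.gauss(0, 1) for c in customers]

r_xy = pearson(customers, wait_observed)
r_yx = pearson(wait_observed, customers)  # same number: r has no direction

# Hypothetical experiment: waits are assigned at random, independent of demand.
wait_assigned = [rng.uniform(2, 20) for _ in range(n)]
# Assumption of this toy model: demand does not respond to the assigned wait.
customers_after = [c + rng.gauss(0, 5) for c in customers]
r_experiment = pearson(wait_assigned, customers_after)

print(f"observed r (customers, wait): {r_xy:.2f}")
print(f"observed r (wait, customers): {r_yx:.2f}")
print(f"r when waits are assigned:    {r_experiment:.2f}")
```

The observational correlation is strong in both orderings, so it cannot say which variable is the cause. Only the intervention distinguishes the two stories, and in this assumed model it shows that manipulating the wait does not move demand.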
Question 3 True / False
If variable X consistently occurs before variable Y in time, this is sufficient evidence that X causes Y.
True
False
Answer: False
Temporal precedence is necessary for causation (causes must precede effects) but not sufficient. A confounding variable Z could cause both X and Y, with its effect on X simply showing up sooner. Example: the onset of winter brings both a temperature drop (X) and a later rise in flu cases (Y). X precedes Y, but that ordering alone cannot show that the temperature drop causes the rise in flu, because a shared seasonal cause would produce the same sequence. 'After this, therefore because of this' (post hoc ergo propter hoc) is a named fallacy precisely because temporal order alone proves nothing about causation.
Question 4 True / False
A correlation of zero between X and Y guarantees that X and Y have no relationship of any kind.
True
False
Answer: False
Pearson's r measures linear association specifically. A perfect non-linear relationship can produce a correlation of exactly zero: for example, Y = X² with X distributed symmetrically around zero, where the positive and negative contributions to the covariance cancel out. 'No linear correlation' is not the same as 'no relationship.' Two variables can be strongly dependent while showing r ≈ 0, which is why zero correlation should not be interpreted as statistical independence or as absence of a relationship.
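The Y = X² case is easy to check numerically. The sketch below uses two small hand-picked samples (both are illustrative assumptions): one with X symmetric around zero, where the cancellation happens, and one with X shifted to non-negative values, where the same deterministic relationship yields a high r.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# X symmetric around zero: Y is fully determined by X, yet r vanishes.
x_symmetric = [-3, -2, -1, 0, 1, 2, 3]
y_symmetric = [x ** 2 for x in x_symmetric]
r_symmetric = pearson(x_symmetric, y_symmetric)

# Same relationship, but X is no longer centered on zero: r is now large.
x_shifted = [0, 1, 2, 3, 4, 5, 6]
y_shifted = [x ** 2 for x in x_shifted]
r_shifted = pearson(x_shifted, y_shifted)

print(f"r for symmetric X: {r_symmetric:.3f}")
print(f"r for shifted X:   {r_shifted:.3f}")
```

In both samples Y can be predicted from X without error. The difference in r comes only from where the X values sit, which is why r describes linear trend and not dependence in general.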
Question 5 Short Answer
What three questions should you ask when evaluating a claimed causal relationship from an observed correlation?
Model answer: (1) Could a confounding variable independently cause both X and Y, creating a spurious association? (2) Could causation run in the opposite direction — does Y actually cause X? (3) Could the correlation be coincidental — a product of chance in a large dataset with no underlying causal connection? Ruling out all three strengthens a causal claim, but definitively doing so typically requires randomization or a quasi-experimental design.
These three questions operationalize the gap between correlation and causation. Most claims about causation in the wild — health studies, social science findings, business analytics — fail to adequately address at least one of these threats. The bar for claiming causation is much higher than the bar for observing correlation, and training yourself to reflexively ask these questions is the core skill this topic develops.
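The third threat, coincidence, becomes concrete when many candidate variables are screened against one target. The following sketch is purely illustrative: the sample size of 10, the 1000 candidates, and the seed are arbitrary assumptions. Every series is independent random noise, so any correlation found is due to chance alone.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_obs = 10           # observations per series (small on purpose)
n_candidates = 1000  # unrelated variables screened against the target

target = [rng.gauss(0, 1) for _ in range(n_obs)]

best_r = 0.0
for _ in range(n_candidates):
    # Each candidate is pure noise, generated independently of the target.
    candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
    r = pearson(target, candidate)
    if abs(r) > abs(best_r):
        best_r = r

print(f"strongest correlation found among noise: {best_r:.2f}")
```

With this many candidates and so few observations, the strongest correlation found is typically large in magnitude even though nothing is related to anything. Reporting only that winner, without accounting for how many comparisons were made, is one way chance associations end up presented as findings.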