Causal inference in machine learning goes beyond correlation to identify cause-effect relationships: "If we intervene to change X, how will Y change?" This is formalized through causal graphs (directed acyclic graphs representing causal assumptions), do-calculus (Pearl's framework for computing interventional distributions), and randomized experiments (gold standard but often infeasible). Machine learning approaches use observational data with causal assumptions to estimate causal effects, addressing confounding (variables that influence both cause and effect), selection bias, and unobserved confounders. Applications include treatment effect estimation, policy evaluation, and counterfactual prediction.
Causal inference is the science of learning cause-effect relationships from data. In machine learning, this emerges as a critical challenge: when you train a model on observational data, are you learning correlation or causation? This distinction is crucial for applications like medical treatment (does this drug help patients?), policy evaluation (does this intervention improve outcomes?), and counterfactual reasoning (what would happen if we changed a decision?).
The Causal Graph Framework: Pearl's causal framework represents causal assumptions as directed acyclic graphs (DAGs). Nodes are variables; directed edges represent causal influences. For example, a treatment T causes outcome Y, and a confounder C causes both T and Y. The graph encodes the causal structure and enables formal reasoning about which variables must be controlled to isolate causal effects.
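As a minimal sketch of the example above, the three-variable graph can be stored as a mapping from each node to its direct causes. The `parents` dictionary and the helper names `causal_order` and `ancestors` are illustrative assumptions, not part of any causal-inference library; `graphlib` ships with the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical graph from the text: C -> T, C -> Y, T -> Y.
# Each key maps a node to the set of its direct causes (parents).
parents = {
    "C": set(),
    "T": {"C"},
    "Y": {"T", "C"},
}

def causal_order(parents):
    """Return the nodes with causes before effects.

    Raises graphlib.CycleError if the graph contains a cycle,
    i.e. if it is not a valid DAG.
    """
    return list(TopologicalSorter(parents).static_order())

def ancestors(node, parents):
    """Return every node that has a directed path into `node`."""
    seen = set()
    stack = list(parents[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen

order = causal_order(parents)
print(order)                            # ['C', 'T', 'Y']
print(sorted(ancestors("Y", parents)))  # ['C', 'T']
```

Storing parents rather than children keeps the acyclicity check and the ancestor lookup to a few lines, and ancestor sets are what graphical criteria for adjustment are stated in terms of.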
Do-Calculus: Pearl's do-calculus provides rules for computing interventional distributions P(Y|do(X)), the distribution of Y when we intervene to set X, from observational distributions such as P(Y|X). The do-operator is key: P(Y|do(X=x)) generally differs from P(Y|X=x) when X and Y share a confounder. Do-calculus consists of three rules:
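1. Ignore observations: P(Y|do(X), Z, W) = P(Y|do(X), W) if Y and Z are d-separated given X and W in the graph obtained by deleting all edges pointing into X.
2. Exchange interventions and observations: P(Y|do(X), do(Z), W) = P(Y|do(X), Z, W) if Y and Z are d-separated given X and W in the graph obtained by deleting all edges pointing into X and all edges pointing out of Z.
3. Ignore interventions: P(Y|do(X), do(Z), W) = P(Y|do(X), W) if Y and Z are d-separated given X and W in the graph obtained by deleting all edges pointing into X and all edges pointing into those nodes of Z that are not ancestors of any node in W (ancestry being judged after the edges into X are deleted).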
When a causal effect is identifiable, repeated application of these rules converts a do-expression into an expression that involves only observational quantities, which can then be estimated from observational data.
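The gap between P(Y|do(X=x)) and P(Y|X=x) can be made concrete with a small worked example. The sketch below assumes a hypothetical binary model with graph C -> X, C -> Y, X -> Y; every probability in it is invented for illustration. Conditioning weights C by P(C|X=x), whereas intervening removes the dependence of X on C and weights C by P(C).

```python
# Hypothetical binary model: C -> X, C -> Y, X -> Y.
# All numbers are made up for illustration.
P_C = {0: 0.5, 1: 0.5}            # P(C=c)
P_X1_GIVEN_C = {0: 0.2, 1: 0.8}   # P(X=1 | C=c)

def p_y1(x, c):
    """P(Y=1 | X=x, C=c)."""
    return 0.1 + 0.2 * x + 0.5 * c

def p_x_given_c(x, c):
    """P(X=x | C=c)."""
    p1 = P_X1_GIVEN_C[c]
    return p1 if x == 1 else 1.0 - p1

def observational(x):
    """P(Y=1 | X=x): C is weighted by P(C | X=x)."""
    num = sum(p_y1(x, c) * p_x_given_c(x, c) * P_C[c] for c in P_C)
    den = sum(p_x_given_c(x, c) * P_C[c] for c in P_C)
    return num / den

def interventional(x):
    """P(Y=1 | do(X=x)): the factor P(X | C) is dropped, so C keeps weight P(C)."""
    return sum(p_y1(x, c) * P_C[c] for c in P_C)

print(f"{observational(1):.2f}")   # 0.70
print(f"{interventional(1):.2f}")  # 0.55
```

In this model the observational contrast P(Y=1|X=1) - P(Y=1|X=0) is 0.5, while the interventional contrast is 0.2, which equals the coefficient on x in `p_y1`. The difference is entirely due to C.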
Confounding and Bias: A confounder is a variable that influences both treatment and outcome. Confounders induce spurious correlation: if C causes both T and Y, then T and Y are correlated even when T has no causal effect on Y. Causal methods address confounding through randomization of the treatment where that is feasible, and otherwise through adjustment for a suitable set of observed variables.
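To illustrate the spurious correlation described above, the following sketch simulates a hypothetical linear model in which C drives both T and Y and T has no effect on Y at all. The coefficients, sample size, and helper names (`corr`, `residuals`, `simulate`) are assumptions chosen for the example.

```python
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def residuals(a, c):
    """Residuals of `a` after a least-squares regression on `c`."""
    ma, mc = mean(a), mean(c)
    slope = (sum((x - ma) * (z - mc) for x, z in zip(a, c))
             / sum((z - mc) ** 2 for z in c))
    return [(x - ma) - slope * (z - mc) for x, z in zip(a, c)]

def simulate(n=20_000, seed=0):
    """Draw (C, T, Y) where C causes T and Y, and T does not cause Y."""
    rng = random.Random(seed)
    cs, ts, ys = [], [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        cs.append(c)
        ts.append(c + rng.gauss(0, 1))  # T depends on C only
        ys.append(c + rng.gauss(0, 1))  # Y depends on C only, not on T
    return cs, ts, ys

cs, ts, ys = simulate()
raw = corr(ts, ys)                                    # population value is 0.5
adjusted = corr(residuals(ts, cs), residuals(ys, cs))  # population value is 0
print(round(raw, 2), round(adjusted, 2))
```

The raw correlation is substantial even though T has no effect on Y. Once the linear contribution of C is removed from both variables, the remaining correlation is close to zero.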
Identifiability: A causal effect is identifiable if it can be computed from the observational distribution and the causal graph. The backdoor criterion (Pearl) provides a sufficient condition: the causal effect of T on Y is identifiable if there exists a set of observed variables C such that (1) C blocks every backdoor path from T to Y, that is, every path between T and Y that begins with an edge pointing into T, and (2) no element of C is a descendant of T. If the backdoor criterion is satisfied, the causal effect is obtained by adjusting for C: P(Y | do(T=t)) = sum over c of P(Y | T=t, C=c) P(C=c).
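As a hedged sketch of adjusting for a set that satisfies the backdoor criterion, the code below estimates an average treatment effect by stratifying on a single binary confounder. The data-generating model (graph C -> T, C -> Y, T -> Y with a true effect of 0.2) and the function names are assumptions made for the example. The estimator also assumes that every stratum of C contains both treated and untreated units.

```python
import random
from collections import defaultdict

def simulate(n=100_000, seed=0):
    """Draw (c, t, y) rows from a hypothetical binary model with true effect 0.2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if c else 0.2))
        y = int(rng.random() < 0.1 + 0.2 * t + 0.5 * c)
        rows.append((c, t, y))
    return rows

def naive_difference(rows):
    """E[Y | T=1] - E[Y | T=0], ignoring the confounder."""
    y1 = [y for _, t, y in rows if t == 1]
    y0 = [y for _, t, y in rows if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def backdoor_ate(rows):
    """Backdoor adjustment: sum over c of P(c) * (E[Y|T=1,c] - E[Y|T=0,c]).

    Raises ZeroDivisionError if some stratum lacks treated or
    untreated units (a positivity violation).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t, y in rows:
        sums[(c, t)] += y
        counts[(c, t)] += 1
    n = len(rows)
    ate = 0.0
    for c in sorted({c for c, _, _ in rows}):
        p_c = (counts[(c, 0)] + counts[(c, 1)]) / n
        effect_c = sums[(c, 1)] / counts[(c, 1)] - sums[(c, 0)] / counts[(c, 0)]
        ate += p_c * effect_c
    return ate

rows = simulate()
naive = naive_difference(rows)
adjusted = backdoor_ate(rows)
print(round(naive, 2), round(adjusted, 2))
```

In this model the naive contrast converges to 0.5, while the adjusted estimate converges to the true effect of 0.2. Stratification only works for a small number of discrete strata; with many or continuous covariates, a regression or propensity model would typically take its place.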
Unobserved Confounding: If unmeasured confounders exist, no set of observed variables may satisfy the backdoor criterion, and the causal effect is in general not identifiable from observational data alone, even with a known causal graph. Alternative strategies:
Machine Learning Integration: Modern causal ML combines machine learning with causal inference:
Practical Challenges:
Applications: Treatment effect estimation, policy evaluation, and counterfactual prediction.
Causal inference is an increasingly critical capability as ML systems move from prediction (does X predict Y?) to decision-making (if we do X, what happens to Y?). Practitioners must understand both the power and limitations of causal methods.