
A topic in the Open Knowledge Graph — a free, open map of 15,290 topics and the order to learn them in.

Causal Inference in Machine Learning

Core Idea

Causal inference in machine learning goes beyond correlation to identify cause-effect relationships: "If we intervene to change X, how will Y change?" This is formalized through causal graphs (directed acyclic graphs representing causal assumptions), do-calculus (Pearl's framework for computing interventional distributions), and randomized experiments (gold standard but often infeasible). Machine learning approaches use observational data with causal assumptions to estimate causal effects, addressing confounding (variables that influence both cause and effect), selection bias, and unobserved confounders. Applications include treatment effect estimation, policy evaluation, and counterfactual prediction.

Explainer

Causal inference is the science of learning cause-effect relationships from data. In machine learning, this emerges as a critical challenge: when you train a model on observational data, are you learning correlation or causation? This distinction is crucial for applications like medical treatment (does this drug help patients?), policy evaluation (does this intervention improve outcomes?), and counterfactual reasoning (what would happen if we changed a decision?).

The Causal Graph Framework: Pearl's causal framework represents causal assumptions as directed acyclic graphs (DAGs). Nodes are variables; directed edges represent causal influences. For example, a treatment T causes outcome Y, and a confounder C causes both T and Y. The graph encodes the causal structure and enables formal reasoning about which variables must be controlled to isolate causal effects.
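The three-variable example above can be written down directly as a data structure. The following is a minimal sketch, assuming the graph is stored as a map from each node to its direct causes; the helper names `is_acyclic` and `descendants` are illustrative and not part of any causal inference library.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical graph from the example: C -> T, C -> Y, T -> Y.
# Each key maps a node to the set of its direct causes (parents).
parents = {
    "C": set(),
    "T": {"C"},
    "Y": {"C", "T"},
}

def is_acyclic(parent_map):
    """Return True if the parent map describes a directed acyclic graph."""
    try:
        list(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False

def descendants(parent_map, node):
    """Return every node reachable from `node` along directed edges."""
    children = {n: set() for n in parent_map}
    for child, pars in parent_map.items():
        for p in pars:
            children[p].add(child)
    found, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# A causal ordering: every variable appears after all of its causes.
causal_order = list(TopologicalSorter(parents).static_order())
```

Queries such as "is this variable a descendant of the treatment?" recur in the identification criteria discussed later, which is why a descendant lookup is included in the sketch.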

Do-Calculus: Pearl's do-calculus provides rules for computing interventional distributions P(Y|do(X)), the distribution of Y when we intervene to set X, from observational distributions such as P(Y|X). The do-operator is key: P(Y|do(X=x)) generally differs from P(Y|X=x) when confounders exist. Do-calculus consists of three rules, each licensed by a d-separation condition in a modified version of the causal graph:

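1. Insertion/deletion of observations: P(Y|do(X), Z, W) = P(Y|do(X), W) if Y and Z are d-separated given X and W in the graph obtained by deleting all edges into X.

2. Action/observation exchange: P(Y|do(X), do(Z), W) = P(Y|do(X), Z, W) if Y and Z are d-separated given X and W in the graph obtained by deleting all edges into X and all edges out of Z.

3. Insertion/deletion of actions: P(Y|do(X), do(Z), W) = P(Y|do(X), W) if Y and Z are d-separated given X and W in the graph obtained by deleting all edges into X and all edges into Z(W), where Z(W) is the set of Z-nodes that are not ancestors of any W-node once the edges into X are deleted.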

When a causal effect is identifiable, repeated application of these rules converts the do-expression into an expression containing only observational quantities, which can then be estimated from observational data.
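The gap between P(Y|do(X=x)) and P(Y|X=x) can be seen in a small simulation. This is a sketch under an assumed data-generating model (C -> T, C -> Y, T -> Y, with made-up probabilities); the function names `simulate` and `mean_y` are hypothetical. Passing `do_t` sets the treatment by intervention, which cuts the edge from C into T.

```python
import random

def simulate(n, seed, do_t=None):
    """Draw n samples from an assumed model C -> T, C -> Y, T -> Y.

    If do_t is given, T is set by intervention and no longer depends on C.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        if do_t is None:
            # Observational regime: C raises the chance of treatment.
            t = 1 if rng.random() < (0.8 if c else 0.2) else 0
        else:
            # Interventional regime: the edge C -> T is cut.
            t = do_t
        # Assumed outcome model: T raises P(Y=1) by 0.1, C raises it by 0.5.
        y = 1 if rng.random() < 0.2 + 0.1 * t + 0.5 * c else 0
        rows.append((c, t, y))
    return rows

def mean_y(rows, t=None):
    """Mean outcome, optionally restricted to units with treatment value t."""
    ys = [y for _, ti, y in rows if t is None or ti == t]
    return sum(ys) / len(ys)

# P(Y=1 | T=1) - P(Y=1 | T=0): mixes the effect of T with the effect of C.
obs = simulate(50_000, seed=0)
observational_gap = mean_y(obs, t=1) - mean_y(obs, t=0)

# P(Y=1 | do(T=1)) - P(Y=1 | do(T=0)): the causal effect in this model.
interventional_gap = (
    mean_y(simulate(50_000, seed=1, do_t=1))
    - mean_y(simulate(50_000, seed=2, do_t=0))
)
```

Under these assumed probabilities the observational gap is about 0.4 while the interventional gap is about 0.1, because treated units are disproportionately those with C = 1.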

Confounding and Bias: A confounder is a variable that influences both treatment and outcome. Confounders induce spurious correlation: if C causes both T and Y, then T and Y are correlated even without a causal effect of T on Y. Causal methods address confounding through:

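Conditioning: Stratify by confounder value, estimate the effect within each stratum, then average the stratum-specific effects over the confounder distribution.
Matching: Create matched pairs of treated/untreated units with similar confounder values.
Regression: Include confounders as covariates in an outcome model; the adjustment is valid only to the extent that the model captures how the outcome depends on treatment and confounders.
Inverse Probability Weighting: Reweight observations by the inverse of the propensity score (the probability of receiving the observed treatment given the confounders), creating a pseudo-population where treatment is independent of confounders.
Doubly Robust Methods: Combine an outcome regression with propensity weighting so that the estimate remains consistent if either of the two models is correctly specified.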
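Two of the adjustment strategies above, conditioning and inverse probability weighting, can be sketched on simulated data. The data-generating model, its coefficients, and the variable names are assumptions chosen for illustration; the true effect of T on Y is set to 1.0 so the estimates can be checked against it.

```python
import random

rng = random.Random(0)
data = []
for _ in range(50_000):
    c = 1 if rng.random() < 0.5 else 0                    # binary confounder
    t = 1 if rng.random() < (0.8 if c else 0.2) else 0    # C drives treatment
    y = 1.0 * t + 2.0 * c + rng.gauss(0, 1)               # assumed true effect: 1.0
    data.append((c, t, y))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Naive contrast: compares treated and untreated units and ignores C.
naive = mean(y for c, t, y in data if t == 1) - mean(y for c, t, y in data if t == 0)

# Conditioning: contrast within each stratum of C, then average over P(C).
stratified = 0.0
for level in (0, 1):
    stratum = [(t, y) for c, t, y in data if c == level]
    gap = mean(y for t, y in stratum if t == 1) - mean(y for t, y in stratum if t == 0)
    stratified += gap * len(stratum) / len(data)

# Inverse probability weighting: propensity e(c) = P(T=1 | C=c),
# estimated here by the treated fraction in each stratum.
propensity = {level: mean(t for c, t, _ in data if c == level) for level in (0, 1)}
ipw = mean(
    t * y / propensity[c] - (1 - t) * y / (1 - propensity[c])
    for c, t, y in data
)
```

In this model the naive contrast lands near 2.2 while both adjusted estimates land near 1.0. Because the propensity is estimated by stratum frequencies, the two adjusted estimators coincide algebraically here; they would differ once the propensity comes from a fitted model over continuous or high-dimensional confounders.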

Identifiability: A causal effect is identifiable if it can be computed from the observational distribution and the causal graph. The backdoor criterion (Pearl) provides a sufficient condition: the causal effect of T on Y is identifiable if there exists a set of observed variables C such that (1) C blocks every backdoor path from T to Y, meaning every path between T and Y that begins with an edge into T, and (2) no element of C is a descendant of T. If the backdoor criterion is satisfied, the causal effect is identified by conditioning on C and averaging over the distribution of C.
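When a set C satisfies the backdoor criterion, identification takes the form of the adjustment formula. It is written here for discrete C; for continuous C the sum becomes an integral. The second line is the corresponding average treatment effect for a binary treatment.

```latex
P\big(Y = y \mid do(T = t)\big) = \sum_{c} P(Y = y \mid T = t,\, C = c)\, P(C = c)

\mathbb{E}[Y \mid do(T = 1)] - \mathbb{E}[Y \mid do(T = 0)]
  = \sum_{c} \Big( \mathbb{E}[Y \mid T = 1,\, C = c] - \mathbb{E}[Y \mid T = 0,\, C = c] \Big)\, P(C = c)
```

Note that the weights are P(C = c), not P(C = c | T = t): weighting by the conditional distribution would simply reproduce the confounded quantity P(Y = y | T = t).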

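Unobserved Confounding: If unmeasured confounders exist, adjusting for the measured variables does not remove the bias, and the causal effect is in general not identifiable from observational data alone, even with a known causal graph. Identification is then possible only when the graph or the study design supplies additional structure (Pearl's front-door criterion is one graphical example). Alternative strategies: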

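Instrumental Variables: Variables that affect the treatment, affect the outcome only through the treatment, and share no unmeasured causes with the outcome. Under these conditions, plus further assumptions such as linearity or monotonicity, they enable causal effect estimation without measuring the confounders.
Regression Discontinuity: When treatment is assigned according to whether a running variable crosses a threshold, units just above and just below the threshold are comparable, so the jump in outcomes at the threshold identifies the causal effect for units near it.
Synthetic Controls: Construct a weighted combination of untreated units that tracks the treated unit's pre-intervention outcomes, and use its post-intervention trajectory as the estimate of the counterfactual outcome.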
Sensitivity Analysis: Explore how conclusions change under different levels of unmeasured confounding.
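The instrumental variable idea can be sketched with a simple ratio-of-covariances estimator. The linear data-generating model, the coefficients, and the names below are assumptions for illustration: `u` is a confounder that the estimators never see, `z` is a randomized binary instrument, and the true effect of T on Y is set to 1.5.

```python
import random

rng = random.Random(0)
z_vals, t_vals, y_vals = [], [], []
for _ in range(50_000):
    u = rng.gauss(0, 1)                       # unobserved confounder
    z = 1.0 if rng.random() < 0.5 else 0.0    # instrument: randomized, no direct path to Y
    t = 1.0 * z + u + rng.gauss(0, 1)         # treatment depends on instrument and confounder
    y = 1.5 * t + 2.0 * u + rng.gauss(0, 1)   # assumed true effect of T on Y: 1.5
    z_vals.append(z)
    t_vals.append(t)
    y_vals.append(y)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# Regressing Y on T directly absorbs the effect of the unobserved confounder.
naive_slope = cov(t_vals, y_vals) / cov(t_vals, t_vals)

# IV estimate: effect of Z on Y divided by effect of Z on T.
iv_estimate = cov(z_vals, y_vals) / cov(z_vals, t_vals)
```

In this model the naive slope is biased upward (around 2.4) while the IV estimate is close to 1.5. The estimate is only as credible as the instrument assumptions, which cannot be fully verified from the data, and it becomes unstable when the instrument only weakly affects the treatment.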

Machine Learning Integration: Modern causal ML combines machine learning with causal inference:

Heterogeneous Treatment Effects: Use ML to learn how treatment effects vary across subgroups (e.g., which patients benefit from a drug?).
Causal Discovery: Use algorithms to learn causal structure from data (challenging; requires strong assumptions and often fails without sufficient data or domain knowledge).
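Double ML: Use flexible machine learning models to estimate nuisance functions (e.g., the propensity score and the outcome regression), then estimate the treatment effect from the residualized data, with sample splitting (cross-fitting) so that errors in the nuisance estimates have only a second-order impact on the effect estimate.
Causal Forests: Tree ensembles that estimate heterogeneous causal effects by choosing splits that maximize the difference in estimated treatment effect between child nodes, rather than the fit to the outcome.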
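The Double ML recipe can be sketched for a partially linear model, Y = θ·T + g(X) + noise, using the partialling-out form with two-fold cross-fitting. Everything specific here is an assumption for illustration: the simulated model, the true effect θ = 2.0, and `fit_bin_means`, a deliberately crude bin-average regressor that stands in for whatever ML learner would be used in practice.

```python
import math
import random

rng = random.Random(0)
N_BINS = 20
data = []
for _ in range(6_000):
    x = rng.random()                                    # observed confounder in [0, 1)
    t = math.sin(2 * math.pi * x) + rng.gauss(0, 1)     # treatment depends nonlinearly on x
    y = 2.0 * t + 3.0 * math.sin(2 * math.pi * x) + rng.gauss(0, 1)  # assumed effect: 2.0
    data.append((x, t, y))

def fit_bin_means(rows, index):
    """Stand-in learner: predict column `index` by its mean within each bin of x."""
    sums, counts = [0.0] * N_BINS, [0] * N_BINS
    for row in rows:
        b = min(int(row[0] * N_BINS), N_BINS - 1)
        sums[b] += row[index]
        counts[b] += 1
    overall = sum(sums) / sum(counts)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * N_BINS), N_BINS - 1)]

# Cross-fitting: nuisance models are fit on one fold and applied to the other.
folds = [data[0::2], data[1::2]]
num = den = 0.0
for k in (0, 1):
    train, held_out = folds[1 - k], folds[k]
    t_hat = fit_bin_means(train, 1)   # estimate of E[T | X]
    y_hat = fit_bin_means(train, 2)   # estimate of E[Y | X]
    for x, t, y in held_out:
        t_res, y_res = t - t_hat(x), y - y_hat(x)
        num += t_res * y_res
        den += t_res * t_res

# Residual-on-residual regression gives the effect estimate.
dml_estimate = num / den

# For comparison: slope of Y on T with no adjustment for x.
mt = sum(t for _, t, _ in data) / len(data)
my = sum(y for _, _, y in data) / len(data)
naive_slope = (
    sum((t - mt) * (y - my) for _, t, y in data)
    / sum((t - mt) ** 2 for _, t, _ in data)
)
```

In this model the unadjusted slope is close to 3.0 while the cross-fitted estimate is close to 2.0. The sketch covers only the case where all confounders are observed and the effect is constant; heterogeneous effects need a different estimator.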

Practical Challenges:

Causal assumptions (the causal graph) are usually not known and must be justified based on domain knowledge.
Unmeasured confounders are always possible in observational studies.
Estimating causal effects requires careful balance of bias and variance; naive estimators can be biased or inefficient.
Causal effects in complex domains (social systems, economics) are often heterogeneous and context-dependent.

Applications:

Medicine: Estimating treatment effects from observational patient data.
Economics: Evaluating policy interventions (minimum wage, education programs).
Recommendation Systems: Understanding causal effects of recommendations on user outcomes (vs. just correlation).
Marketing: Measuring incremental impact of campaigns while controlling for confounding.

Causal inference is an increasingly critical capability as ML systems move from prediction (does X predict Y?) to decision-making (if we do X, what happens to Y?). Practitioners must understand both the power and limitations of causal methods.
