Causal Inference in Machine Learning


Core Idea

Causal inference in machine learning goes beyond correlation to identify cause-effect relationships: "If we intervene to change X, how will Y change?" The question is formalized with causal graphs (directed acyclic graphs representing causal assumptions) and do-calculus (Pearl's framework for computing interventional distributions). Randomized experiments are the gold standard for answering it but are often infeasible, so machine learning approaches combine observational data with causal assumptions to estimate causal effects while addressing confounding (variables that influence both cause and effect), selection bias, and unobserved confounders. Applications include treatment effect estimation, policy evaluation, and counterfactual prediction.

Explainer

Causal inference is the science of learning cause-effect relationships from data. In machine learning, this emerges as a critical challenge: when you train a model on observational data, are you learning correlation or causation? This distinction is crucial for applications like medical treatment (does this drug help patients?), policy evaluation (does this intervention improve outcomes?), and counterfactual reasoning (what would happen if we changed a decision?).

The Causal Graph Framework: Pearl's causal framework represents causal assumptions as directed acyclic graphs (DAGs). Nodes are variables; directed edges represent causal influences. For example, a treatment T causes outcome Y, and a confounder C causes both T and Y. The graph encodes the causal structure and enables formal reasoning about which variables must be controlled to isolate causal effects.
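As a rough illustration, the three-variable graph described above (C causes T and Y, T causes Y) can be stored as a map from each node to its direct causes. The sketch below is an assumption-laden toy, not a causal-inference library: the names `parents`, `is_acyclic`, and `ancestors` are hypothetical helpers introduced here, and only Python's standard `graphlib` module (Python 3.9+) is used.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical encoding of the example graph: C -> T, C -> Y, T -> Y.
# Each key maps a variable to the set of its direct causes (parents).
parents = {
    "C": set(),
    "T": {"C"},
    "Y": {"C", "T"},
}


def is_acyclic(parent_map):
    """Return True if the parent map describes a directed acyclic graph."""
    try:
        # static_order() raises CycleError when the graph contains a cycle.
        list(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False


def ancestors(parent_map, node):
    """Return every variable that has a directed path into `node`."""
    seen = set()
    stack = list(parent_map[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parent_map[current])
    return seen


# A topological order lists causes before their effects.
causal_order = list(TopologicalSorter(parents).static_order())
print(causal_order)
print(ancestors(parents, "Y"))
```

Storing parents rather than children is a design choice made here because most graphical criteria (ancestors, backdoor paths) start from the edges pointing into a node.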

Do-Calculus: Pearl's do-calculus provides rules for computing interventional distributions P(Y|do(X)), the distribution of Y when we intervene to set X, from observational distributions such as P(Y|X). The do-operator is key: P(Y|do(X=x)) generally differs from P(Y|X=x) when confounders exist. Do-calculus consists of three rules:

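1. Insertion/deletion of observations: P(Y|do(X), Z, W) = P(Y|do(X), W) if Y and Z are d-separated given X and W in the graph obtained by deleting all edges into X.

2. Action/observation exchange: P(Y|do(X), do(Z), W) = P(Y|do(X), Z, W) if Y and Z are d-separated given X and W in the graph obtained by deleting all edges into X and all edges out of Z.

3. Insertion/deletion of actions: P(Y|do(X), do(Z), W) = P(Y|do(X), W) if Y and Z are d-separated given X and W in the graph obtained by deleting all edges into X and all edges into Z(W), where Z(W) is the set of Z-nodes that are not ancestors of any W-node once the edges into X have been deleted.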

When a causal effect is identifiable, these rules allow converting do-expressions into expressions that involve only observational quantities, enabling estimation from observational data.
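As a worked example of such a conversion, consider the three-variable graph used earlier (C causes T and Y, T causes Y), with C observed. The derivation below is a sketch under that assumed graph; lowercase letters denote values of the corresponding variables. The first line is the law of total probability. The second line uses the action/observation exchange rule, which applies because Y and T are d-separated given C once the edges out of T are deleted. The third line uses the deletion-of-actions rule, which applies because C and T are d-separated once the edges into T are deleted.

```latex
\begin{align*}
P(y \mid do(t))
  &= \sum_{c} P(y \mid do(t), c)\, P(c \mid do(t)) \\
  &= \sum_{c} P(y \mid t, c)\, P(c \mid do(t)) \\
  &= \sum_{c} P(y \mid t, c)\, P(c)
\end{align*}
```

The final expression contains no do-operator, so every term can be estimated from observational data.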

Confounding and Bias: A confounder is a variable that influences both treatment and outcome. Confounders induce spurious correlation: if C causes both T and Y, then T and Y are correlated even without a causal effect of T on Y. Causal methods address confounding through:

Identifiability: A causal effect is identifiable if it can be computed from the observational distribution and the causal graph. The backdoor criterion (Pearl) provides a sufficient condition: the causal effect of T on Y is identifiable if there exists a set of observed variables C such that (1) C blocks every backdoor path from T to Y, that is, every path between T and Y that starts with an arrow into T, and (2) no element of C is a descendant of T. If the backdoor criterion is satisfied, the causal effect is identified by adjusting for C: P(Y|do(T=t)) equals the sum over all values c of P(Y|T=t, C=c) P(C=c).
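To make the backdoor criterion concrete, the following minimal sketch simulates binary data in which C confounds T and Y, then compares the naive difference in means with an estimate that adjusts for C by averaging stratum-specific outcomes over the distribution of C. Everything here is an assumption made for illustration: the data-generating probabilities, the true effect of 0.3, and the helper names `simulate`, `mean_outcome`, and `backdoor_mean` are all hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def simulate(n):
    """Draw (c, t, y) rows from an assumed model where C confounds T and Y."""
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)
        # Treatment is far more likely when C = 1.
        t = int(rng.random() < (0.8 if c else 0.2))
        # Assumed true causal effect of T on Y is +0.3; C adds +0.5.
        y = int(rng.random() < 0.1 + 0.3 * t + 0.5 * c)
        rows.append((c, t, y))
    return rows


def mean_outcome(rows, t, c=None):
    """Average Y among rows with T = t (and C = c when c is given)."""
    selected = [y for (ci, ti, y) in rows
                if ti == t and (c is None or ci == c)]
    return sum(selected) / len(selected)


def backdoor_mean(rows, t):
    """Estimate E[Y | do(T = t)] by adjusting for C over its observed values."""
    n = len(rows)
    total = 0.0
    for c in (0, 1):
        p_c = sum(1 for (ci, _, _) in rows if ci == c) / n
        total += mean_outcome(rows, t, c) * p_c
    return total


rows = simulate(50_000)
naive_effect = mean_outcome(rows, 1) - mean_outcome(rows, 0)
adjusted_effect = backdoor_mean(rows, 1) - backdoor_mean(rows, 0)
print(f"naive difference:    {naive_effect:.3f}")
print(f"backdoor adjustment: {adjusted_effect:.3f}")
```

Under these assumed probabilities the naive difference converges to about 0.6, while the adjusted estimate converges to the true effect of 0.3. Stratifying only works this directly because C is discrete with few values; with high-dimensional confounders a regression or propensity model would typically replace the explicit strata.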

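Unobserved Confounding: If a confounder of T and Y is unmeasured and no observed set of variables blocks the backdoor paths it opens, backdoor adjustment cannot identify the causal effect, and the effect may not be identifiable from observational data alone, even with a known causal graph. Alternative strategies include instrumental variables, the front-door criterion, and sensitivity analysis.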
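One commonly used strategy of this kind is an instrumental variable: a variable Z that affects the treatment, affects the outcome only through the treatment, and is independent of the unmeasured confounder. The sketch below is a toy linear simulation, not a general method: the structural equations, the true effect of 1.5, and the names `cov`, `ols_slope`, and `iv_slope` are assumptions introduced for illustration. It compares an ordinary regression slope, which is biased by the hidden confounder U, with the ratio Cov(Z, Y) / Cov(Z, T).

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


n = 50_000
u = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured confounder
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument, independent of U
# Assumed structural equations: Z and U both drive T; T and U both drive Y.
t = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.5 * ti + 2.0 * ui + rng.gauss(0, 1) for ti, ui in zip(t, u)]

# Regressing Y on T ignores U and overstates the effect.
ols_slope = cov(t, y) / cov(t, t)
# The instrument isolates variation in T that is unrelated to U.
iv_slope = cov(z, y) / cov(z, t)

print(f"regression slope: {ols_slope:.3f}")
print(f"IV estimate:      {iv_slope:.3f}")
```

In this assumed linear model the regression slope converges to roughly 2.17 while the instrumental-variable ratio converges to the true 1.5. The estimate is only as credible as the instrument assumptions, which cannot be fully verified from the data.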

Machine Learning Integration: Modern causal ML combines machine learning with causal inference:

Practical Challenges:

Applications:

Causal inference is an increasingly critical capability as ML systems move from prediction (does X predict Y?) to decision-making (if we do X, what happens to Y?). Practitioners must understand both the power and limitations of causal methods.
