Explain why observational data (where variables are passively observed) can yield different conditional independence statements than interventional data (where variables are actively set to specific values). What does this imply for causal discovery?
Think about your answer, then reveal below.
Model answer: In observational data, each variable takes its natural value, determined by its causal parents and noise. Under the causal Markov and faithfulness assumptions, the conditional independences that hold are those given by d-separation in the causal graph. If an unobserved confounder U affects both X and another variable Y, the observational distribution shows dependence between X and Y even when X does not cause Y. Intervention breaks this: setting X to a specific value removes every arrow into X, including the one from U, so X no longer depends on U and the distribution of Y under the intervention reflects only the causal effect of X on Y (the total effect, along all directed paths from X to Y, not just the direct one). In this example X and Y are dependent observationally but independent under do(X), so the two regimes yield different independence statements. Mathematically, P(Y | do(X=x)) generally differs from P(Y | X=x) when confounders exist. This implies that causal discovery from purely observational data is difficult and requires strong assumptions, such as causal sufficiency (no hidden confounders) and faithfulness. With access to interventional data, causal direction can be determined more reliably. In practice, causal inference from observational data relies on causal assumptions encoded in graphical models and on sensitivity analyses (how robust are the conclusions to potential hidden confounders?).
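The gap between P(Y | X=x) and P(Y | do(X=x)) can be seen in a small simulation. The sketch below is illustrative only: it assumes a linear-Gaussian model with a hidden confounder U, the coefficients are made up, the function name `simulate_confounding` is hypothetical, and the intervention is modeled by drawing X independently of U (as in a randomized experiment). X has no causal effect on Y by construction.

```python
import numpy as np

def simulate_confounding(n=200_000, seed=0):
    """Compare the regression slope of Y on X with and without intervention.

    Assumed (hypothetical) model: U -> X and U -> Y, with no arrow X -> Y.
    """
    rng = np.random.default_rng(seed)

    # Hidden confounder, never shown to the analyst.
    u = rng.normal(size=n)

    # Observational regime: X inherits variation from U.
    x_obs = u + rng.normal(size=n)
    y_obs = 2.0 * u + rng.normal(size=n)

    # Interventional regime: X is set by the experimenter, so the
    # arrow U -> X is cut. Y is generated by the same mechanism.
    x_do = rng.normal(size=n)
    y_do = 2.0 * u + rng.normal(size=n)

    # Least-squares slope of Y on X in each regime.
    slope_obs = np.polyfit(x_obs, y_obs, 1)[0]
    slope_do = np.polyfit(x_do, y_do, 1)[0]
    return slope_obs, slope_do

if __name__ == "__main__":
    slope_obs, slope_do = simulate_confounding()
    print(f"observational slope:  {slope_obs:.3f}")
    print(f"interventional slope: {slope_do:.3f}")
```

Under these assumptions the observational slope is close to cov(X, Y) / var(X) = 2 / 2 = 1, while the interventional slope is close to 0: the dependence seen in passive data comes entirely from U.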
The distinction between P(Y|X) (observational) and P(Y|do(X)) (interventional) is fundamental. Causal graphs encode assumptions about confounding, and d-separation determines when two graphs are observationally (Markov) equivalent: graphs that imply the same set of d-separations imply the same conditional independences, so conditional independence tests on observational data cannot tell them apart. Discovering the true causal structure from observational data alone is therefore an identifiability problem: multiple graphs may be compatible with the data, and domain knowledge or additional assumptions are needed to resolve the ambiguity.
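As a rough illustration of observational equivalence, the sketch below simulates three three-variable structures and checks the independence patterns they produce. It assumes linear-Gaussian mechanisms with unit coefficients (an assumption made here, under which zero partial correlation corresponds to conditional independence); the helper names `partial_corr` and `simulate_structures` are hypothetical.

```python
import numpy as np

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing each on c."""
    res_a = a - np.polyval(np.polyfit(c, a, 1), c)
    res_b = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(res_a, res_b)[0, 1]

def simulate_structures(n=100_000, seed=0):
    """Return (corr(X, Y), partial corr(X, Y | Z)) for each structure."""
    rng = np.random.default_rng(seed)

    def noise():
        return rng.normal(size=n)

    results = {}

    # Chain: X -> Z -> Y
    x = noise()
    z = x + noise()
    y = z + noise()
    results["chain"] = (np.corrcoef(x, y)[0, 1], partial_corr(x, y, z))

    # Fork: X <- Z -> Y
    z = noise()
    x = z + noise()
    y = z + noise()
    results["fork"] = (np.corrcoef(x, y)[0, 1], partial_corr(x, y, z))

    # Collider: X -> Z <- Y
    x = noise()
    y = noise()
    z = x + y + noise()
    results["collider"] = (np.corrcoef(x, y)[0, 1], partial_corr(x, y, z))

    return results

if __name__ == "__main__":
    for name, (marginal, partial) in simulate_structures().items():
        print(f"{name:9s} corr(X,Y)={marginal:+.3f}  corr(X,Y|Z)={partial:+.3f}")
```

In this setup the chain and the fork show the same pattern (X and Y dependent, but independent given Z), so independence tests alone cannot separate them. The collider shows the opposite pattern (X and Y independent, but dependent given Z), which is why it can be distinguished from the other two.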