Sigma-algebras ℊ and ℋ are independent if P(A ∩ B) = P(A)P(B) for all A ∈ ℊ, B ∈ ℋ. Random variables X and Y are independent if their generated sigma-algebras are independent. This definition applies equally to discrete, continuous, and singular distributions.
You learned in an earlier course that two events A and B are independent when knowing one occurs gives no information about the other — formally, P(A ∩ B) = P(A)P(B). From your study of conditional expectation, you now have a richer language: E[X | ℱ] is the best prediction of X given all information in the sigma-algebra ℱ. Independence of sigma-algebras is the natural generalization that lets you say "the information in ℊ gives no information about events in ℋ."
The sigma-algebra generated by a random variable X, written σ(X), is the collection of all events of the form {X ∈ B} for Borel sets B ⊆ ℝ. It captures everything you could observe about X: any question you can ask about X (is X > 3? is X in [1, 2]? is X rational?) corresponds to some event in σ(X). Two random variables X and Y are independent when σ(X) and σ(Y) are independent sigma-algebras, meaning P({X ∈ A} ∩ {Y ∈ B}) = P(X ∈ A) · P(Y ∈ B) for all Borel sets A and B. This is a single definition that unifies independence for discrete, continuous, and mixed distributions without needing separate cases.
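On a finite sample space you can carry out this definition literally, because σ(X) has only finitely many events. The sketch below is an illustration under assumptions that are not part of the text above: the sample space is two fair coin tosses, and the names generated_sigma_algebra and independent are hypothetical helpers. Probabilities are exact fractions, so the product rule is tested with equality rather than a tolerance.

```python
from fractions import Fraction
from itertools import chain, combinations, product

# Assumed toy sample space: two fair coin tosses, each outcome has probability 1/4.
OMEGA = list(product("HT", repeat=2))
PROB = {w: Fraction(1, 4) for w in OMEGA}


def prob(event):
    """P(event) for an event given as a frozenset of outcomes."""
    return sum((PROB[w] for w in event), Fraction(0))


def generated_sigma_algebra(rv):
    """sigma(rv) on a finite space: all preimages {rv in B}, B a subset of the range."""
    values = sorted({rv(w) for w in OMEGA})
    subsets = chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )
    return {frozenset(w for w in OMEGA if rv(w) in B) for B in subsets}


def independent(rv1, rv2):
    """Check P(A & B) == P(A) * P(B) for every A in sigma(rv1), B in sigma(rv2)."""
    return all(
        prob(A & B) == prob(A) * prob(B)
        for A in generated_sigma_algebra(rv1)
        for B in generated_sigma_algebra(rv2)
    )


def X(w):
    return 1 if w[0] == "H" else 0  # indicator that the first toss is heads


def Y(w):
    return 1 if w[1] == "H" else 0  # indicator that the second toss is heads


def S(w):
    return X(w) + Y(w)  # total number of heads


print(independent(X, Y))  # True
print(independent(X, S))  # False
```

The second check fails because, for example, P(X = 1, S = 2) = 1/4 while P(X = 1) · P(S = 2) = 1/8: knowing the total number of heads does tell you something about the first toss.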
Why does the measure-theoretic definition matter? Consider the alternative: defining independence by joint PMFs or joint PDFs. For discrete random variables you write P(X = x, Y = y) = P(X = x)P(Y = y) for all x and y; for jointly continuous ones you require f_{X,Y}(x,y) = f_X(x)f_Y(y) for almost every (x, y). These case-by-case definitions work in their domains but have nothing to say about singular distributions or mixed types, where the joint law has neither a mass function nor a density. The sigma-algebra definition is universal: it requires the product rule P(A ∩ B) = P(A)P(B) to hold for all observable events in both sigma-algebras, regardless of whether those events are described by mass functions, density functions, or neither.
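To see the product rule at work where neither a PMF nor a PDF is available, here is a small simulation sketch. Everything in it is an assumption made for illustration: X is uniform on [0, 1), Y is the maximum of an independent uniform and 0.5 (so Y has an atom of mass 1/2 at 0.5 plus a continuous part), and sample_pair and empirical_check are hypothetical helper names. It spot-checks a single pair of events by Monte Carlo, so it illustrates the definition rather than verifying it.

```python
import random


def sample_pair(rng):
    """Draw (X, Y) from two independent uniforms; Y is of mixed type."""
    x = rng.random()            # continuous: uniform on [0, 1)
    y = max(rng.random(), 0.5)  # mixed: P(Y = 0.5) = 1/2, density on (0.5, 1)
    return x, y


def empirical_check(event_x, event_y, n=200_000, seed=0):
    """Estimate P(X in A, Y in B) and P(X in A) * P(Y in B) from n samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    count_x = count_y = count_both = 0
    for _ in range(n):
        x, y = sample_pair(rng)
        in_x, in_y = event_x(x), event_y(y)
        count_x += in_x
        count_y += in_y
        count_both += in_x and in_y
    return count_both / n, (count_x / n) * (count_y / n)


# The event {Y = 0.5} has positive probability, so no density describes it,
# yet the product rule for events still makes sense.
joint, product_of_marginals = empirical_check(
    lambda x: x <= 0.3,  # A = [0, 0.3]
    lambda y: y == 0.5,  # B = {0.5}, the atom of Y
)
print(joint, product_of_marginals)
```

Under these assumptions both printed numbers should be close to 0.3 · 0.5 = 0.15, up to sampling noise.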
The connection to conditional expectation is particularly clean: X and Y are independent if and only if E[f(X) | σ(Y)] = E[f(X)] almost surely for every bounded measurable f. That is, knowing Y does not change your expectation of any bounded function of X. Boundedness only guarantees that the expectations exist; for independent X and Y the same identity holds whenever f(X) is integrable. This equivalence links independence to information, which is the right conceptual framing for probability theory. It also sets up the strong law of large numbers: when X₁, X₂, … are independent and identically distributed with finite mean, no Xᵢ carries information about the others, and this is what allows the sample average to converge to the true mean with probability 1.
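The conditional-expectation characterization can also be checked exactly on a finite space. The sketch below assumes a sample space of two fair dice, which is not part of the text above, and uses hypothetical helper names. On a finite space every function of X is a linear combination of the indicators 1{X = k}, so it is enough to test the identity for those indicators. The conditional expectation given σ(Y) is a function of Y, so it is stored as a dictionary keyed by the values of Y.

```python
from fractions import Fraction
from itertools import product

# Assumed toy sample space: two fair dice, all 36 outcomes equally likely.
OMEGA = list(product(range(1, 7), repeat=2))


def expectation(g):
    """E[g] under the uniform measure on OMEGA (g must return integers)."""
    return Fraction(sum(g(w) for w in OMEGA), len(OMEGA))


def conditional_expectation(g, rv):
    """E[g | sigma(rv)] as {value v of rv: average of g over the event {rv = v}}."""
    result = {}
    for v in sorted({rv(w) for w in OMEGA}):
        atom = [w for w in OMEGA if rv(w) == v]
        result[v] = Fraction(sum(g(w) for w in atom), len(atom))
    return result


def conditioning_changes_nothing(x_rv, y_rv):
    """Check E[1{x_rv = k} | sigma(y_rv)] == P(x_rv = k) for every value k."""
    for k in {x_rv(w) for w in OMEGA}:
        def indicator(w, k=k):
            return 1 if x_rv(w) == k else 0

        target = expectation(indicator)
        values = conditional_expectation(indicator, y_rv).values()
        if any(c != target for c in values):
            return False
    return True


def X(w):
    return w[0]  # first die


def Y(w):
    return w[1]  # second die


def S(w):
    return w[0] + w[1]  # sum of both dice


print(conditioning_changes_nothing(X, Y))  # True
print(conditioning_changes_nothing(X, S))  # False
```

Conditioning on the second die leaves every conditional probability of the first die at 1/6, while conditioning on the sum does not: given S = 2, the first die must show 1.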