A student wants to find the expected number of students in a class of 30 who share a birthday with at least one classmate. She reasons: 'I can't use linearity of expectation because whether any two students share a birthday depends on all the other birthdays — the indicator variables are not independent.' Is she correct?
AYes — linearity of expectation requires independence, so this approach would give incorrect results
BNo — linearity of expectation holds even when the random variables being summed are dependent
CPartially — you can use linearity only after conditioning on the most likely birthday
DYes — you must first compute the full joint distribution before applying linearity
The student's reasoning applies to variance (where Var(X+Y) = Var(X)+Var(Y) only when X and Y are independent), not to expectation. Linearity of expectation is unconditional: E[X₁+X₂+...+Xₙ] = E[X₁]+E[X₂]+...+E[Xₙ] always holds, regardless of dependence. This is the key distinction — and the source of linearity's power in combinatorics.
Question 2 Multiple Choice
You want to find the expected number of edges in a random subgraph where each of m edges is independently included with probability p. Which approach correctly uses linearity of expectation?
ASum p^k · (1-p)^(m-k) · C(m,k) · k over all k from 0 to m
BDefine Xₑ = 1 if edge e is included, then E[total edges] = Σₑ E[Xₑ] = m·p
CUse linearity only after verifying that the edge indicators are independent of each other
DCompute the variance first to check whether dependence is small enough to ignore
Option A is the direct computation via the binomial distribution — valid but unnecessarily complex. Option B applies linearity correctly: define one indicator variable per edge, note each has expectation p, sum by linearity to get m·p immediately. Crucially, option C is wrong — linearity does not require independence. In this particular problem the edges are independent, but that fact is irrelevant to whether linearity applies; it applies either way.
Question 3 True / False
For any random variable X that can be written as a sum of indicator random variables, E[X] equals the sum of the probabilities of each indicator event, regardless of whether those events are independent.
TTrue
FFalse
Answer: True
This is the direct application of linearity of expectation: E[Xᵢ] = P(event i occurs) = pᵢ for an indicator variable, so E[ΣXᵢ] = Σpᵢ by linearity. No independence assumption is needed anywhere in this chain. This is exactly why the technique is so powerful — you reduce a hard counting problem to summing easy individual probabilities, even when those events are correlated.
Question 4 True / False
Because variance has the property Var(X+Y) = Var(X) + Var(Y) mainly when X and Y are independent, the same independence restriction applies to the linearity of expectation.
TTrue
FFalse
Answer: False
This is the critical distinction. Variance's additivity requires independence; expectation's additivity does not. Linearity of expectation follows from the definition of expectation as a weighted sum (or integral), which is linear as a mathematical operation — unconditionally. Confusing these two properties is the most common misconception about linearity of expectation. Remembering that 'expectation is linear, variance is not (without independence)' is essential.
Question 5 Short Answer
Explain why linearity of expectation is more powerful than it initially seems. Specifically: what makes it different from the corresponding property of variance, and why does this difference matter for counting problems?
Think about your answer, then reveal below.
Model answer: Linearity of expectation holds even for dependent random variables — E[X+Y] = E[X]+E[Y] always. The analogous property for variance (Var(X+Y) = Var(X)+Var(Y)) requires independence. This matters for counting because indicator variables in combinatorial problems are almost always dependent (e.g., whether element i is a fixed point depends on where all other elements map). If linearity required independence, we could rarely use it. Because it doesn't, we can always decompose a complex count into many simple indicator probabilities and sum them — even when those indicators are tangled together in complicated ways.
The fixed-point example illustrates this perfectly: in a random permutation of n elements, the expected number of fixed points is 1 for all n. The indicator variables Xᵢ (is element i a fixed point?) are dependent — knowing that element 1 is fixed tells you something about whether element 2 is fixed. But E[total fixed points] = n·(1/n) = 1 via linearity regardless. Trying to compute this by summing over all n! permutations would be far more work.