A Bayesian network models 20 binary variables, where each variable has at most 3 parents. Compared to storing the full joint distribution, how does the network's storage requirement compare?
A. Roughly the same, because the network must still implicitly represent all 2^20 possible variable combinations
B. Exponentially smaller, because each variable's conditional probability table depends only on its parents, not all other variables
C. Slightly smaller, because the network removes redundant variables that are fully determined by others
D. Larger, because storing the graph structure plus all conditional tables requires more space than a flat lookup table
The full joint distribution over 20 binary variables requires 2^20 ≈ 1 million entries. With at most 3 parents, each variable's CPT has at most 2^3 = 8 rows × 2 columns = 16 entries, so the 20 variables need at most 20 × 16 = 320 entries in total. The savings come from conditional independence: once you know a variable's parents, its non-descendants add no further information about it. The factorization P(X₁,...,X₂₀) = ∏ P(Xᵢ | Parents(Xᵢ)) decomposes the exponentially large joint into a product of small local tables. This exponential saving in representation is a central reason PGMs are practical to specify and store.
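The arithmetic above can be checked with a short Python sketch. The function names `joint_table_entries` and `network_table_entries` are hypothetical, and the count assumes the worst case in which every variable has exactly three parents and each CPT stores both columns.

```python
def joint_table_entries(num_vars: int, arity: int = 2) -> int:
    """Entries in a flat table over every combination of variable values."""
    return arity ** num_vars


def network_table_entries(num_vars: int, max_parents: int, arity: int = 2) -> int:
    """Upper bound on CPT entries if every variable has max_parents parents."""
    rows = arity ** max_parents      # one row per parent configuration
    return num_vars * rows * arity   # one column per value of the variable


full = joint_table_entries(20)
compact = network_table_entries(20, 3)
print(full, compact, full // compact)
```

Under these assumptions the flat table is more than three thousand times larger than the network's tables, and the gap widens exponentially as variables are added.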
Question 2 Multiple Choice
In a Bayesian network, variable X directly causes variable Y (X → Y). A new observation is made for variable Z, a distant descendant of Y. How does observing Z affect our beliefs about X?
A. Observing Z does not change beliefs about X because the causal arrow points away from X, not toward it
B. Observing Z updates beliefs about Y, and this change propagates up the causal chain to update beliefs about X
C. Observing Z makes X and Y conditionally independent, because Z screens off Y from X
D. Observing Z increases uncertainty about X because the downstream observation introduces conflicting information
In a Bayesian network, probabilistic influence can flow in both directions along an edge, regardless of the direction of the causal arrows. Observing Z tells us something about Y's likely value (since Z is a descendant of Y), and knowing more about Y tells us something about X's likely value (since X causes Y), provided no variable on the path between them has already been observed. This backward, effect-to-cause inference is often called diagnostic or evidential reasoning, and it is the core of Bayesian reasoning: causes and effects update each other bidirectionally. It should not be confused with 'explaining away', which arises when two causes share a common observed effect. Option A is the classic misconception: thinking causal arrows block inference from going 'backward.' Causation determines the model structure; inference flows wherever the evidence leads.
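A minimal sketch of this backward inference, using exact enumeration on the chain X → Y → Z. The CPT values are made up for illustration, and Z is taken to be a direct child of Y for brevity, whereas the question describes a more distant descendant.

```python
from itertools import product

# Hypothetical CPTs for the chain X -> Y -> Z (all variables binary).
p_x = {1: 0.2, 0: 0.8}                    # P(X)
p_y_given_x = {1: {1: 0.9, 0: 0.1},       # P(Y | X=1)
               0: {1: 0.1, 0: 0.9}}       # P(Y | X=0)
p_z_given_y = {1: {1: 0.8, 0: 0.2},       # P(Z | Y=1)
               0: {1: 0.3, 0: 0.7}}       # P(Z | Y=0)


def joint(x, y, z):
    """P(X=x, Y=y, Z=z) from the network factorization."""
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]


def posterior_x_given_z(z):
    """P(X=1 | Z=z), summing the joint over the unobserved Y."""
    numerator = sum(joint(1, y, z) for y in (0, 1))
    evidence = sum(joint(x, y, z) for x, y in product((0, 1), repeat=2))
    return numerator / evidence


prior = p_x[1]
posterior = posterior_x_given_z(1)
print(f"P(X=1) = {prior:.3f}, P(X=1 | Z=1) = {posterior:.3f}")
```

With these illustrative numbers, observing Z = 1 raises the belief in X = 1 from 0.2 to roughly 0.35, even though every arrow points away from X.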
Question 3 True / False
The joint probability distribution in a Bayesian network factorizes as the product of each variable's conditional probability given its parents alone — this is what makes the network a compact representation.
True
False
Answer: True
This factorization P(X₁,...,Xₙ) = ∏ P(Xᵢ | Parents(Xᵢ)) is the defining property of a Bayesian network. It is valid precisely because the graph encodes conditional independence: each variable is conditionally independent of its non-descendants given its parents. Without this independence structure, the factorization would not hold and you would need the full joint distribution. The graph structure is therefore not just a visualization: it is a formal assertion about which conditional independencies hold in the domain, and the factorization is the direct computational payoff of those assertions.
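The step from the independence assertion to the factorization can be written out as a short derivation. It assumes the variables are indexed in a topological order, so that every parent precedes its child; then X₁,...,Xᵢ₋₁ contains all of Xᵢ's parents and only non-descendants of Xᵢ.

```latex
\begin{align*}
P(X_1,\dots,X_n)
  &= \prod_{i=1}^{n} P(X_i \mid X_1,\dots,X_{i-1})
     && \text{(chain rule, valid for any ordering)} \\
  &= \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)
     && \text{(each $X_i$ is independent of its other predecessors given its parents)}
\end{align*}
```

The first line holds for every distribution; only the second line uses the graph.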
Question 4 True / False
Adding more edges to a Bayesian network generally makes inference more efficient, because more connections allow information to flow more directly between related variables.
True
False
Answer: False
More edges mean more direct dependencies and fewer conditional independencies. Each additional edge enlarges the parent set of some variable, and each added parent doubles that variable's conditional probability table (a binary variable with k binary parents has a CPT with 2^k rows). The factorization still has one term per variable, but the terms become larger: the joint is broken into big local tables rather than small ones. Efficient inference algorithms like variable elimination and belief propagation exploit sparsity, that is, the fact that each variable interacts directly with only a few others. A densely connected graph is harder to reason about, not easier.
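The growth in table size described above can be illustrated by comparing a sparse and a dense graph over the same 20 binary variables. This is a sketch with hypothetical helper names (`cpt_rows`, `total_rows`); it counts CPT rows only and says nothing about inference run time.

```python
def cpt_rows(num_parents: int, arity: int = 2) -> int:
    """Parent configurations a single CPT must cover."""
    return arity ** num_parents


def total_rows(parent_counts, arity: int = 2) -> int:
    """Total CPT rows in a network, given each variable's parent count."""
    return sum(cpt_rows(k, arity) for k in parent_counts)


n = 20
# Sparse: a chain, where the first variable has no parent and the rest have one.
chain = [0] + [1] * (n - 1)
# Dense: a fully connected DAG, where variable i has all i earlier variables as parents.
dense = list(range(n))

print("chain:", total_rows(chain))
print("dense:", total_rows(dense))
```

The chain needs 39 rows, while the fully connected graph needs 2^20 − 1 rows, which is no saving at all over the full joint table.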
Question 5 Short Answer
Why do probabilistic graphical models represent joint distributions much more compactly than explicitly enumerating all combinations, and what role does the graph structure play in enabling this?
Model answer: Enumerating the joint distribution over n variables explicitly takes space exponential in n. A PGM represents the distribution compactly by encoding conditional independence relationships in the graph: each variable depends directly only on its neighbors (parents in a directed graph, clique members in an undirected graph), not on all other variables. The joint distribution factorizes into a product of small local terms, one per variable (or clique), each involving only a small subset of variables. The graph structure specifies exactly which variables each local term involves, making the factorization possible. When every local term involves only a bounded number of variables, this turns an exponential representation problem into a polynomial one.
The key insight is that the graph is not just a picture of the model: it is a formal specification of conditional independencies, and those independencies are what enable the factorization. Without any independence structure, the factorization offers no savings over the full joint table. Understanding this chain (graph topology → conditional independence → factorization → tractable representation and inference) is the conceptual core of PGMs.