A Bayesian network models 20 binary variables. How does the storage requirement of the network compare to storing the full joint probability distribution?
ABoth require the same storage, since they encode equivalent probabilistic information about the 20 variables
BThe full joint requires up to 2²⁰ entries; the network requires only the sum of CPT entries (one per variable per combination of parent states), typically orders of magnitude fewer
CThe Bayesian network requires more storage because it must also store the graph structure, edge weights, and metadata
DBoth require exactly 20 parameters, one marginal probability per variable
This is the core efficiency argument for Bayesian networks. A full joint distribution over 20 binary variables requires 2²⁰ ≈ 1 million entries. A Bayesian network exploits conditional independence: a variable with k parents requires only 2^k entries in its CPT, not 2^(number of all other variables). If most variables have few parents (sparse graph), the total CPT entries can be in the hundreds rather than millions. The savings become even more dramatic with more variables — a 50-variable full joint would need 2⁵⁰ entries, while a sparse network might need only a few thousand.
Question 2 Multiple Choice
In a medical Bayesian network, you observe that a patient has both a cough and a fever. You want to compute P(Flu | Cough=true, Fever=true). What does exact inference require?
ASimply reading the prior probability of flu from the Flu node's marginal distribution — observations don't change priors in a static network
BSumming out all unobserved variables to obtain the posterior probability, weighting each configuration by its probability given the evidence
CMultiplying all CPT entries together and normalizing — inference is a single multiplication step
DRunning Monte Carlo simulation, since exact computation is always intractable in any network with more than 10 nodes
Inference in a Bayesian network requires computing a conditional distribution, which means summing (marginalizing) over all possible states of unobserved variables. P(Flu | evidence) ∝ Σ_{hidden} P(Flu, hidden, evidence), where the sum is over all combinations of hidden variable values. For tree-structured networks this can be done efficiently via belief propagation in two passes; for general networks, algorithms like variable elimination do it systematically. The prior (option A) ignores the evidence entirely. Option D overstates the difficulty — exact inference is tractable for many practical network structures.
Question 3 True / False
In a Bayesian network, a variable is conditionally independent of MOST other variables in the network once you observe its direct parent nodes.
TTrue
FFalse
Answer: False
A variable is conditionally independent of its non-descendants given its parents — not of all other variables. Descendants can still carry evidence that propagates back up the network and affects probabilities. More subtly, observing a common child of two parent nodes creates a dependency between those parents that didn't exist before — the 'explaining away' effect. For example, if Flu and Allergies both cause Cough, and you observe Cough=true, Flu and Allergies become negatively correlated even though they were independent a priori. The correct independence structure is determined by d-separation rules, not simply 'observed parents block everything.'
Question 4 True / False
The efficiency of Bayesian networks comes from assuming that most variables are conditionally independent of most other variables given their parents, allowing the joint distribution to factor into a product of local conditional probability tables.
TTrue
FFalse
Answer: True
This is exactly the key factorization: P(X₁, ..., Xₙ) = ∏ P(Xᵢ | parents(Xᵢ)). This works because each variable's CPT captures only the dependencies that actually exist. The number of parameters needed is the sum of CPT sizes, which is small when the graph is sparse (few parents per node). Crucially, the factorization is not an approximation — it is exact for any distribution that is consistent with the conditional independence assumptions encoded in the graph structure. Adding more edges (more dependencies) increases storage; removing unjustified independence assumptions makes the model less efficient but more accurate.
Question 5 Short Answer
Observing evidence about one variable in a Bayesian network can change the probability of variables not directly connected to it in the graph. Explain why, using a concrete example.
Think about your answer, then reveal below.
Model answer: Evidence propagates through the network via shared connections. In a structure where Flu and Allergies both cause Cough (a common-cause structure), Flu and Allergies are marginally independent — knowing someone has flu tells you nothing about whether they have allergies. But if you observe Cough=true, the two causes become negatively correlated: learning the patient has allergies reduces the probability that flu explains their cough (explaining away). This dependency travels through the Cough node even though there is no direct edge between Flu and Allergies. More generally, evidence can propagate in any direction through the network — up to parents, down to descendants, or laterally through shared effects — governed by d-separation rules.
This is one of the most counterintuitive aspects of probabilistic graphical models. Dependencies are not just structural (from edges) but also evidential (from observations). Observing a node can 'activate' dependencies between its parents that were previously blocked, and can block dependencies between nodes that were previously connected. The d-separation framework formalizes exactly when observing a set of variables blocks or activates an information path between two other variables, allowing you to read off conditional independence relationships directly from the graph structure.