The Weisfeiler-Lehman (WL) test is used to bound the expressiveness of Graph Neural Networks. What does WL expressiveness tell you about a GNN?
Think about your answer, then reveal below.
Model answer: A GNN can distinguish two graphs only if the WL test distinguishes them. Equivalently, if the WL test cannot distinguish two graphs, no message-passing GNN can learn a function that produces different outputs for them, regardless of architecture or training. This provides a theoretical upper bound on GNN expressiveness. The implication: GNNs cannot solve graph isomorphism (which is known to be hard for the WL test), cannot distinguish certain non-isomorphic graphs, and have inherent limitations on what graph properties they can compute. To overcome WL limitations, you need higher-order GNNs (using k-dimensional Weisfeiler-Lehman) or structural features beyond message-passing.
WL expressiveness is a fundamental constraint on message-passing GNNs, connecting learning theory to algebraic graph theory. Understanding these limits guides architecture design: if your task requires distinguishing graphs the WL test cannot distinguish, message-passing GNNs will fail, and you need alternatives.
Question 2 Multiple Choice
Over-smoothing is a common problem in deep GNNs: as you stack more layers, node representations become increasingly similar to each other. Why does this happen?
AOver-smoothing is unrelated to depth; it is a hyperparameter tuning issue
BMessage passing aggregates neighbor information; in deep networks, repeated aggregation causes all nodes to converge to a global average representation, erasing node-specific distinctions
COver-smoothing only happens with poorly chosen aggregation functions; better aggregation prevents it
DOver-smoothing is necessary for generalization; nodes should be similar to ensure good test performance
Message passing iteratively aggregates neighbor information. After L layers, each node's representation is influenced by all nodes within distance L. In dense graphs or infinite-layer limits, all nodes become influenced by all others, causing their representations to converge to a global average. This erases node-specific structure, limiting expressiveness. Over-smoothing is a fundamental property of message-passing depth, not a tuning issue. Mitigations include residual connections, skip connections, and careful layer normalization design.
Question 3 Multiple Choice
Spectral GNNs compute convolutions using spectral decomposition of the graph Laplacian. How does spectral convolution relate to spatial message passing?
ASpectral and spatial GNNs are completely different; they solve different problems
BSpectral convolution in the Fourier domain corresponds to localized spatial filtering (message passing); ChebNet approximates spectral convolution with a polynomial of the adjacency matrix, equivalent to k-hop aggregation
CSpectral methods are only for undirected graphs, while spatial methods work for all graphs
DSpectral convolution is slower than spatial, so spatial message passing is always preferred
Spectral and spatial GNNs are intimately connected. A spectral convolution (filtering in the Fourier basis of the Laplacian) has a spatial interpretation: it performs weighted aggregation from neighbors. Chebyshev polynomial approximation of spectral filters amounts to k-hop neighborhood aggregation, directly connecting to message passing. This duality allows designing GNNs in either domain: spectral design provides theoretical intuition (what frequencies are filtered), while spatial implementation is computationally efficient.
Question 4 True / False
Higher-order GNNs use k-dimensional Weisfeiler-Lehman tests to improve expressiveness over standard message-passing GNNs. What is the computational tradeoff?
TTrue
FFalse
Answer: True
Higher-order GNNs (k-GNNs) increase expressiveness by aggregating information from larger substructures (k-tuples of nodes) rather than individual nodes and edges. This provides better discrimination of graph structures and overcomes some WL limitations. However, the computational cost increases dramatically: k-WL requires enumerating and processing k-tuples, which is exponential in k. For moderate k (2-3), this is tractable; for large k, it becomes prohibitive. This tradeoff between expressiveness and computational cost is fundamental: you can increase expressive power by looking at larger structures, but pay in complexity. In practice, k=2 (edge-level features) is standard; higher k is used selectively when expressiveness is critical and compute is available.