Graph neural networks (GNNs) extend deep learning to graph-structured data by recursively updating node representations through aggregation of neighbor information. GNN theory addresses expressiveness: what graph properties can GNNs learn to recognize? The Weisfeiler-Lehman test provides a theoretical bound on GNN expressiveness — GNNs cannot distinguish graphs that the WL test cannot distinguish. Higher-order GNNs (using k-WL test) increase expressiveness at computational cost. Spectral GNNs connect graph convolutions to spectral filtering on graphs. Message-passing frameworks unify various GNN architectures and enable analysis of their properties, including over-squashing (information bottleneck in graph structure) and over-smoothing (layer-wise feature collapse).
Graph neural networks extend deep learning to graph-structured data by recursively updating node representations through message passing. Unlike regular neural networks that assume fixed-size, grid-like inputs, GNNs handle variable-size graphs with arbitrary structure.
Message-Passing Framework: The fundamental operation is:
h_i^{(l+1)} = AGGREGATE( {h_j^{(l)} : j in neighbors(i)} )
At each layer l, each node's representation h_i^{(l+1)} is updated by aggregating information from neighbors' previous representations. Aggregation can be mean, sum, or learned (attention-based). This is broadcast to all nodes in parallel, then aggregated.
Expressiveness and the Weisfeiler-Lehman Test: The WL test is a classical algorithm for checking graph isomorphism by iteratively refining node labels based on neighborhood structure. It is known to distinguish most graphs but not all. A breakthrough result (Xu et al., Morris et al.) shows that message-passing GNNs are exactly as expressive as the WL test. That is:
This provides a theoretical characterization: GNNs cannot compute functions that WL cannot compute. For tasks requiring WL-expressiveness, standard message-passing suffices; for tasks requiring higher discrimination, higher-order GNNs or structural features are needed.
Spectral Methods: Spectral GNNs interpret convolution as filtering in the Fourier basis of the graph Laplacian. A spectral convolution is:
h_i^{(l+1)} = sum_k theta_k * lambda_k^l * phi_k(i)
where lambda_k are eigenvalues and phi_k are eigenvectors of the Laplacian. Computing exact spectral convolutions requires eigendecomposition (expensive). Efficient approximations use Chebyshev polynomials: approximate the spectral filter with a polynomial of the Laplacian, which can be computed recursively via matrix multiplication with the adjacency matrix. This polyomial approximation amounts to k-hop neighborhood aggregation, connecting spectral and spatial views.
Over-Smoothing: A critical challenge is that stacking many layers causes node representations to become similar. Mathematically, the representation at layer L is influenced by neighbors within distance L. In large graphs, this can include all nodes, causing representations to converge toward a global average. Over-smoothing limits the effective depth of GNNs, with practical networks often limited to 2-4 layers despite much deeper success in CNNs and Transformers. Mitigations include:
Over-Squashing: Information-theoretic bottleneck in GNNs. Narrow, highly connected graphs can cause information from distant regions to be compressed through a bottleneck. For example, in a long chain of nodes, information from the left end must flow through a single edge to reach the right end, compressing high-dimensional representations into a 1D communication channel. This "over-squashing" limits GNN effectiveness on certain graph topologies. Addressing over-squashing requires higher-order GNNs or structure-aware designs.
GNN Architectures: Common designs include:
Theoretical Results:
1. Universal Approximation: GNNs can approximate any permutation-invariant function on graphs with sufficient expressiveness.
2. Generalization Bounds: GNN generalization depends on graph structure, feature dimension, and model capacity via uniform convergence and Rademacher complexity.
3. Implicit Regularization: Like other neural networks, GNNs exhibit implicit bias toward sparse, interpretable solutions through SGD and initialization.
Practical Challenges:
Emerging Directions:
Graph neural networks are crucial for applications in chemistry (molecular property prediction), social networks, recommendation systems, and scientific simulation. Understanding their theoretical properties and limitations is essential for designing effective models for complex real-world graphs.
No topics depend on this one yet.