Information Geometry Basics

Core Idea

Information geometry treats a family of probability distributions as a Riemannian manifold whose points are the individual distributions and whose metric tensor is the Fisher information matrix. This geometric perspective recasts statistical inference as a geometric operation: maximum likelihood estimation projects the empirical distribution onto the model manifold by minimizing KL divergence, the EM algorithm alternates between two dual projections, and exponential families are flat submanifolds with respect to the e-connection. The dual connection structure (e-connection and m-connection) developed by Amari captures the asymmetry of KL divergence geometrically. Information geometry unifies concepts from statistics, information theory, and differential geometry, and gives structural insight into optimization, machine learning, and neural networks.

Explainer

Probability distributions have a natural geometric structure. Consider the set of all Bernoulli distributions parameterized by p in (0,1). In the usual Euclidean view, this is a line segment. But from an information-theoretic perspective, distributions near p = 0 or p = 1 are farther apart than their Euclidean spacing suggests: a small change in p changes the distribution a great deal when p is extreme. Moving from p = 0.01 to p = 0.02 doubles the probability of success, while moving from p = 0.50 to p = 0.51 is barely detectable. The Fisher information I(p) = 1/(p(1-p)) captures this: the "information-theoretic distance" between p and p + dp is sqrt(I(p)) * dp, which is large near the boundaries. Information geometry makes this rigorous by using I(p) as a Riemannian metric.
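The Bernoulli example can be checked numerically. The sketch below is illustrative only: the function names are made up for this example, and the closed form 2 * |asin(sqrt(q)) - asin(sqrt(p))| is simply the antiderivative of sqrt(I(p)) = 1/sqrt(p(1-p)).

```python
import math

def fisher_info(p):
    """Fisher information of a Bernoulli(p) distribution."""
    return 1.0 / (p * (1.0 - p))

def fisher_distance_numeric(p, q, steps=20_000):
    """Integrate sqrt(I(t)) dt between p and q with the midpoint rule."""
    lo, hi = min(p, q), max(p, q)
    h = (hi - lo) / steps
    total = sum(math.sqrt(fisher_info(lo + (k + 0.5) * h)) for k in range(steps))
    return total * h

def fisher_distance_closed(p, q):
    """Closed form of the same integral."""
    return 2.0 * abs(math.asin(math.sqrt(q)) - math.asin(math.sqrt(p)))

if __name__ == "__main__":
    # The same Euclidean step of 0.01 is much longer near the boundary.
    print("middle:", fisher_distance_closed(0.50, 0.51))
    print("edge:  ", fisher_distance_closed(0.01, 0.02))
    print("check: ", fisher_distance_numeric(0.2, 0.7), fisher_distance_closed(0.2, 0.7))
```

Running the script prints the two distances for equal Euclidean steps, plus a comparison of the numerical integral against the closed form.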

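For a parametric family {f(x; theta) : theta in Theta}, the Fisher information matrix g_{ij}(theta) = E[(d/d_theta_i log f)(d/d_theta_j log f)], with the expectation taken under f(x; theta), serves as the metric tensor. The squared length of the infinitesimal step between nearby distributions f(x; theta) and f(x; theta + d_theta) is ds^2 = sum_{i,j} g_{ij} d_theta_i d_theta_j, and the geodesic distance between two distributions is the length of the shortest path joining them under this metric. The resulting distance does not depend on the parameterization. By Cencov's theorem, the Fisher metric is, up to a constant factor, the only Riemannian metric on statistical manifolds that is invariant under sufficient statistics. Two distributions that look close in one parameterization but far apart in another are therefore at the same Fisher distance in both.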
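As a concrete instance of the definition of g_{ij}, the following sketch approximates the Fisher matrix of a univariate Gaussian in the coordinates (mu, sigma) by numerical integration of the score outer product. The helper names, the integration width, and the step count are assumptions chosen for illustration. The known result is a diagonal matrix with entries 1/sigma^2 and 2/sigma^2, and for a small step the KL divergence should be close to ds^2 / 2.

```python
import math

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def score(x, mu, sigma):
    """Gradient of log f(x; mu, sigma) with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma**2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma**3
    return (d_mu, d_sigma)

def fisher_matrix(mu, sigma, steps=20_000, width=12.0):
    """Approximate g_ij = E[score_i * score_j] with the midpoint rule."""
    lo = mu - width * sigma
    h = 2.0 * width * sigma / steps
    g = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        x = lo + (k + 0.5) * h
        s = score(x, mu, sigma)
        w = gaussian_pdf(x, mu, sigma) * h
        for i in range(2):
            for j in range(2):
                g[i][j] += w * s[i] * s[j]
    return g

def line_element_sq(g, d_theta):
    """ds^2 = sum_ij g_ij d_theta_i d_theta_j."""
    n = len(d_theta)
    return sum(g[i][j] * d_theta[i] * d_theta[j] for i in range(n) for j in range(n))

def kl_gaussian(mu1, s1, mu2, s2):
    """Closed-form D_KL(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2.0 * s2**2) - 0.5

if __name__ == "__main__":
    g = fisher_matrix(1.0, 2.0)
    step = (0.01, 0.01)
    print("g =", g)
    print("ds^2 / 2 =", 0.5 * line_element_sq(g, step))
    print("KL       =", kl_gaussian(1.0, 2.0, 1.0 + step[0], 2.0 + step[1]))
```

The agreement between KL and ds^2 / 2 holds only to second order in the step, so the two printed values match approximately rather than exactly.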

The deepest insight of information geometry is the dual connection structure. A Riemannian manifold has one natural connection, the Levi-Civita connection. A statistical manifold carries two more: the e-connection (exponential) and the m-connection (mixture), which are dual to each other with respect to the Fisher metric and whose average is the Levi-Civita connection. Exponential families are flat under the e-connection, meaning their natural parameters form a coordinate system in which e-geodesics are straight lines. Mixture families are flat under the m-connection, with the mixture weights as the corresponding flat coordinates. On a dually flat manifold such as an exponential family, KL divergence is the canonical divergence, and its asymmetry (D_KL(p||q) != D_KL(q||p)) reflects the fact that the two connections differ.
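For discrete distributions with full support, the two kinds of geodesic have simple explicit forms, which makes the duality easy to see in a small sketch. The function names and the two example distributions below are illustrative choices. The m-geodesic interpolates the probabilities linearly, while the e-geodesic interpolates the log-probabilities linearly and then renormalizes.

```python
import math

def kl(p, q):
    """D_KL(p || q) for discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def m_geodesic(p, q, t):
    """Mixture geodesic: a straight line in the probabilities themselves."""
    return [(1.0 - t) * pi + t * qi for pi, qi in zip(p, q)]

def e_geodesic(p, q, t):
    """Exponential geodesic: a straight line in log-probabilities, renormalized."""
    unnorm = [math.exp((1.0 - t) * math.log(pi) + t * math.log(qi))
              for pi, qi in zip(p, q)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.3, 0.6]
    print("KL(p||q) =", kl(p, q))
    print("KL(q||p) =", kl(q, p))
    print("m-midpoint:", m_geodesic(p, q, 0.5))
    print("e-midpoint:", e_geodesic(p, q, 0.5))
```

The two midpoints differ even though both curves join the same endpoints, and the two KL values differ for the same pair of distributions.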

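This geometric framework illuminates algorithms. The EM algorithm alternates between an e-projection and an m-projection, and neither projection can increase the KL divergence between the data manifold and the model manifold, which explains its monotone convergence. Natural gradient descent (used in training neural networks) preconditions the gradient with the inverse Fisher information matrix, so each step follows the direction of steepest descent as measured by the Fisher metric rather than by Euclidean distance in parameter space. This makes the update direction independent of the parameterization and often speeds convergence. Variational inference minimizes a KL divergence over a tractable family, which is a projection in the information-geometric sense. The field, developed primarily by Shun-ichi Amari, continues to provide structural insights into machine learning, optimization, and the foundations of statistics.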
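A one-parameter toy problem shows what the Fisher preconditioning does. The sketch below fits a Bernoulli parameter p to a target success frequency m by minimizing the expected negative log-likelihood. The function names and the values of p, m, and the learning rate are illustrative assumptions. In real neural networks the Fisher matrix is large and is usually approximated, which this sketch does not attempt.

```python
def nll_grad(p, m):
    """d/dp of the expected negative log-likelihood -m*log(p) - (1-m)*log(1-p)."""
    return (p - m) / (p * (1.0 - p))

def fisher_info(p):
    """Fisher information of Bernoulli(p)."""
    return 1.0 / (p * (1.0 - p))

def gradient_step(p, m, lr):
    """Ordinary (Euclidean) gradient descent step in the parameter p."""
    return p - lr * nll_grad(p, m)

def natural_gradient_step(p, m, lr):
    """Gradient step preconditioned by the inverse Fisher information."""
    return p - lr * nll_grad(p, m) / fisher_info(p)

if __name__ == "__main__":
    p0, m = 0.5, 0.9
    # With lr = 1 the natural step simplifies to p - (p - m) = m.
    print("natural step:", natural_gradient_step(p0, m, 1.0))
    # The plain step with the same learning rate overshoots the interval (0, 1).
    print("plain step:  ", gradient_step(p0, m, 1.0))
```

In this special case the natural gradient reduces to p - m, so the Fisher factor cancels the 1/(p(1-p)) blow-up of the plain gradient near the boundaries.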
