Information theory provides quantitative tools for understanding neural coding: how neurons encode sensory information, process it, and transmit it to downstream structures. The mutual information I(S; R) between stimulus S and neural response R quantifies how much information about the stimulus is available in spike patterns. The information rate (bits per spike or bits per second) measures the efficiency of the neural code. Fisher information quantifies the precision with which neurons can encode a stimulus parameter. It is related to, but distinct from, mutual information. The channel capacity of a single neuron (the maximum information it can reliably transmit given its biophysical constraints) bounds what one cell can signal. Because bandwidth is limited, transmitting more information requires higher firing rates, more precise spike timing, or richer temporal patterns. Population coding spreads information across many neurons. What a population carries depends on how much of that information is shared (redundancy) and how much is available only in joint activity (synergy). Information-theoretic analyses suggest that many neural systems operate close to information-theoretic limits, often trading coding efficiency against metabolic cost. These concepts illuminate sensory transduction, neural computation, learning, and brain function.
How does the brain encode information? A neuron receives inputs, fires action potentials, and transmits signals to downstream targets. How much information about sensory stimuli is encoded in spike patterns? How efficiently do neurons use their bandwidth? Information theory provides quantitative answers.
Neural Information and Mutual Information:
Consider a sensory neuron responding to a stimulus (e.g., light intensity). The stimulus S ranges over possible values, and the response R is the spike count or spike timing. The mutual information I(S; R) = H(R) - H(R|S) measures how much observing the response reduces uncertainty about the stimulus; equivalently, I(S; R) = H(S) - H(S|R). H(R) is the total response entropy: the variability of spike patterns across all stimuli. H(R|S) is the noise entropy: the variability that remains when the stimulus is fixed, averaged over stimuli. If responses are the same regardless of stimulus, I(S; R) = 0. If responses are noiseless (a given stimulus always produces the same response), then H(R|S) = 0 and I(S; R) = H(R), which can be at most H(S). Measured values depend strongly on the stimulus ensemble, the analysis time window, and the estimation method; finite-sample estimates are biased upward. For some sensory neurons driven by rich, naturalistic stimuli, estimates of a few bits per spike have been reported. That is more than the apparent noisiness of individual spikes might suggest.
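Below is a minimal Python sketch of the H(R) - H(R|S) calculation. It assumes the stimulus probabilities P(s) and the response distributions P(r | s) are known exactly. The function names entropy and mutual_information and the example spike-count tables are hypothetical illustrations, not data from any particular study.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def mutual_information(p_s, p_r_given_s):
    """I(S;R) = H(R) - H(R|S) for a discrete stimulus and response.

    p_s: list of stimulus probabilities P(s).
    p_r_given_s: one row per stimulus; row s lists P(r | s) over the responses.
    """
    n_r = len(p_r_given_s[0])
    # Marginal response distribution: P(r) = sum_s P(s) P(r | s).
    p_r = [sum(ps * row[r] for ps, row in zip(p_s, p_r_given_s)) for r in range(n_r)]
    h_r = entropy(p_r)
    # Noise entropy: H(R|S) = sum_s P(s) H(R | S = s).
    h_r_given_s = sum(ps * entropy(row) for ps, row in zip(p_s, p_r_given_s))
    return h_r - h_r_given_s


# Two equiprobable stimuli (dim, bright); responses are 0, 1, or 2 spikes.
p_s = [0.5, 0.5]
noisy = [[0.7, 0.2, 0.1],      # P(r | dim)
         [0.1, 0.2, 0.7]]      # P(r | bright)
noiseless = [[1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0]]
constant = [[0.0, 1.0, 0.0],
            [0.0, 1.0, 0.0]]
print(mutual_information(p_s, noisy))      # about 0.365 bits
print(mutual_information(p_s, noiseless))  # 1.0 bit, equal to H(R)
print(mutual_information(p_s, constant))   # 0.0 bits
```

For the noisy table, H(R) is about 1.522 bits and H(R|S) about 1.157 bits, so I(S; R) is about 0.365 bits. In real experiments the probabilities are estimated from a finite number of trials. This plug-in calculation is then biased upward, which is why bias corrections are used with real data.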
Fisher Information and Decoding Precision:
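Fisher information F(theta) measures how sharply the likelihood of the response depends on the parameter theta. It is the expected squared slope (the score) of the log-likelihood at theta, or equivalently its expected negative curvature. The Cramér-Rao bound states that any unbiased estimator of theta has variance of at least 1/F(theta). For neurons encoding stimulus intensity, high Fisher information means that small intensity changes around theta can be reliably discriminated. The relationship between Fisher information and mutual information is subtle. Mutual information is a global quantity averaged over the whole stimulus distribution, whereas Fisher information is local to a particular stimulus value. In the limit of low noise or large populations, the two are approximately linked for a one-dimensional stimulus by I(S; R) ≈ H(S) - ∫ p(s) (1/2) log_2(2 pi e / F(s)) ds, where H(S) is the differential entropy of the stimulus. Outside that regime they can behave quite differently and capture complementary aspects of the code.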
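The Python sketch below computes Fisher information for a single hypothetical neuron. The neuron has a Gaussian tuning curve f(theta) and emits Poisson spike counts in a window of T seconds. The sketch uses the standard Poisson result F(theta) = T f'(theta)^2 / f(theta) and checks it against the definition (the expected squared score) by summing over spike counts. The tuning parameters, window length, and function names are illustrative assumptions.

```python
import math

# Hypothetical tuning-curve parameters (illustrative assumptions).
R_MAX = 40.0    # peak rate above baseline, Hz
R_BASE = 2.0    # baseline rate, Hz (keeps the Poisson mean positive)
PREF = 0.0      # preferred stimulus value
WIDTH = 1.0     # tuning width
T = 0.5         # counting window, seconds


def tuning(theta):
    """Mean firing rate f(theta) in Hz for a Gaussian tuning curve."""
    return R_BASE + R_MAX * math.exp(-((theta - PREF) ** 2) / (2 * WIDTH ** 2))


def tuning_slope(theta, h=1e-5):
    """Central finite-difference estimate of f'(theta)."""
    return (tuning(theta + h) - tuning(theta - h)) / (2 * h)


def fisher_poisson(theta):
    """Closed form for Poisson counts in window T: F = T * f'(theta)^2 / f(theta)."""
    return T * tuning_slope(theta) ** 2 / tuning(theta)


def fisher_from_score(theta, n_max=200):
    """Definition: F = E[score^2], summed exactly over spike counts 0..n_max."""
    lam = T * tuning(theta)         # mean spike count
    dlam = T * tuning_slope(theta)  # derivative of the mean count w.r.t. theta
    total = 0.0
    for n in range(n_max + 1):
        log_p = n * math.log(lam) - lam - math.lgamma(n + 1)  # log Poisson pmf
        score = (n / lam - 1.0) * dlam  # d/dtheta of log p(n | theta)
        total += math.exp(log_p) * score ** 2
    return total


for theta in [0.0, 0.5, 1.0, 2.0]:
    F = fisher_poisson(theta)
    # Cramer-Rao: an unbiased estimator's standard deviation is at least 1/sqrt(F).
    bound = math.inf if F == 0 else 1.0 / math.sqrt(F)
    print(f"theta={theta:.1f}  F={F:.3f}  CR bound on std={bound:.3f}")
```

Fisher information is zero at the preferred stimulus, where the tuning curve is flat. It is largest on the flanks, where the rate changes fastest. For conditionally independent neurons, Fisher information simply adds across the population.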
Information Rate and Bandwidth:
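Neurons operate under bandwidth constraints. The refractory period (roughly 1-2 ms) limits how closely spikes can follow one another. The precision of spike timing sets the finest time resolution delta_t at which a spike train can carry information. The maximum spike rate (limited by biophysics) constrains how fast the neuron can signal. Together, these create a finite "channel capacity": the maximum information the neuron can reliably transmit per unit time.

A classic upper bound treats the spike train as a sequence of time bins of width delta_t, each containing at most one spike. For a mean firing rate r (Hz) with r * delta_t much less than 1, the entropy of the spike train is approximately log_2(e / (r * delta_t)) bits per spike, or r * log_2(e / (r * delta_t)) bits per second. This is only an upper bound, because noise reduces what is actually transmitted.

To transmit more information per second, the neuron has two options. It can increase its firing rate, which lowers the information per spike and costs more energy. Or it can sharpen its timing precision and use more complex temporal patterns (burst timing, phase relationships).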
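Here is a short Python sketch of the timing-code bound under idealized assumptions. Time is divided into bins of width delta_t, and each bin independently holds at most one spike with probability r * delta_t. There is no noise and no refractoriness. The function names are hypothetical. The sketch computes the exact per-bin binary entropy and compares it with the small-r*delta_t approximation log_2(e / (r * delta_t)).

```python
import math


def binary_entropy(p):
    """Entropy in bits of a bin that contains a spike with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def timing_code_capacity(rate_hz, dt_s):
    """Entropy bound for a spike train discretized into bins of width dt_s.

    Idealization (assumption): bins are independent, each holds at most one
    spike with probability p = rate_hz * dt_s, and there is no noise or
    refractoriness. Returns (bits per second, bits per spike, log2(e / p)).
    """
    p = rate_hz * dt_s
    if not 0.0 < p <= 1.0:
        raise ValueError("rate_hz * dt_s must lie in (0, 1]")
    bits_per_s = binary_entropy(p) / dt_s
    bits_per_spike = bits_per_s / rate_hz
    approx_per_spike = math.log2(math.e / p)  # small-p approximation
    return bits_per_s, bits_per_spike, approx_per_spike


for rate in [10, 50, 200]:
    per_s, per_spike, approx = timing_code_capacity(rate, 1e-3)
    print(f"{rate:>3} Hz: {per_s:6.1f} bits/s, {per_spike:5.2f} bits/spike "
          f"(approx. {approx:5.2f})")
```

Raising the rate increases bits per second but lowers bits per spike. The approximation also degrades as r * delta_t grows; at 200 Hz and 1 ms, r * delta_t = 0.2 is no longer small.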
Population Coding and Synergy:
No single neuron carries all the information about a stimulus. Populations of neurons distribute information across many cells. If N neurons each carried I bits about non-overlapping aspects of the stimulus, with no interactions, the population would carry N*I bits. Correlations change this. Shared information (redundancy) pushes the total below N*I, while information available only in joint activity patterns (synergy) can push it above. In all cases the total cannot exceed the stimulus entropy H(S). The challenge is decoding: how does the brain extract information from population responses? Linear decoding (a weighted sum of spike counts) can miss information carried in joint patterns, whereas nonlinear decoding can extract synergistic information. It has been proposed that populations are organized to reduce redundancy, for example by spreading tuning curves across the stimulus range, while preserving information about task-relevant variables.
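To make redundancy and synergy concrete, the Python sketch below considers two hypothetical binary neurons with an exact joint distribution. It computes I(S; R1), I(S; R2), and I(S; R1, R2), and reports the net synergy I(S; R1, R2) - I(S; R1) - I(S; R2). A positive value means net synergy; a negative value means net redundancy. The two toy distributions are illustrative assumptions. In one, both neurons copy a binary stimulus. In the other, the stimulus is the XOR of the two responses.

```python
import math
from collections import defaultdict


def mi_from_joint(joint):
    """Mutual information in bits from a dict {(x, y): P(x, y)}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def net_synergy(p_s_r1_r2):
    """Return I(S;R1), I(S;R2), I(S;R1,R2) and their net synergy.

    p_s_r1_r2: dict {(s, r1, r2): probability}.
    """
    j1, j2, j12 = defaultdict(float), defaultdict(float), defaultdict(float)
    for (s, r1, r2), p in p_s_r1_r2.items():
        j1[(s, r1)] += p          # marginalize out r2
        j2[(s, r2)] += p          # marginalize out r1
        j12[(s, (r1, r2))] += p   # treat the pair as one response
    i1, i2, i12 = mi_from_joint(j1), mi_from_joint(j2), mi_from_joint(j12)
    return i1, i2, i12, i12 - i1 - i2


# Redundant: both neurons copy a uniform binary stimulus.
redundant = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Synergistic: r1, r2 independent and uniform, s = r1 XOR r2.
xor = {(0, 0, 0): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25, (0, 1, 1): 0.25}
print(net_synergy(redundant))  # (1.0, 1.0, 1.0, -1.0)
print(net_synergy(xor))        # (0.0, 0.0, 1.0, 1.0)
```

In the XOR case, neither neuron alone carries any information about S, yet the pair determines it exactly. No weighted sum of r1 and r2 compared against a single threshold can separate XOR. This is therefore also a simple example of information that a linear readout misses.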
Efficient Coding Hypothesis:
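A central hypothesis in computational neuroscience is that neural circuits are shaped to maximize the information transmitted per unit metabolic cost. Neurons are expensive: published estimates put the cost of a single action potential on the order of 10^8 to 10^9 ATP molecules, depending on the cell type and the modeling assumptions. The firing rate therefore reflects an energy-information tradeoff. When spikes are costly relative to resting metabolism, codes with low mean firing rates can deliver more bits per unit energy than high-rate codes. On this view, pathways that convey rapidly changing, high-dimensional input (e.g., vision) would be expected to sustain higher information rates than those sensing slowly varying signals (e.g., some chemical senses). Such comparisons, however, are confounded by many other factors. One proposal is that learning reshapes neural codes in the same direction. Early in training, responses are irregular. With practice, they become more selective and reliable (lower noise entropy H(R|S) relative to H(R)) and more informative about task-relevant variables. This fits an information-theoretic view in which the nervous system allocates resources (firing rates, connectivity) to maximize information about behaviorally important variables.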
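As a toy illustration of the energy-information tradeoff, the Python sketch below assumes a noiseless binary channel in which each time bin spikes with probability p. A silent bin costs 1 unit of resting energy, and a spike costs an extra spike_cost units. This cost model and the function names are hypothetical simplifications, not a biophysical model. The sketch finds the spike probability that maximizes bits per unit energy by grid search.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a bin that spikes with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def bits_per_energy(p, spike_cost):
    """Bits per unit energy for one bin under a toy cost model (an assumption).

    A silent bin costs 1 unit (resting cost); a spike adds spike_cost units.
    """
    return binary_entropy(p) / (1.0 + spike_cost * p)


def optimal_spike_probability(spike_cost, steps=10000):
    """Grid search over p in (0, 0.5]; above 0.5 entropy falls while cost rises."""
    grid = [i / (2 * steps) for i in range(1, steps + 1)]
    return max(grid, key=lambda p: bits_per_energy(p, spike_cost))


for cost in [0, 10, 100]:
    p = optimal_spike_probability(cost)
    print(f"spike cost {cost:>3}: optimal p = {p:.3f}, "
          f"{bits_per_energy(p, cost):.3f} bits per unit energy")
```

When spikes are free, the optimum is p = 0.5, the maximum-entropy code. As spikes become more expensive relative to the resting cost, the optimal spike probability falls well below 0.5. This is consistent with the idea that costly spikes favor sparse, low-rate codes.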
Applications:
Information theory applied to neuroscience suggests that the brain, despite its apparent randomness and noise, can operate close to information-theoretic limits in a number of well-studied systems. In those systems it efficiently encodes, compresses, and transmits information under severe biological constraints. This perspective has reshaped our understanding of neural coding and continues to guide research into how the brain solves information-processing problems.