fMRI measures blood oxygen level-dependent (BOLD) signals as an indirect proxy for neural activity through neurovascular coupling. It offers excellent spatial resolution (roughly 2–3 mm), but its temporal resolution is on the order of seconds, which limits inference about precise neural dynamics and causal mechanisms. Interpreting fMRI requires understanding its hemodynamic basis, its temporal filtering, and the gap between statistical activation and functional necessity.
Begin with BOLD physics and the neurovascular coupling mechanisms that link neural activity to blood flow changes. Study actual fMRI datasets examining different cognitive processes (motor, visual, language) to develop intuition for signal characteristics, noise patterns, and preprocessing artifacts.
Recall that the BOLD signal reflects blood oxygen level-dependent contrast: when neurons fire, local blood flow increases and delivers more oxygenated hemoglobin than is immediately consumed, so the local concentration of deoxyhemoglobin falls. Because deoxyhemoglobin is paramagnetic and oxyhemoglobin is not, this shift produces a detectable change in the MRI signal. The critical insight for interpretation is that this hemodynamic response is an indirect and delayed proxy for neural activity. The hemodynamic response function (HRF) peaks roughly 5–6 seconds after the neural event and returns to baseline, typically after a small undershoot, some 20 seconds or more after the event. This means fMRI cannot resolve the millisecond-to-millisecond firing dynamics you might care about; it is a sluggish window onto neural processes. Temporal resolution on the order of seconds is adequate for sustained cognitive states (sustained attention, working memory maintenance) but inadequate for fast neural computations.
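To make the sluggishness of the response concrete, here is a minimal sketch of a double-gamma HRF. The shape parameters (6 and 16) and the undershoot ratio (1/6) are assumptions borrowed from commonly used canonical defaults, not values given in this text, and the function names are purely illustrative.

```python
import math
import numpy as np

def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated at times t >= 0 (seconds)."""
    return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1 / 6):
    """Early positive lobe minus a smaller, later undershoot lobe.

    All three parameters are assumed defaults, not fitted values.
    """
    return gamma_pdf(t, peak_shape) - undershoot_ratio * gamma_pdf(t, undershoot_shape)

t = np.arange(0.0, 32.0, 0.1)      # seconds after a brief neural event
hrf = double_gamma_hrf(t)

peak_time = t[np.argmax(hrf)]      # when the modeled response is largest
# Largest deviation from baseline after 25 s, relative to the peak height
late_fraction = np.abs(hrf[t >= 25.0]).max() / hrf.max()

print(f"modeled peak at about {peak_time:.1f} s")
print(f"deviation after 25 s: {late_fraction:.1%} of peak height")
```

With these assumed parameters the modeled peak falls near 5 seconds and the curve has nearly settled by 25 seconds, so a neural event lasting milliseconds is smeared into a response lasting tens of seconds.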
The statistical analysis of fMRI data relies on the General Linear Model (GLM). The expected BOLD response to each experimental condition is modeled by convolving the experimental design with the HRF, which produces a predicted time course per condition, and the GLM estimates how well each voxel's actual signal matches these predicted time courses. The t-statistic for each voxel tests whether a given condition produced above-baseline activation. Because you are testing thousands or hundreds of thousands of voxels simultaneously, the multiple comparisons problem is severe: at an uncorrected threshold of p < 0.05, about 5% of truly inactive voxels pass by chance, which amounts to thousands of false positives in a 100,000-voxel volume. Family-wise error correction and cluster-level correction (requiring that activated regions be spatially extended, not single isolated voxels) address this, but the choice of threshold remains a genuine methodological debate in the field.
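The pipeline described above (design convolved with the HRF, a per-voxel fit, a t-statistic, then a corrected threshold) can be sketched on simulated data. Everything here is an assumption made for illustration: the TR of 2 s, the 20 s block design, the double-gamma HRF parameters, white Gaussian noise (real fMRI noise is temporally autocorrelated), and Bonferroni as a simple stand-in for family-wise error control. Thresholds use a normal approximation to the t distribution, which is reasonable at 198 degrees of freedom.

```python
import math
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(0)

# Assumed acquisition and design: TR = 2 s, 200 volumes, 20 s rest / 20 s task blocks
tr, n_scans = 2.0, 200
scan_times = np.arange(n_scans) * tr
boxcar = ((scan_times // 20) % 2 == 1).astype(float)

# Double-gamma HRF sampled at the TR (assumed canonical-style parameters)
def gamma_pdf(t, shape):
    return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)

t = np.arange(0.0, 32.0, tr)
hrf = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
hrf /= hrf.sum()

# Predicted BOLD time course: design convolved with the HRF
regressor = np.convolve(boxcar, hrf)[:n_scans]
X = np.column_stack([regressor, np.ones(n_scans)])   # task regressor + intercept

# Simulated data: voxel 0 responds to the task, all other voxels are pure noise
n_voxels = 2000
Y = rng.standard_normal((n_scans, n_voxels))
Y[:, 0] += 2.0 * regressor

# Ordinary least squares for every voxel at once
beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
residuals = Y - X @ beta
dof = n_scans - X.shape[1]
sigma2 = (residuals ** 2).sum(axis=0) / dof

# t-statistic for the contrast "task > baseline"
contrast = np.array([1.0, 0.0])
var_factor = contrast @ np.linalg.inv(X.T @ X) @ contrast
t_stats = (contrast @ beta) / np.sqrt(sigma2 * var_factor)

# One-sided thresholds via the normal approximation
alpha = 0.05
z_uncorrected = NormalDist().inv_cdf(1 - alpha)
z_bonferroni = NormalDist().inv_cdf(1 - alpha / n_voxels)

null_t = t_stats[1:]                                  # voxels with no true effect
n_false_uncorrected = int((null_t > z_uncorrected).sum())
n_false_bonferroni = int((null_t > z_bonferroni).sum())

print(f"active voxel t = {t_stats[0]:.1f}")
print(f"false positives, uncorrected: {n_false_uncorrected} of {null_t.size}")
print(f"false positives, Bonferroni:  {n_false_bonferroni} of {null_t.size}")
```

In this sketch the uncorrected threshold should flag on the order of 5% of the noise-only voxels, while the corrected threshold should flag almost none. Bonferroni is used only because it is easy to compute; it ignores the spatial smoothness of real data, which is what cluster-level methods exploit.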
Understanding what fMRI can and cannot tell you is as important as understanding what it shows. The reverse inference problem is a fundamental interpretive trap: if studies show that the dorsolateral prefrontal cortex (dlPFC) activates during working memory tasks, it is tempting to conclude, whenever dlPFC activation appears, that working memory is engaged. But dlPFC is recruited by many cognitive processes, including attention, inhibitory control, and task switching. Seeing dlPFC activation tells you that something demanding is happening; it does not unambiguously identify which cognitive process. Reverse inference is stronger when a region is known to respond selectively to one process, and weaker for multi-function regions.
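The dependence of reverse inference on selectivity can be expressed with Bayes' rule. The sketch below uses made-up probabilities purely for illustration; none of the numbers are empirical estimates, and the function name is hypothetical.

```python
def posterior_process_given_activation(p_act_given_process,
                                       p_act_given_other,
                                       prior_process):
    """P(process engaged | region active) via Bayes' rule."""
    evidence = (p_act_given_process * prior_process
                + p_act_given_other * (1 - prior_process))
    return p_act_given_process * prior_process / evidence

# Hypothetical prior: the process of interest is engaged in 20% of tasks
prior = 0.2

# Multi-function region: also active in half of the tasks that do not engage the process
multi_function = posterior_process_given_activation(0.8, 0.5, prior)

# Selective region: rarely active unless the process is engaged
selective = posterior_process_given_activation(0.8, 0.05, prior)

print(f"multi-function region: {multi_function:.2f}")
print(f"selective region:      {selective:.2f}")
```

With these illustrative numbers the posterior is about 0.29 for the multi-function region and 0.80 for the selective one, even though both regions activate equally reliably when the process is engaged. What separates them is how often the region activates in the absence of the process.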
The deepest limitation of fMRI is its inability to establish causal necessity. Correlation between BOLD signal and behavior tells you a region is active when a task is performed; it does not tell you whether that region is required for the task. A region could be active as a bystander to the main computation, as part of monitoring or error-checking circuits, or as an epiphenomenon. This is where converging methods become essential: TMS (which you will encounter next) can disrupt a specific region and test whether behavior degrades, which provides evidence of causal necessity. A strong design for causal claims combines fMRI localization with TMS disruption: fMRI tells you where to target, and TMS tests whether that target is doing necessary computational work.