The brain has specialized regions for face processing in the ventral visual stream. The fusiform face area (FFA) responds selectively to faces over other objects, while the superior temporal sulcus codes changeable aspects such as eye gaze and expression. Face representations are highly configural: inverting a face or disrupting the spacing of its features impairs recognition, suggesting that faces are processed as integrated wholes. Expertise with other object categories activates similar regions, which suggests that this specialization may be shaped by experience rather than being purely innate, although the question remains debated.
You have already learned that the ventral visual stream, running from primary visual cortex through temporal lobe regions, performs increasingly abstract object recognition, with neurons becoming progressively selective for complex categories. Faces are among the most behaviorally important visual objects humans encounter, and the brain treats them accordingly: not merely as another object category, but as stimuli processed through a partially specialized network. Understanding this network means understanding both *where* and *how* face processing works.
The fusiform face area (FFA), located in the fusiform gyrus on the ventral surface of the temporal lobe, shows substantially greater activation to faces than to other object categories in fMRI studies, and lesions in this region can cause prosopagnosia: the striking inability to recognize individual faces while object recognition remains largely intact. An earlier-stage region, the occipital face area (OFA), processes the parts-level structure of faces and feeds into the FFA. A third region, the superior temporal sulcus (STS), responds selectively to the *changeable* aspects of faces (eye gaze direction, emotional expression, mouth movements during speech) rather than to stable identity. This functional division makes ecological sense: you need one system to recognize who someone is (FFA, invariant identity), and another to read their current intentions and state (STS, dynamic signals).
The most theoretically important property of face perception is configural processing: faces are represented as integrated wholes, not as collections of independent features. The classic demonstration is the inversion effect: turning most objects upside-down impairs recognition only modestly, but face recognition degrades dramatically under inversion. Even more striking is the Thatcher effect: when the eyes and mouth are rotated 180 degrees within an otherwise unchanged face, the result looks grotesque when the face is viewed upright, yet the alteration goes almost undetected when the whole face is inverted. This shows that you are normally sensitive to the spatial relationships among features (the configural information), not just the features themselves, and that this sensitivity depends on the canonical upright orientation.
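The stimulus manipulation behind the Thatcher effect can be made concrete with a toy sketch. The example below is only an illustration under stated assumptions: the "face" is a hypothetical grid of characters, not a real stimulus, and the feature coordinates and the function names (`rotate_180`, `crop`, `thatcherize`, `mirrored_region`) are invented for this example. It rotates the eye and mouth patches by 180 degrees within an upright face, then inverts the whole face.

```python
from typing import List, Tuple

Grid = List[List[str]]
Region = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical layout


def rotate_180(grid: Grid) -> Grid:
    """Return a copy of the grid rotated by 180 degrees (whole-face inversion)."""
    return [list(reversed(row)) for row in reversed(grid)]


def crop(grid: Grid, region: Region) -> Grid:
    """Copy out the rectangular patch described by region."""
    top, left, height, width = region
    return [row[left:left + width] for row in grid[top:top + height]]


def thatcherize(face: Grid, regions: List[Region]) -> Grid:
    """Rotate each feature patch by 180 degrees in place; leave the rest untouched."""
    out = [row[:] for row in face]
    for region in regions:
        top, left, height, width = region
        patch = rotate_180(crop(face, region))
        for r in range(height):
            out[top + r][left:left + width] = patch[r]
    return out


def mirrored_region(region: Region, n_rows: int, n_cols: int) -> Region:
    """Where a region ends up after the whole grid is rotated by 180 degrees."""
    top, left, height, width = region
    return (n_rows - top - height, n_cols - left - width, height, width)


# Hypothetical toy face: brows over eyes, a nose, and a two-row mouth.
FACE: Grid = [list(row) for row in [
    ".......",
    ".^^.^^.",
    ".oo.oo.",
    "...|...",
    ".v---v.",
    "..'''..",
]]
# Assumed feature locations: left eye, right eye, mouth.
FEATURES: List[Region] = [(1, 1, 2, 2), (1, 4, 2, 2), (4, 1, 2, 5)]

thatcherized = thatcherize(FACE, FEATURES)  # features flipped, face upright
inverted = rotate_180(thatcherized)         # the same stimulus, whole face inverted

for name, grid in [
    ("upright", FACE),
    ("thatcherized", thatcherized),
    ("thatcherized + inverted", inverted),
]:
    print(name)
    print("\n".join("".join(row) for row in grid))
```

In the inverted Thatcherized grid, each feature patch read on its own matches the original upright patch, while the arrangement around it is upside-down. This mirrors the verbal description above: the local features are intact, and only the configuration reveals the manipulation.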
Whether the FFA is specifically a face module or, more broadly, an area for fine-grained discrimination of any well-learned object category remains debated. Expert bird-watchers and car experts show elevated FFA activation for their specialty categories, which proponents of the expertise account take to mean that the "face area" is better understood as an area for expert-level individuation of visually homogeneous categories. On this view, faces simply happen to be the category that every typically developing human practices to the point of expertise from birth. This interpretation reconciles the specialization evidence with a general account of perceptual learning: the brain allocates and tunes representational resources to the categories that matter most for the individual organism.