Questions: Visual Cortex Hierarchical Organization and Feature Extraction
5 questions to test your understanding
Question 1 Multiple Choice
A neuron in inferior temporal (IT) cortex responds strongly to faces regardless of whether the face is large or small, centered or peripheral, or brightly or dimly lit. A V1 neuron responding to a 45° edge only in a specific retinal location does NOT share this property. What distinguishes the IT neuron's response?
A. The IT neuron uses lateral inhibition to suppress responses to non-face stimuli, creating a selective response
B. The IT neuron has a large receptive field and invariant tuning: its response is robust to changes in position, size, and lighting that would disrupt V1 responses
C. The IT neuron receives direct input from the retina, bypassing V1 and the intermediate hierarchy
D. The IT neuron responds to faces because faces activate the retinotopic map at a specific location reserved for socially relevant stimuli
Answer: B
As you ascend the visual hierarchy, receptive fields grow larger and representations become more invariant. An IT neuron integrates information across a large portion of the visual field and across many lower-level detectors, so its response tolerates transformations (position shift, size change, illumination change) that would move the stimulus out of a V1 neuron's small receptive field, or away from its preferred feature, and silence it. This invariant object recognition is a central computational achievement of the hierarchical architecture.
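The growth of tolerance can be illustrated with a toy sketch, assuming the max-pooling operation used in HMAX-style models of the ventral stream. The 1D "retina", the step-edge detector, and the function names below are hypothetical illustrations, not a model of real V1 or IT neurons, and only position (not size or lighting) is varied.

```python
# Toy 1D "retina": a list of pixel intensities (0 = dark, 1 = bright).
# Hypothetical illustration only; not a biophysical model.

def make_edge(length, step_at):
    """Image with a single dark-to-bright step between step_at and step_at + 1."""
    return [0] * (step_at + 1) + [1] * (length - step_at - 1)

def local_edge_response(signal, position):
    """V1-like unit: responds only to a dark-to-bright step at one fixed position."""
    return max(0, signal[position + 1] - signal[position])

def pooled_edge_response(signal):
    """Higher-level unit: max over identical local units at every position."""
    return max(local_edge_response(signal, p) for p in range(len(signal) - 1))

near = make_edge(10, 2)  # edge inside the local unit's "receptive field"
far = make_edge(10, 6)   # same edge, shifted elsewhere on the retina

print(local_edge_response(near, 2), local_edge_response(far, 2))  # 1 0
print(pooled_edge_response(near), pooled_edge_response(far))      # 1 1
```

The pooled unit keeps the feature selectivity of the local units but discards where the edge was, which is one simple way to trade positional precision for invariance.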
Question 2 Multiple Choice
A V1 neuron shows no face-selective response to a photograph of a human face, even though the face contains many oriented edges. The most likely explanation is:
A. V1 neurons require color information, and the photograph was black-and-white
B. V1 receptive fields are small and tuned to simple features like single oriented edges; the face as a whole is not a V1-level feature
C. V1 is only active during the first 50 ms after stimulus onset, before the brain has time to process complex objects
D. Face recognition suppresses V1 activity through top-down feedback to conserve metabolic resources
Answer: B
V1 neurons respond to elementary local features: an oriented edge at a specific retinal location and spatial frequency. A face is a high-level, spatially extended object, and recognizing it requires integrating many V1 outputs through multiple hierarchical stages. A single V1 neuron 'sees' only a tiny patch of the image, so it has no access to the relational structure (eyes above nose above mouth) that defines a face. Object recognition depends on V2, V4, and IT cortex, which build on V1's outputs.
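Why relational structure matters can be sketched with a toy 3x3 "image" whose cells hold part labels. The labels stand in for outputs of earlier stages, and `local_responses` and `face_unit` are hypothetical names: a unit that only registers which local features are present cannot tell a face from a scrambled face, while a unit that integrates their positions across the grid can.

```python
from collections import Counter

# Hypothetical toy images: each cell holds a part label or None.
face = [
    ["eye", None, "eye"],
    [None, "nose", None],
    [None, "mouth", None],
]
scrambled = [
    [None, "mouth", None],
    ["eye", None, "eye"],
    [None, "nose", None],
]

def local_responses(grid):
    """Which local features are present, with their spatial arrangement discarded."""
    return Counter(cell for row in grid for cell in row if cell is not None)

def rows_containing(grid, part):
    """Row indices in which a given part appears."""
    return [r for r, row in enumerate(grid) if part in row]

def face_unit(grid):
    """Hypothetical configural unit: eyes above nose above mouth."""
    eyes = rows_containing(grid, "eye")
    noses = rows_containing(grid, "nose")
    mouths = rows_containing(grid, "mouth")
    if not (eyes and noses and mouths):
        return False
    return max(eyes) < min(noses) and max(noses) < min(mouths)

print(local_responses(face) == local_responses(scrambled))  # True
print(face_unit(face), face_unit(scrambled))                # True False
```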
Question 3 True / False
As visual processing ascends from V1 to higher cortical areas, neurons develop progressively larger receptive fields, more complex feature tuning, and greater invariance to position, size, and illumination.
True
False
Answer: True
This systematic progression across the hierarchy is well-established. V1 neurons have small receptive fields and detect simple oriented edges. V2 and V4 neurons have larger receptive fields and respond to contours, textures, and object parts. Inferior temporal (IT) neurons have very large receptive fields and respond to complete objects and faces regardless of position, size, or lighting. Each stage inherits and transforms the outputs of the stage below it.
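The steady growth of receptive fields can be made concrete with the standard receptive-field recurrence for stacked convolution and pooling layers in artificial neural networks. Treating each cortical stage as one such layer is an analogy, not anatomy: the stage names, kernel sizes, and strides below are hypothetical and are not measured cortical values.

```python
def receptive_field_sizes(stages):
    """Receptive field size (in input pixels) after each stage.

    Each stage is a (kernel_size, stride) pair. Standard recurrence:
        rf_l   = rf_{l-1} + (kernel_l - 1) * jump_{l-1}
        jump_l = jump_{l-1} * stride_l
    where jump is the spacing, in input pixels, between adjacent units.
    """
    rf, jump = 1, 1
    sizes = []
    for kernel, stride in stages:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes

# Hypothetical stages loosely labeled after the ventral-stream hierarchy.
NAMES = ["V1-like", "V2-like", "V4-like", "IT-like"]
STAGES = [(5, 2), (3, 2), (3, 2), (3, 2)]

for name, size in zip(NAMES, receptive_field_sizes(STAGES)):
    print(f"{name}: {size} px")
```

When strides exceed 1, each stage multiplies the spacing seen by every later stage, so receptive fields in this toy grow faster than linearly with depth (5, 9, 17, 33 pixels here).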
Question 4 True / False
Primary visual cortex (V1) is capable of recognizing objects and faces but uses a more primitive computational strategy than inferotemporal cortex.
True
False
Answer: False
Individual V1 neurons do not recognize objects: each responds to local properties such as edge orientation, spatial frequency, and luminance contrast within a small patch of the visual field, and its response does not explicitly signal which object or face is present. Object recognition is an emergent property of multiple hierarchical stages of transformation; no change of 'strategy' would let V1 perform it on its own. This is the key architectural insight: each stage is tuned to a level of feature complexity that matches its position in the hierarchy, from local edges in V1 to whole objects in IT cortex.
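V1 simple cells are commonly modeled as oriented linear filters (often Gabor filters) followed by rectification. The sketch below uses a crude hypothetical 3x3 kernel in place of a Gabor filter to show what such a unit can report: whether a 45-degree edge of one contrast polarity is present in its small patch, and nothing about objects.

```python
# Hypothetical 3x3 kernel tuned to a 45-degree edge
# (dark upper-left, bright lower-right). A stand-in for a Gabor filter.
KERNEL_45 = [
    [-1, -1, 0],
    [-1,  0, 1],
    [ 0,  1, 1],
]

def simple_cell_response(patch, kernel=KERNEL_45):
    """Rectified dot product of a 3x3 image patch with an oriented kernel."""
    total = sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))
    return max(0, total)

edge_45 = [[0, 0, 0], [0, 0, 1], [0, 1, 1]]   # preferred orientation
edge_135 = [[0, 0, 0], [1, 0, 0], [1, 1, 0]]  # orthogonal orientation
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # no edge at all

print(simple_cell_response(edge_45))   # 3
print(simple_cell_response(edge_135))  # 0
print(simple_cell_response(uniform))   # 0
```

Any image that places the same pixels in this patch, whether a face, a building, or random texture, drives the unit identically, because the patch is all it sees.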
Question 5 Short Answer
Why doesn't the brain need a separate neural detector for every possible object at every possible position, size, and lighting condition? What does the hierarchical architecture provide instead?
Model answer: A separate detector for every combination would cause a combinatorial explosion: the counts of objects, positions, scales, and lighting conditions multiply, so even a modest object set could require more dedicated detectors than the brain has neurons. The hierarchical architecture avoids this by building complex representations through composition of simpler ones. V1 detects oriented edges; V2 combines edges into contours; V4 assembles contours into object parts; IT cortex integrates parts into complete objects. Critically, invariance builds gradually across stages: each successive stage becomes more tolerant of the transformations (position, size, illumination) that would disrupt earlier representations. As a result, a relatively small set of learned, reusable features can support recognition of a wide variety of objects, including novel ones, without an explicit template for every possible instance.
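The combinatorial argument can be checked with back-of-the-envelope arithmetic. The counts below are hypothetical and chosen only to show how the two schemes scale; `factored_count` is a toy accounting that assumes a shared bank of local feature detectors replicated over positions and sizes, one pooled unit per feature, and one unit per object. It is not an estimate of real neuron numbers.

```python
from math import prod

def template_count(*factors):
    """One dedicated detector per combination of factors: the counts multiply."""
    return prod(factors)

def factored_count(n_features, n_positions, n_sizes, n_objects):
    """Toy hierarchy (hypothetical accounting): shared feature bank replicated
    over positions and sizes, one pooled unit per feature, one unit per object."""
    shared_bank = n_features * n_positions * n_sizes
    pooled = n_features
    return shared_bank + pooled + n_objects

# Hypothetical numbers, chosen only to illustrate scaling.
OBJECTS, POSITIONS, SIZES, FEATURES = 10_000, 1_000, 10, 100

print(template_count(OBJECTS, POSITIONS, SIZES))                # 100000000
print(template_count(OBJECTS, POSITIONS, SIZES, 10))            # 1000000000 (10 lighting levels)
print(factored_count(FEATURES, POSITIONS, SIZES, OBJECTS))      # 1010100

# Extra cost of learning one more object under each scheme.
print(template_count(OBJECTS + 1, POSITIONS, SIZES) - template_count(OBJECTS, POSITIONS, SIZES))  # 10000
print(factored_count(FEATURES, POSITIONS, SIZES, OBJECTS + 1)
      - factored_count(FEATURES, POSITIONS, SIZES, OBJECTS))    # 1
```

In this toy accounting the template scheme's cost multiplies with every new factor, while a new object in the factored scheme costs a single unit that reads features the hierarchy already computes.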