Depth perception relies on multiple cues: binocular cues like stereopsis and convergence use information from both eyes, while monocular cues like linear perspective, occlusion, and motion parallax work with one eye. The visual system integrates these cues to construct three-dimensional representations of space.
Your study of visual system anatomy gave you the foundation: light entering the two eyes is transduced by two retinas, processed in V1 and beyond, and eventually reconstructed into a coherent perceptual scene. But each retinal image is two-dimensional. A point's distance along the line of sight is not recorded in the image itself. The brain must *infer* distance from indirect signals. This is the depth perception problem, and the visual system solves it not with a single dedicated depth sensor but by combining a collection of partially redundant cues into a best estimate of spatial layout.
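To make the missing depth dimension concrete, here is a minimal sketch that treats the eye as an ideal pinhole camera. This is an assumption: real ocular optics are more complex, and the roughly 17 mm focal length in `FOCAL_LENGTH_M` is an approximate textbook value, not a figure from this text. The hypothetical `project` function maps a 3D point onto a flat image plane. Every point along the same line of sight lands on the same image location, which is why distance cannot be read directly off the retinal image.

```python
import math

FOCAL_LENGTH_M = 0.017  # assumed ~17 mm: the eye idealized as a pinhole camera

def project(point, focal_length=FOCAL_LENGTH_M):
    """Perspective-project an eye-centred 3D point (x, y, z in metres,
    z = distance straight ahead) onto a flat image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the eye (z > 0)")
    return (focal_length * x / z, focal_length * y / z)

# Scaling a point along its line of sight changes its distance, not its image.
base = (0.5, 1.0, 10.0)
for k in (1.0, 2.0, 5.0):
    scaled = tuple(k * c for c in base)
    img = project(scaled)
    print(f"point {scaled} -> image ({img[0]:.6f}, {img[1]:.6f}) m")
```

A small near object and a proportionally larger far object are therefore indistinguishable from the image alone; the cues described below are what break this ambiguity.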
Binocular cues arise because the two eyes are separated by about 6.5 centimeters. The most powerful is stereopsis, based on retinal disparity: because each eye views the world from a slightly different position, the two retinal images differ slightly. For nearby objects these differences are large (large disparity), and they shrink rapidly as distance increases, so distant objects project to nearly corresponding locations on the two retinas (small disparity). Neurons in V1 and V2 are tuned to specific disparity values: each fires most strongly when the images in the two eyes are offset by a particular amount, which corresponds to a particular depth plane relative to the point of fixation. From this disparity information the brain constructs a representation of relative depth with extraordinary precision; humans can detect depth differences of less than a millimeter at arm's length. A second binocular cue is convergence: the eyes rotate inward to fixate nearby objects, and proprioceptive feedback from the eye muscles provides a signal about fixation distance. Because the convergence angle shrinks toward zero as distance grows, this cue is useful mainly for near distances (within arm's reach).
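The geometry behind both binocular cues can be sketched numerically. This sketch assumes symmetric fixation on points straight ahead and uses the 6.5 cm eye separation from the text; the helper names are hypothetical, and the 20 arcsecond default stereoacuity is an assumed typical value rather than a figure from this text. Disparity between two points is computed as the difference of their vergence angles, and the smallest detectable depth step uses the standard small-angle approximation Δd ≈ η·d²/I, where η is the disparity threshold, d the viewing distance, and I the interocular distance.

```python
import math

INTEROCULAR_M = 0.065  # ~6.5 cm eye separation, as stated in the text

def vergence_angle(distance_m, interocular_m=INTEROCULAR_M):
    """Angle (radians) between the two lines of sight when both eyes
    fixate a point straight ahead at distance_m."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))

def relative_disparity(d_near_m, d_far_m, interocular_m=INTEROCULAR_M):
    """Disparity (radians) between two points straight ahead, computed as
    the difference between their vergence angles."""
    return vergence_angle(d_near_m, interocular_m) - vergence_angle(d_far_m, interocular_m)

def depth_threshold(distance_m, stereoacuity_arcsec=20.0, interocular_m=INTEROCULAR_M):
    """Smallest detectable depth step (metres) at distance_m, using the
    small-angle approximation delta_d ~= eta * d**2 / I.
    The 20 arcsec default is an assumed typical stereoacuity."""
    eta = math.radians(stereoacuity_arcsec / 3600.0)
    return eta * distance_m ** 2 / interocular_m

for d in (0.3, 0.6, 2.0, 10.0, 50.0):
    print(f"{d:5.1f} m  vergence {math.degrees(vergence_angle(d)):6.3f} deg  "
          f"depth threshold {depth_threshold(d) * 1000:8.2f} mm")
```

Under these assumptions the threshold at 0.6 m comes out near half a millimeter, consistent with the sub-millimeter figure above, and it grows with the square of distance. The vergence angle also falls from several degrees at arm's length to well under one degree at 10 m, which is why convergence carries little information about far objects.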
Monocular cues are available even with one eye closed, which also means artists can use them to render depth on flat surfaces. Occlusion (one object blocking another) indicates that the occluding object is closer; it is the least ambiguous cue, although it specifies only depth order, not distance. Linear perspective exploits the fact that parallel lines appear to converge in the distance; roads, train tracks, and hallways all provide this cue. Relative size uses the principle that when two objects are of similar size, the one that produces the smaller retinal image is farther away. Texture gradient works similarly: a field of grass shows fine, dense texture in the distance and coarse texture nearby, providing continuous depth information across a scene. Motion parallax occurs when you move your head: nearby objects sweep rapidly across your visual field while distant objects appear nearly stationary. Because it arises from your own movement, motion parallax is the monocular cue most informative when you are moving through a scene.
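Two of these monocular cues reduce to simple geometry. The sketch below assumes an observer translating sideways at a constant speed and a stationary point seen at right angles to that motion, for which the angular speed is v/d; it also computes the visual angle an object subtends, which underlies relative size. The 20 cm/s head speed and 1.7 m object height are illustrative assumptions, and the function names are hypothetical.

```python
import math

def parallax_angular_speed(observer_speed_mps, distance_m):
    """Angular speed (deg/s) of a stationary point seen at right angles to
    the observer's sideways motion: omega = v / d in radians per second."""
    return math.degrees(observer_speed_mps / distance_m)

def visual_angle(size_m, distance_m):
    """Angle (deg) subtended by an object of physical size size_m at distance_m."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))

head_speed = 0.2    # assumed sideways head movement of 20 cm/s
person_height = 1.7  # assumed object size in metres

for d in (0.5, 2.0, 10.0, 100.0):
    print(f"{d:6.1f} m: parallax {parallax_angular_speed(head_speed, d):8.3f} deg/s, "
          f"object spans {visual_angle(person_height, d):7.2f} deg")
```

Angular speed falls in proportion to 1/d, so a point at 0.5 m sweeps past 200 times faster than one at 100 m, matching the description of near objects racing by while distant ones barely move. Likewise, the same-sized object subtends a smaller visual angle the farther away it is, which is the regularity relative size relies on.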
The visual system integrates these cues probabilistically, weighting each cue more heavily when it is more reliable in the current context. Under normal viewing conditions outdoors the cues usually agree, and integration is seamless: you simply perceive depth. Depth illusions occur when cues conflict. In the Ponzo illusion, linear perspective cues cause the visual system to scale up the perceived size of objects placed in the "distant" region of the image, so two identical objects appear to differ in size. Computer screens, paintings, and photographs fool us because they provide rich monocular cues (perspective, shading, occlusion) while binocular cues reveal a flat surface. The brain partially resolves this conflict, producing an experience of depth in an image that we nonetheless know is flat.
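One common way to formalize "weighting reliable cues more heavily" is reliability-weighted averaging, the maximum-likelihood rule for independent cues with Gaussian noise: each cue's weight is proportional to 1/σ², the inverse of its variance. This is a standard modeling assumption, not necessarily how the brain implements integration, and the depth values and noise levels below are made up for illustration. The function name `combine_cues` is hypothetical.

```python
import math
from typing import Sequence, Tuple

def combine_cues(estimates: Sequence[float], sigmas: Sequence[float]) -> Tuple[float, float]:
    """Reliability-weighted average of independent depth estimates.

    Each cue i reports a depth estimate with Gaussian noise of standard
    deviation sigmas[i]; its reliability is 1 / sigma**2. Returns the
    combined estimate and its standard deviation.
    """
    if len(estimates) != len(sigmas) or not estimates:
        raise ValueError("need one sigma per estimate and at least one cue")
    reliabilities = [1.0 / s ** 2 for s in sigmas]
    total = sum(reliabilities)
    combined = sum(r * e for r, e in zip(reliabilities, estimates)) / total
    # Combined variance is the inverse of the summed reliabilities.
    return combined, math.sqrt(1.0 / total)

# Hypothetical numbers: stereo is precise at near range, perspective less so.
depth, sd = combine_cues([1.00, 1.20], [0.02, 0.10])
print(f"combined depth {depth:.3f} m, sd {sd:.3f} m")
```

With these made-up numbers the combined estimate sits close to the more reliable stereo cue (about 1.008 m), and its standard deviation (about 0.020 m) is smaller than that of either cue alone. Lowering the stereo sigma or raising the perspective sigma shifts the result further toward the stereo estimate.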