Questions: Visual Object Recognition and Categorization
5 questions to test your understanding
Question 1 Multiple Choice
A person easily identifies a chair when viewed from the front but struggles when it's rotated to an unusual angle. Which theoretical account of object recognition best predicts this difficulty, and which better explains viewpoint-invariant recognition?
A. Template theory predicts the difficulty; structural description theory better explains invariance, because geons in spatial relationships specify objects in largely viewpoint-independent terms
B. Structural description theory predicts the difficulty; template theory better explains invariance, because stored templates cover all possible viewpoints
C. Both theories predict equal difficulty; viewpoint effects are explained by attentional limits rather than representational format
D. Template theory predicts the difficulty, and it also better explains invariance by proposing a separate template for each viewpoint
Template theories store mental images and match new input against stored copies. Unusual viewpoints produce poor matches, which predicts the recognition difficulty. But covering every view would require a very large library (one template per object per viewpoint), which is computationally implausible. Structural description theories like Biederman's Recognition-by-Components propose that objects are decomposed into geons (cylinders, cones, blocks) in spatial relationships. Because geons are largely viewpoint-invariant (a cylinder looks like a cylinder from most angles), this approach explains recognition across viewpoints without an ever-growing template library. Option D describes the template theory's attempted fix, which is precisely what makes it implausible.
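The poor match for an unusual viewpoint can be illustrated with a toy sketch in Python. This is only an assumption-laden illustration, not a model from the literature: the 4×4 binary grid, the `rotate90` and `overlap` helpers, and the use of intersection-over-union as a match score are all hypothetical choices made for the example.

```python
# Toy illustration only: template theories are not specified at this level
# of detail, and the grid, helpers, and score are assumptions for the sketch.

def rotate90(grid):
    """Rotate a square binary grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def overlap(template, image):
    """Intersection-over-union of the filled (1) cells in two grids."""
    pairs = [(t, i)
             for t_row, i_row in zip(template, image)
             for t, i in zip(t_row, i_row)]
    both = sum(1 for t, i in pairs if t and i)
    either = sum(1 for t, i in pairs if t or i)
    return both / either

# A stored "template": an L-shaped object seen from its familiar view.
L_SHAPE = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

same_view = overlap(L_SHAPE, L_SHAPE)               # familiar viewpoint
rotated_view = overlap(L_SHAPE, rotate90(L_SHAPE))  # unusual viewpoint

print(same_view, rotated_view)  # 1.0 0.4
```

The same object scores a perfect match in its stored orientation but a much weaker one after rotation, which is the kind of failure a pure template account predicts unless a separate template is stored for the rotated view.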
Question 2 Multiple Choice
A car enthusiast can identify the make, model, and year of a vehicle at a glance, while a non-enthusiast sees only 'a car.' This difference is best explained by:
A. The enthusiast has better visual acuity, allowing finer-grained perceptual discrimination
B. The enthusiast categorizes at the basic level, while the non-enthusiast is stuck at the superordinate level
C. Years of expertise have built fine-grained categorical representations in the enthusiast's ventral stream, shifting fast recognition toward the subordinate level
D. The non-enthusiast fails to apply Gestalt grouping principles correctly when viewing vehicles
For novices, recognition is fastest at the basic level, the level where category members share a characteristic shape ('car' rather than 'vehicle' or 'Honda Civic'). Expertise sharpens the resolution of categorical representations in ventral stream cortex: with enough exposure, subordinate distinctions become as automatic and rapid as basic-level recognition. A similar expertise effect is thought to let radiologists spot subtle tumors that non-experts see as noise. Option B misplaces the levels: the non-enthusiast categorizes at the basic level ('car'), not the superordinate level ('vehicle'), and the enthusiast's advantage lies at the subordinate level.
Question 3 True / False
Template theories of object recognition can fully account for viewpoint-invariant recognition by storing a small number of canonical views per object.
T. True
F. False
Answer: False
Viewpoint invariance is precisely what template theories struggle to explain. A canonical-view template fails for unusual viewpoints. Storing more templates (one per viewpoint) leads to a combinatorial explosion: for N objects across M viewpoints and K sizes, you need N×M×K templates. Structural description theories avoid this by representing objects in terms of viewpoint-independent geometric primitives (geons), which is why they provide a more plausible account of invariant recognition. The real brain likely uses both strategies in different conditions, but template matching alone cannot explain the breadth of human viewpoint tolerance.
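The N×M×K count can be made concrete with a short Python sketch. The function name `template_count` and the specific numbers (10,000 objects, 100 viewpoints, 10 sizes) are illustrative assumptions, not estimates from the literature.

```python
def template_count(n_objects: int, n_viewpoints: int, n_sizes: int) -> int:
    """Templates needed if every object is stored at every viewpoint and size."""
    return n_objects * n_viewpoints * n_sizes

# Hypothetical figures chosen only to show how quickly the product grows.
one_view = template_count(10_000, 1, 1)        # a single canonical view each
all_views = template_count(10_000, 100, 10)    # every viewpoint and size

print(one_view)   # 10000
print(all_views)  # 10000000
```

Under these assumed figures, covering viewpoint and size multiplies the library a thousandfold, and every further dimension of variation (lighting, occlusion) would multiply it again.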
Question 4 True / False
Object recognition in humans is typically faster at the basic level than at the subordinate level for novices.
T. True
F. False
Answer: True
This is one of the most replicated findings in categorization research. The basic level ('dog,' 'car,' 'chair') corresponds to the level where category members share a characteristic overall shape, which the ventral stream is most naturally sensitive to. Superordinate categories ('animal,' 'vehicle') are too variable in shape; subordinate categories ('golden retriever,' 'Honda Civic') require finer distinctions that are only automatic for experts. The basic-level advantage is commonly attributed to the match between perceptual feature distributions and where category boundaries fall at this level.
Question 5 Short Answer
Why is object recognition described as 'active and hypothesis-driven' rather than passive template-matching, and what evidence supports this characterization?
Model answer: The visual system builds a hypothesis about what an object is and tests it against incoming evidence, rather than passively comparing sensory input to stored images. Evidence includes: ambiguous figures (Rubin's vase, the duck-rabbit) that flip between interpretations depending on top-down expectations; the role of context — the same shape is read as a letter or number depending on surrounding characters; and camouflage effects, where finding a hidden object becomes dramatically easier once you know what to look for. Prior knowledge and task demands shape what the system 'sees,' which a purely bottom-up matching account cannot explain.
Top-down influences on recognition are pervasive: feedback connections from higher to lower ventral stream areas allow current categorical hypotheses to influence how early visual features are processed. This is why recognition is not simply a function of image quality — degraded images of familiar objects are often recognized when the observer is told the category, because the hypothesis guides attention to diagnostic features. A passive template-matching system has no mechanism for this kind of top-down guidance.