Questions — Capsule Networks — Open Knowledge Graph

Question 1 Multiple Choice

A CNN trained on faces correctly classifies a normal face image. When shown an image with two eyes positioned below the mouth (spatially scrambled), the CNN still outputs high confidence for 'face.' What architectural feature of CNNs explains this failure?

A. CNNs use too few layers to detect complex features like faces

B. Max pooling discards spatial position information, so the CNN detects that face parts are present but cannot verify their geometric arrangement

C. CNNs use ReLU activations, which cannot represent negative spatial relationships

D. Softmax output layers normalize confidence scores in a way that ignores spatial order
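
To make the scenario in this question concrete, the sketch below compares what a pooling step reports for a normal and a scrambled layout. It is a deliberately exaggerated toy, not a real CNN: the activation maps, the part names, and the use of a single global max pool (rather than the small local windows typical in practice) are all assumptions made for illustration.

```python
def global_max_pool(feature_map):
    """Return the largest activation in a 2D feature map (a list of rows)."""
    return max(max(row) for row in feature_map)


def pooled_features(maps):
    """Pool every part detector's map down to a single number."""
    return {part: global_max_pool(fm) for part, fm in maps.items()}


# Hypothetical activation maps for an "eye" detector and a "mouth" detector.
normal = {
    "eye":   [[0.9, 0.0], [0.0, 0.0]],   # eye fires in the top row
    "mouth": [[0.0, 0.0], [0.0, 0.8]],   # mouth fires in the bottom row
}
scrambled = {
    "eye":   [[0.0, 0.0], [0.9, 0.0]],   # eye now fires in the bottom row
    "mouth": [[0.0, 0.8], [0.0, 0.0]],   # mouth now fires in the top row
}

normal_pooled = pooled_features(normal)
scrambled_pooled = pooled_features(scrambled)

# Any layer that only sees the pooled values receives identical input
# for both layouts, so it cannot tell them apart.
print(normal_pooled == scrambled_pooled)
```

With local pooling windows the loss of position is gradual rather than total, but it compounds across stacked pooling layers in the same direction.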

Question 2 Multiple Choice

In a capsule network, a 'mouth capsule' outputs a vector with length 0.95 and a specific orientation. What does each component represent?

A. The length represents the capsule's learning rate; the orientation represents the error gradient direction

B. The length represents the probability that a mouth is present; the orientation encodes instantiation parameters like position, size, and tilt

C. The length represents the number of training examples containing mouths; the orientation encodes the class label

D. Both length and orientation together represent the capsule's confidence score, similar to a scalar neuron
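
For readers who want to see how a capsule's output vector can have a length below 1 while keeping its orientation, here is a minimal sketch of the squash nonlinearity used in the dynamic-routing formulation of capsule networks, v = (|s|^2 / (1 + |s|^2)) * (s / |s|). The input vector `raw`, the helper name `length`, and the small `eps` term are assumptions added for illustration.

```python
import math


def squash(s, eps=1e-9):
    """Shrink vector s so its length lies in [0, 1) without changing direction."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    # eps only guards against division by zero for an all-zero input.
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


raw = [3.0, 4.0]          # hypothetical pre-activation with length 5
v = squash(raw)

print(length(v))          # roughly 0.96, always strictly below 1
print(v[1] / v[0])        # same ratio as raw[1] / raw[0]: direction is preserved
```

Long input vectors are pushed toward length 1 and short ones toward 0, while the ratio between components is left untouched.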

Question 3 True / False

In routing by agreement, connections between lower-level capsules (parts) and higher-level capsules (wholes) are strengthened when the part capsules' predictions about the parent capsule are geometrically consistent with each other.

T. True

F. False
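
The routing procedure this question refers to can be sketched in a few lines. The code below follows the general shape of dynamic routing (softmax over routing logits, weighted sum, squash, then a dot-product agreement update). It is a toy under stated assumptions, not a reference implementation: the prediction vectors in `u_hat`, the two parents, and the three iterations are hypothetical, and the learned transformation matrices that would normally produce `u_hat` are omitted.

```python
import math


def squash(s, eps=1e-9):
    """Shrink vector s so its length lies in [0, 1) without changing direction."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def length(v):
    return math.sqrt(dot(v, v))


def route(u_hat, iterations=3):
    """u_hat[i][j] is part capsule i's predicted vector for parent capsule j."""
    n_parts, n_parents, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_parents for _ in range(n_parts)]   # routing logits
    for _ in range(iterations):
        # Each part distributes its output across the parents.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_parents):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_parts))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement step: raise the logit where a prediction aligns with
        # the parent's current output.
        for i in range(n_parts):
            for j in range(n_parents):
                b[i][j] += dot(u_hat[i][j], v[j])
    return c, v


# Three parts, two candidate parents. Predictions for parent 0 roughly agree;
# predictions for parent 1 point in conflicting directions.
u_hat = [
    [[1.0, 0.0],  [1.0, 0.0]],
    [[0.9, 0.1],  [-1.0, 0.0]],
    [[1.0, -0.1], [0.0, 1.0]],
]
c, v = route(u_hat)

for i, row in enumerate(c):
    print(f"part {i}: coupling to parent 0 = {row[0]:.2f}, parent 1 = {row[1]:.2f}")
print(f"parent lengths: {length(v[0]):.2f} vs {length(v[1]):.2f}")
```

In this toy setup, the coupling coefficients drift toward the parent whose incoming predictions line up, and that parent's output vector ends up longer than the other's.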

Question 4 True / False

Capsule networks are more computationally efficient than CNNs because routing by agreement eliminates the need for multiple convolutional layers.

T. True

F. False

Question 5 Short Answer

Why does a capsule's vector output achieve viewpoint equivariance in a more structured way than a CNN's pooling-based approach?

