Questions: Capsule Networks

5 questions to test your understanding

Question 1 Multiple Choice

A CNN trained on faces correctly classifies a normal face image. When shown an image with two eyes positioned below the mouth (spatially scrambled), the CNN still outputs high confidence for 'face.' What architectural feature of CNNs explains this failure?

A. CNNs use too few layers to detect complex features like faces
B. Max pooling discards spatial position information, so the CNN detects that face parts are present but cannot verify their geometric arrangement
C. CNNs use ReLU activations, which cannot represent negative spatial relationships
D. Softmax output layers normalize confidence scores in a way that ignores spatial order
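A toy NumPy illustration of the pooling argument. It assumes hand-placed part-detector activations (hypothetical "eye" and "mouth" maps) and a single global max pool per feature map; a real CNN pools locally over several layers, so this exaggerates the effect. The scrambled face yields exactly the same pooled features as the normal one, because max pooling keeps only *whether* a detector fired, not *where*.

```python
import numpy as np

def feature_map(height, width, positions):
    """Hypothetical part-detector map: 1.0 wherever the detector fires."""
    fmap = np.zeros((height, width))
    for row, col in positions:
        fmap[row, col] = 1.0
    return fmap

def global_max_pool(fmaps):
    """Collapse each feature map to its maximum, discarding the firing location."""
    return np.array([fmap.max() for fmap in fmaps])

H, W = 8, 8
# Normal face: eyes in the upper rows, mouth near the bottom.
normal = [feature_map(H, W, [(2, 2), (2, 5)]),   # eye detector
          feature_map(H, W, [(6, 3)])]           # mouth detector
# Scrambled face: eyes moved below the mouth.
scrambled = [feature_map(H, W, [(6, 2), (6, 5)]),
             feature_map(H, W, [(1, 3)])]

print(global_max_pool(normal))
print(global_max_pool(scrambled))
```

Both calls print the same pooled vector, so any classifier sitting on top of these features cannot distinguish the two arrangements.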
Question 2 Multiple Choice

In a capsule network, a 'mouth capsule' outputs a vector with length 0.95 and a specific orientation. What do the vector's length and its orientation each represent?

A. The length represents the capsule's learning rate; the orientation represents the error gradient direction
B. The length represents the probability that a mouth is present; the orientation encodes instantiation parameters like position, size, and tilt
C. The length represents the number of training examples containing mouths; the orientation encodes the class label
D. Both length and orientation together represent the capsule's confidence score, similar to a scalar neuron
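A minimal sketch of how a capsule's length and orientation can carry different information, using the squashing non-linearity from Sabour, Frosst and Hinton's 2017 dynamic-routing paper. The raw vector `s_mouth` and the meaning assigned to its three entries (offset, size, tilt) are hypothetical. Squashing rescales the length into [0, 1) so it can be read as a presence probability, while leaving the direction, and therefore the pose information, unchanged.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Squashing non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# Hypothetical raw output of a "mouth capsule": one entry per pose parameter,
# e.g. horizontal offset, size and tilt (labels are illustrative only).
s_mouth = np.array([2.0, 3.0, 3.0])
v_mouth = squash(s_mouth)

presence = np.linalg.norm(v_mouth)        # length: probability the mouth is present
pose_direction = v_mouth / presence       # orientation: instantiation parameters
print(round(presence, 3))
print(np.allclose(pose_direction, s_mouth / np.linalg.norm(s_mouth)))
```

Here the squared input length is 22, so the output length is 22/23 ≈ 0.957, close to the 0.95 in the question; a short input vector such as `[0.1, 0.0]` would instead be squashed to a length near 0.01, signalling that the part is probably absent.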
Question 3 True / False

In routing by agreement, connections between lower-level capsules (parts) and higher-level capsules (wholes) are strengthened when the part capsules' predictions about the parent capsule are geometrically consistent with each other.

T. True
F. False
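A minimal NumPy sketch of routing by agreement, modelled on the dynamic-routing procedure in Sabour et al. (2017). The prediction vectors `u_hat` are hand-picked assumptions: three part capsules (say, two eyes and a mouth) make closely agreeing predictions for parent 0, while their predictions for parent 1 point in scattered directions. Each iteration raises a routing logit in proportion to the dot product between a part's prediction and the parent's current output.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Keep each vector's direction, map its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def route(u_hat, iterations=3):
    """Routing by agreement.

    u_hat[i, j] is part capsule i's predicted pose vector for parent capsule j.
    Returns the coupling coefficients c[i, j] and the parent outputs v[j].
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits, start uniform
    for _ in range(iterations):
        c = softmax(b, axis=1)                     # each part splits its vote across parents
        s = np.einsum("ij,ijd->jd", c, u_hat)      # weighted sum of predictions per parent
        v = squash(s)                              # parent outputs
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # boost links whose prediction agrees with v
    return c, v

# Hypothetical predictions: shape (3 parts, 2 parents, 2 pose dimensions).
u_hat = np.array([
    [[1.0, 1.0], [1.5, 0.0]],    # part 0
    [[1.1, 0.9], [-1.5, 0.0]],   # part 1
    [[0.9, 1.1], [0.0, 1.5]],    # part 2
])
c, v = route(u_hat)
print(np.round(c, 2))                  # coupling of each part to each parent
print(np.linalg.norm(v, axis=1))       # parent lengths (presence)
```

Starting from uniform couplings of 0.5, every part's coupling to parent 0 rises above 0.5 because the predictions agree, and parent 0 ends up with a much longer output vector than parent 1, whose scattered predictions largely cancel out.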
Question 4 True / False

Capsule networks are more computationally efficient than CNNs because routing by agreement eliminates the need for multiple convolutional layers.

T. True
F. False
Question 5 Short Answer

Why is viewpoint equivariance built into the structure of a capsule's vector output in a way that a CNN's pooling-based approach lacks?

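A toy illustration of the equivariance idea, assuming a viewpoint change acts as a 2-D rotation applied directly to a capsule's raw pose vector (the vector `s` and the 30-degree angle are hypothetical). Because the squashing function depends only on a vector's length, rotating the input rotates the output by the same matrix while the length, read as presence probability, stays fixed. In a trained capsule network this relationship is only learned approximately through the transformation matrices, so the sketch is an idealization rather than a guarantee.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Keep the direction of s, map its length into [0, 1)."""
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def rotation(theta):
    """2-D rotation matrix for angle theta (radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

s = np.array([2.0, 1.0])     # hypothetical 2-D pose input for one capsule
R = rotation(np.pi / 6)      # hypothetical 30-degree change of viewpoint

v = squash(s)
v_rot = squash(R @ s)

# Equivariance: rotating the input rotates the output by the same R ...
print(np.allclose(v_rot, R @ v))
# ... while the length (presence probability) is unchanged.
print(np.isclose(np.linalg.norm(v_rot), np.linalg.norm(v)))
```

A pooled scalar activation, by contrast, would report the same value for both viewpoints and keep no record of how the pose changed.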