Questions: Activation Functions in Neural Networks

5 questions to test your understanding

Question 1 Multiple Choice

A researcher builds a 10-layer neural network using only linear transformations between layers, with no activation functions. How does the expressive power of this network compare to that of a single-layer linear model?

A. It is 10 times more powerful because each layer adds an independent linear transformation

B. It is equivalent to a single-layer linear transformation

C. It can approximate nonlinear functions through the interaction of many linear layers

D. It has exponentially more capacity because of its large number of parameters

Question 2 Multiple Choice

A multi-class classifier with 5 output classes uses ReLU as its output layer activation. What is the primary problem with this design?

A. ReLU blocks gradient flow backward through the output layer

B. ReLU outputs can be any non-negative number and will not sum to 1, making them invalid as class probabilities

C. ReLU is too computationally expensive for output layers

D. ReLU causes the dying ReLU problem, zeroing out all output neurons

Question 3 True / False

Stacking more linear layers without activation functions allows a neural network to model increasingly complex, nonlinear decision boundaries.

T. True

F. False

Question 4 True / False

ReLU avoids the vanishing gradient problem for positive inputs because its derivative is exactly 1, allowing gradient signals to flow backward through layers without shrinking.

T. True

F. False

Question 5 Short Answer

Why is a nonlinear activation function necessary between layers of a neural network? What happens if all activations are removed?

Think about your answer, then reveal below.
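
One way to check your answer to Question 5 numerically is the minimal sketch below. It is an illustration under stated assumptions, not part of the original quiz: the layer width, random weights, and helper names (`matmul`, `matvec`, `forward`) are all hypothetical, and biases are omitted for brevity (with biases, the stack would collapse to a single affine map in the same way).

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    """Apply matrix m to vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

random.seed(0)
dim = 3  # assumed layer width, chosen only for illustration
layers = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
          for _ in range(10)]

def forward(x, activation=None):
    """Run x through all layers, optionally applying an activation after each."""
    for w in layers:
        x = matvec(w, x)
        if activation is not None:
            x = activation(x)
    return x

# Collapse the 10 weight matrices into one: W_10 @ ... @ W_1.
collapsed = layers[0]
for w in layers[1:]:
    collapsed = matmul(w, collapsed)

x = [0.5, -1.0, 2.0]
deep_linear = forward(x)            # 10 linear layers, no activations
single_layer = matvec(collapsed, x)  # one equivalent linear layer
max_gap = max(abs(a - b) for a, b in zip(deep_linear, single_layer))
print("largest difference between deep and collapsed outputs:", max_gap)

# ReLU breaks additivity, so a network with ReLU between layers
# cannot in general be collapsed into one matrix this way.
print(relu([1.0 + -1.0]), "vs", [relu([1.0])[0] + relu([-1.0])[0]])
```

The first printed difference should be at floating-point rounding level, showing that the activation-free stack computes the same function as one linear layer. The second line shows that ReLU applied to a sum differs from the sum of ReLUs, which is the property that prevents the same collapse once a nonlinearity is inserted.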