Questions: Activation Functions in Neural Networks

5 questions to test your understanding

Question 1 Multiple Choice

A researcher builds a 10-layer neural network using only linear transformations between layers — no activation functions. What is the effective expressive power of this network compared to a single-layer linear model?

A. It is 10 times more powerful because each layer adds an independent linear transformation
B. It is equivalent to a single-layer linear transformation
C. It can approximate nonlinear functions through the interaction of many linear layers
D. It has exponentially more capacity because of its large number of parameters

Question 2 Multiple Choice

A multi-class classifier with 5 output classes uses ReLU as its output layer activation. What is the primary problem with this design?

A. ReLU blocks gradient flow backward through the output layer
B. ReLU outputs can be any non-negative number and will not sum to 1, making them invalid as class probabilities
C. ReLU is too computationally expensive for output layers
D. ReLU causes the dying ReLU problem, zeroing out all output neurons

Question 3 True / False

Stacking more linear layers without activation functions allows a neural network to model increasingly complex, nonlinear decision boundaries.

T. True
F. False

Question 4 True / False

ReLU avoids the vanishing gradient problem for positive inputs because its derivative is exactly 1, allowing gradient signals to flow backward through layers without shrinking.

T. True
F. False

Question 5 Short Answer

Why is a nonlinear activation function necessary between layers of a neural network? What happens if all activations are removed?

Think about your answer, then reveal below.
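
One way to check an answer to Question 5 numerically is sketched below. It is a minimal illustration under stated assumptions, not part of the original quiz: the layer sizes (3x3), the absence of bias terms, the random seed, and the helper names `matmul`, `relu`, and `random_matrix` are all chosen here for the example. It passes an input through ten purely linear layers, then compares the result with a single matrix formed by multiplying the ten weight matrices together. It also runs a small additivity check, which every linear map satisfies, on ReLU.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Apply ReLU element-wise to a matrix stored as lists of rows."""
    return [[max(0.0, v) for v in row] for row in m]


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed (an assumption) so runs are repeatable
weights = [random_matrix(3, 3, rng) for _ in range(10)]  # ten 3x3 layers, no biases
x = random_matrix(3, 1, rng)  # a single input column vector

# Forward pass through ten layers with no activation functions.
h = x
for w in weights:
    h = matmul(w, h)

# Collapse the ten weight matrices into one: W = W10 ... W2 W1.
collapsed = weights[0]
for w in weights[1:]:
    collapsed = matmul(w, collapsed)
single = matmul(collapsed, x)

# Largest element-wise difference between the deep and the collapsed model.
max_linear_gap = max(abs(a[0] - b[0]) for a, b in zip(h, single))

# Additivity check: a linear map f satisfies f(a + b) == f(a) + f(b).
# ReLU breaks it for a = 1, b = -1.
relu_sum = relu([[1.0]])[0][0] + relu([[-1.0]])[0][0]
relu_of_sum = relu([[1.0 + -1.0]])[0][0]

print("gap between 10 linear layers and 1 collapsed layer:", max_linear_gap)
print("relu(1) + relu(-1) =", relu_sum, "but relu(1 + -1) =", relu_of_sum)
```

The gap in the first line of output should be at the level of floating-point rounding, since the stacked linear layers compute the same function as the collapsed matrix. The second line shows that ReLU is not additive, which is the property that stops layers from collapsing once an activation sits between them.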