Questions: Gradient Descent and Optimization

3 questions to test your understanding

Question 1 Multiple Choice

When training a neural network with gradient descent, the loss stops decreasing and oscillates around a high value. What is the most likely cause?

A. The learning rate is too small

B. The learning rate is too large

C. The model has too few parameters

D. The loss function is non-differentiable

Question 2 True / False

Gradient descent on a non-convex loss function is very likely to find the global minimum if you run it for enough iterations.

True

False

Question 3 Short Answer

Vanilla gradient descent computes the gradient over the entire dataset before each update. What problem does stochastic gradient descent (SGD) address, and what tradeoff does it introduce?
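
For reference, the full-batch update described in the question can be sketched as below. This is a minimal illustration only: the toy data, the one-parameter model y_hat = w * x, and the names full_batch_gradient and gradient_descent are assumptions made for this sketch and do not come from the quiz itself.

```python
# Hypothetical toy data: y = 2x exactly, so the best weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def full_batch_gradient(w, xs, ys):
    """Gradient of the mean squared error for the model y_hat = w * x,
    averaged over every example in the dataset."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, lr=0.05, steps=100, w=0.0):
    """Vanilla gradient descent: the whole dataset is read for each update."""
    for _ in range(steps):
        grad = full_batch_gradient(w, xs, ys)  # touches all examples
        w -= lr * grad                         # one parameter update
    return w


w_final = gradient_descent(xs, ys)
print(w_final)
```

Each pass through the loop reads all of xs and ys before w changes once, which is the behavior the question asks you to reason about.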
