Questions: Gradient Descent and Optimization

3 questions to test your understanding

Question 1: Multiple Choice

When training a neural network with gradient descent, the loss stops decreasing and oscillates around a high value. What is the most likely cause?

A. The learning rate is too small
B. The learning rate is too large
C. The model has too few parameters
D. The loss function is non-differentiable

Question 2: True / False

Gradient descent on a non-convex loss function is very likely to find the global minimum if you run it for enough iterations.

T. True
F. False

Question 3: Short Answer

Vanilla gradient descent computes the gradient over the entire dataset before each update. What problem does stochastic gradient descent (SGD) address, and what tradeoff does it introduce?
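
For reference, the full-batch update described in the question can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the toy dataset, the one-parameter model `y = w * x`, the mean squared error loss, and the names `full_batch_gradient`, `gradient_descent`, `learning_rate`, and `steps` are all assumptions made for the example.

```python
# Hypothetical example: fit y = w * x with full-batch (vanilla) gradient descent.
# The data, learning rate, and step count are illustrative assumptions.

def full_batch_gradient(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, learning_rate=0.05, steps=200):
    w = 0.0
    for _ in range(steps):
        # The gradient is computed over every example before w is updated once.
        grad = full_batch_gradient(w, xs, ys)
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x, so w should approach 2

w_fit = gradient_descent(xs, ys)
print(w_fit)
```

Note that `full_batch_gradient` loops over all `n` examples on every call, so each single parameter update costs one full pass over the data.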
