Questions: Gated Recurrent Units (GRU)

5 questions to test your understanding


Question 1 Multiple Choice

What does the GRU's update gate accomplish that requires two separate gates in an LSTM?

A. It controls whether the hidden state is transmitted to the output layer

B. It simultaneously handles what to forget from the old state and what new information to incorporate, merging the LSTM's forget gate and input gate into one

C. It applies a nonlinear transformation to the input so the network can learn complex patterns

D. It selects which elements of the hidden state to reset to zero between sequences
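To make the gate arithmetic concrete, here is a minimal sketch of one GRU step with a hidden size of 1, in plain Python so every number is visible. It assumes the convention h_t = (1 − z) · h_{t−1} + z · h̃, where z weights the candidate (PyTorch's nn.GRU uses the mirror image, with z weighting the old state). The parameter names and values are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x: float, h_prev: float, p: dict) -> tuple:
    """One GRU step for a hidden size of 1 (scalars keep the arithmetic visible).

    Assumed convention: h_t = (1 - z) * h_prev + z * h_tilde.
    Returns (h_new, z, h_tilde).
    """
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])  # reset gate
    # Candidate state: the reset gate scales how much of h_prev it sees.
    h_tilde = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # A single gate does both jobs: (1 - z) keeps old state, z admits new content.
    h_new = (1.0 - z) * h_prev + z * h_tilde
    return h_new, z, h_tilde


# Hypothetical parameter values, for illustration only.
params = {"w_z": 0.5, "u_z": -0.3, "b_z": 0.0,
          "w_r": 0.8, "u_r": 0.2, "b_r": 0.1,
          "w_h": 1.2, "u_h": 0.7, "b_h": -0.1}

h = 0.0
for x in [1.0, -0.5, 0.25]:
    h, z, h_tilde = gru_step(x, h, params)
    print(f"x={x:+.2f}  z={z:.3f}  h_tilde={h_tilde:+.3f}  h={h:+.3f}")
```

Because the new state is a convex combination of the old state and the candidate, pushing z toward 0 keeps the old state almost unchanged, and pushing it toward 1 replaces it with the candidate. An LSTM needs separate forget and input gates to express the same trade-off.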

Question 2 Multiple Choice

A team is training a model on sequences of moderate length with a limited dataset and a tight computational budget. Which consideration most favors using a GRU over an LSTM?

A. GRUs are guaranteed to outperform LSTMs on all natural language tasks

B. GRUs are always faster to train than LSTMs regardless of sequence length or hardware

C. GRUs have fewer parameters than an equivalently sized LSTM, reducing overfitting risk on small datasets and lowering training cost

D. GRUs handle the vanishing gradient problem more effectively than LSTMs because they have fewer gates
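The parameter claim can be checked with a small calculation. This sketch assumes the textbook formulation with one bias vector per gate: a GRU has three weight blocks (update gate, reset gate, candidate) and an LSTM has four (input, forget and output gates plus the candidate cell). Some frameworks (PyTorch's nn.GRU, for example) keep separate input and recurrent biases, which adds hidden_size per block but leaves the 3:4 ratio unchanged. The sizes 100 and 128 are arbitrary.

```python
def gated_layer_params(num_blocks: int, input_size: int, hidden_size: int) -> int:
    """Weights + biases for a gated recurrent layer with `num_blocks` blocks.

    Each block has an input matrix (hidden x input), a recurrent matrix
    (hidden x hidden) and one bias vector (hidden).
    """
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return num_blocks * per_block


def gru_params(input_size: int, hidden_size: int) -> int:
    # Update gate, reset gate, candidate state: 3 blocks.
    return gated_layer_params(3, input_size, hidden_size)


def lstm_params(input_size: int, hidden_size: int) -> int:
    # Input, forget and output gates plus candidate cell: 4 blocks.
    return gated_layer_params(4, input_size, hidden_size)


gru, lstm = gru_params(100, 128), lstm_params(100, 128)
print(gru, lstm, gru / lstm)  # 87936 117248 0.75
```

At equal hidden size a GRU layer therefore has about 25% fewer parameters, which is where the lower cost and reduced overfitting risk come from; whether that translates into better accuracy still depends on the task.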

Question 3 True / False

Like LSTMs, GRUs maintain two separate memory vectors: a cell state for long-term memory and a hidden state for short-term context.

True

False

Question 4 True / False

The reset gate in a GRU controls how much of the previous hidden state is used when computing the candidate new hidden state.

True

False
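The reset gate's role can be sketched by computing only the candidate state for a 2-unit hidden layer. This assumes the formulation h̃_t = tanh(W x_t + U (r_t ⊙ h_{t−1}) + b), where r is applied before the recurrent weights; PyTorch's nn.GRU instead applies r after the recurrent product, r ⊙ (U h_{t−1} + b_h). In both forms r = 0 removes the previous state's contribution. The helper names, weights and sizes below are hypothetical.

```python
import math


def matvec(m, v):
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def candidate_state(x, h_prev, r, W, U, b):
    """Candidate h_tilde = tanh(W x + U (r * h_prev) + b), with * elementwise.

    The reset gate r (values in [0, 1]) scales each component of h_prev
    before it reaches the recurrent weights U.
    """
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    pre = [wx + uh + bi for wx, uh, bi in zip(matvec(W, x), matvec(U, gated), b)]
    return [math.tanh(p) for p in pre]


# Hypothetical 2-unit example, for illustration only.
W = [[0.5, -0.2], [0.1, 0.4]]
U = [[0.9, 0.3], [-0.6, 0.8]]
b = [0.0, 0.1]
x = [1.0, 0.5]
h_prev = [0.7, -0.4]

print(candidate_state(x, h_prev, [1.0, 1.0], W, U, b))  # previous state fully used
print(candidate_state(x, h_prev, [0.0, 0.0], W, U, b))  # previous state ignored
print(candidate_state(x, h_prev, [1.0, 0.0], W, U, b))  # only unit 0 of h_prev used
```

With r = 0 the candidate depends only on the current input, which lets the cell start a fresh summary when the past is irrelevant; the update gate then decides how much of that candidate actually replaces the old state.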

Question 5 Short Answer

How does the GRU's update gate help mitigate the vanishing gradient problem during backpropagation through time?

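One way to see the mechanism is through the Jacobian of a single step. The derivation below assumes the convention h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t, with z_t = σ(W_z x_t + U_z h_{t−1} + b_z); the symbols W_z, U_z and b_z are notation introduced here for the update gate's input weights, recurrent weights and bias. This is a mitigation rather than a guarantee: the near-identity path exists only for units and time steps where the network learns to keep z_t close to 0.

```latex
% Assumed update rule (\odot is elementwise multiplication):
%   h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,
%   z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)
\frac{\partial h_t}{\partial h_{t-1}}
  = \operatorname{diag}(1 - z_t)
  + \operatorname{diag}(\tilde{h}_t - h_{t-1})\,\frac{\partial z_t}{\partial h_{t-1}}
  + \operatorname{diag}(z_t)\,\frac{\partial \tilde{h}_t}{\partial h_{t-1}},
\qquad
\frac{\partial z_t}{\partial h_{t-1}}
  = \operatorname{diag}\!\big(z_t \odot (1 - z_t)\big)\,U_z

% When z_t \approx 0, both z_t and z_t \odot (1 - z_t) are near 0, so the
% last two terms vanish and each factor is close to the identity:
\frac{\partial h_T}{\partial h_k}
  = \frac{\partial h_T}{\partial h_{T-1}} \cdots \frac{\partial h_{k+1}}{\partial h_k}
  \approx I
\quad \text{when } z_t \approx 0 \text{ for } t = k+1, \dots, T.

% Contrast: a vanilla RNN, h_t = \tanh(W x_t + U h_{t-1} + b), has
\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}(1 - h_t \odot h_t)\,U,
% and since |\tanh'| \le 1, a long product of such factors shrinks toward
% zero at least geometrically when the spectral norm of U is below 1.
```

The additive (1 − z_t) ⊙ h_{t−1} term gives the gradient a path that is not repeatedly squashed by a nonlinearity or multiplied by the same recurrent matrix, which is why information and gradients can survive across many time steps.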