Questions: Gated Recurrent Units (GRU)

5 questions to test your understanding

Question 1 Multiple Choice

What does the GRU's update gate accomplish that requires two separate gates in an LSTM?

A. It controls whether the hidden state is transmitted to the output layer
B. It simultaneously handles what to forget from the old state and what new information to incorporate, merging the LSTM's forget gate and input gate into one
C. It applies a nonlinear transformation to the input so the network can learn complex patterns
D. It selects which elements of the hidden state to reset to zero between sequences
Question 2 Multiple Choice

A team is training a model on sequences of moderate length with a limited dataset and tight computational budget. Which consideration most favors using a GRU over an LSTM?

A. GRUs are guaranteed to outperform LSTMs on all natural language tasks
B. GRUs are always faster to train than LSTMs regardless of sequence length or hardware
C. GRUs have fewer parameters than an equivalently sized LSTM, reducing overfitting risk on small datasets and lowering training cost
D. GRUs handle the vanishing gradient problem more effectively than LSTMs because they have fewer gates
Question 3 True / False

Like LSTMs, GRUs maintain two separate memory vectors: a cell state for long-term memory and a hidden state for short-term context.

T. True
F. False
Question 4 True / False

The reset gate in a GRU controls how much of the previous hidden state is used when computing the candidate new hidden state.

T. True
F. False
Question 5 Short Answer

How does the GRU's update gate help mitigate the vanishing gradient problem during backpropagation through time?

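One way to check your reasoning on Question 5 is a small numeric sketch. The code below is an illustration under stated assumptions, not a reference implementation: it uses a scalar (hidden size 1) GRU, the convention h_t = (1 - z_t) * h_prev + z_t * h_cand (some libraries swap the roles of z_t and 1 - z_t), and hypothetical weight names (`wz`, `uz`, `bz`, and so on) chosen only for this example. It isolates the additive "copy" path through the update gate, whose contribution to the gradient across T steps is the product of the (1 - z_t) terms.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One scalar GRU step. `p` is a dict of hypothetical weights."""
    # Update gate: how much of the state is overwritten this step.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of h_prev feeds the candidate state.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate state, computed from the input and the reset-scaled h_prev.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Assumed convention: interpolate between the old state and the candidate.
    h = (1.0 - z) * h_prev + z * h_cand
    return h, z


def direct_path_gradient(zs):
    """Product of (1 - z_t): the part of dh_T/dh_0 that flows only
    through the additive copy term, ignoring gate and candidate paths."""
    g = 1.0
    for z in zs:
        g *= (1.0 - z)
    return g


# Hypothetical weights: everything zero except a strongly negative
# update-gate bias, so the gate stays nearly closed (z close to 0).
params = {"wz": 0.0, "uz": 0.0, "bz": -6.0,
          "wr": 0.0, "ur": 0.0, "br": 0.0,
          "wh": 0.0, "uh": 0.0, "bh": 0.0}

h, zs = 1.0, []
for _ in range(50):
    h, z = gru_step(0.0, h, params)
    zs.append(z)

print("state after 50 steps:", h)
print("direct-path gradient (gate nearly closed):", direct_path_gradient(zs))
print("direct-path gradient (z = 0.5 every step):", direct_path_gradient([0.5] * 50))
```

When the network learns to keep z_t near 0 on the steps where memory must be preserved, the state is carried forward almost unchanged and the direct-path gradient stays close to 1 rather than shrinking geometrically. This mitigates vanishing gradients but does not eliminate them, since the gates are learned and the other gradient paths can still decay.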