Questions: Sequence-to-Sequence Models

5 questions to test your understanding

Question 1 Multiple Choice

A seq2seq model translates short sentences well but quality degrades sharply on paragraphs. What architectural feature most likely causes this?

A. The decoder LSTM cannot process more than one output token at a time

B. The encoder compresses the entire input into a fixed-size vector, losing information for long inputs

C. Beam search becomes computationally intractable for long sequences

D. LSTMs cannot maintain hidden state for more than 50 steps

Question 2 Multiple Choice

During decoding, beam search with width k=5 is used instead of greedy decoding. Which best describes what beam search guarantees?

A. It finds the globally optimal output sequence with probability 1

B. It typically finds an output sequence at least as probable as the one greedy decoding produces, but the global optimum is not guaranteed

C. It samples k diverse outputs randomly, improving expected quality

D. It guarantees the highest-probability individual token at every step

Question 3 True / False

In a seq2seq model without attention, the decoder can primarily use information about the first few input tokens because LSTM hidden states decay over time.

T. True

F. False

Question 4 True / False

With attention, the decoder can place different amounts of focus on different input positions at each generation step, rather than being restricted to a single fixed context vector.

T. True

F. False

Question 5 Short Answer

Why does the information bottleneck in a standard encoder-decoder model become a problem for long sequences, and how does attention address it?