Questions — Model-Based Reinforcement Learning

Question 1 Multiple Choice

An agent trained with model-based RL achieves near-perfect performance in simulation but fails catastrophically when deployed in the real environment. What is the most likely cause?

A. The agent's policy network was too small to generalize beyond the training distribution

B. Model-free updates are incompatible with simulated experience

C. The learned world model contains errors that the agent exploited, producing a policy tuned to the model's mistakes rather than the real environment

D. The agent did not collect enough real interactions before beginning to plan

Question 2 Multiple Choice

Why is Dyna-Q more sample-efficient than pure Q-learning on the same task?

A. Dyna-Q uses a larger neural network that generalizes better across states

B. Dyna-Q skips the Q-learning update after real environment steps to save computation

C. Each real interaction updates a world model, which then generates n additional simulated transitions used for Q-learning updates without further real-world steps

D. Dyna-Q applies model-free updates exclusively on high-reward trajectories, filtering out uninformative experiences
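
For reference while working through this question, a tabular Dyna-Q loop might look roughly like the sketch below. It is a minimal illustration under stated assumptions: the six-state chain environment `step`, the function name `dyna_q`, and every hyperparameter value are hypothetical choices made here, not part of the original question set.

```python
import random
from collections import defaultdict


def step(state, action, n_states=6):
    """Hypothetical chain environment: action 1 moves right, action 0 moves left.

    Reaching the last state yields reward 1.0 and ends the episode.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=30, n_planning=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0.0
    model = {}              # assumed deterministic model: (s, a) -> (r, s', done)
    real_steps = 0

    def greedy(s):
        # Pick the highest-valued action, breaking ties at random.
        vals = [q[(s, a)] for a in (0, 1)]
        best = max(vals)
        return rng.choice([a for a in (0, 1) if vals[a] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update, shared by real and simulated transitions.
        target = r if done else r + gamma * max(q[(s2, 0)], q[(s2, 1)])
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(2) if rng.random() < eps else greedy(s)
            s2, r, done = step(s, a)       # one real environment step
            real_steps += 1
            update(s, a, r, s2, done)      # direct update from real experience
            model[(s, a)] = (r, s2, done)  # record the transition in the model
            for _ in range(n_planning):    # replay transitions from the model
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q, real_steps
```

Calling `dyna_q(n_planning=0)` reduces the loop to plain Q-learning on the same environment, which gives a simple baseline for comparing how many real steps each variant uses.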

Question 3 True / False

Model-based RL is generally preferable to model-free RL because it learns from fewer real environment interactions.

T. True

F. False

Question 4 True / False

In model-based RL, the agent can improve its policy by planning over simulated trajectories generated from a learned world model, without additional real-world interactions during the planning phase.

T. True

F. False

Question 5 Short Answer

What is the central tension in model-based reinforcement learning, and how do modern approaches try to manage it?

