Questions: Model-Based Reinforcement Learning

5 questions to test your understanding

Question 1 Multiple Choice

An agent trained with model-based RL achieves near-perfect performance in simulation but fails catastrophically when deployed in the real environment. What is the most likely cause?

A. The agent's policy network was too small to generalize beyond the training distribution
B. Model-free updates are incompatible with simulated experience
C. The learned world model contains errors that the agent exploited, producing a policy tuned to the model's mistakes rather than the real environment
D. The agent did not collect enough real interactions before beginning to plan

Question 2 Multiple Choice

Why is Dyna-Q more sample-efficient than pure Q-learning on the same task?

A. Dyna-Q uses a larger neural network that generalizes better across states
B. Dyna-Q skips the Q-learning update after real environment steps to save computation
C. Each real interaction updates a world model, which then generates n additional simulated transitions used for Q-learning updates without further real-world steps
D. Dyna-Q applies model-free updates exclusively on high-reward trajectories, filtering out uninformative experiences

Question 3 True / False

Model-based RL is generally preferable to model-free RL because it learns from fewer real environment interactions.

T. True
F. False

Question 4 True / False

In model-based RL, the agent can improve its policy by planning over simulated trajectories generated from a learned world model, without additional real-world interactions during the planning phase.

T. True
F. False

Question 5 Short Answer

What is the central tension in model-based reinforcement learning, and how do modern approaches try to manage it?

Think through your answer before checking it against a reference.
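
For review after attempting the questions, the Dyna-Q loop that Question 2 refers to can be sketched in tabular form. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`dyna_q`, `corridor_step`), the five-state corridor environment, the hyperparameter values, and the deterministic lookup-table model are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def corridor_step(state, action):
    """Hypothetical toy environment: states 0..4, reward 1.0 on reaching state 4."""
    next_state = min(max(state + action, 0), 4)
    done = next_state == 4
    return (1.0 if done else 0.0), next_state, done


def dyna_q(env_step, start_state, actions, episodes=50, n_planning=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-values keyed by (state, action)
    model = {}              # learned model: (state, action) -> (reward, next_state, done)
    real_steps = 0

    def greedy(state):
        # Pick a highest-valued action, breaking ties at random.
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def q_update(s, a, r, s2, done):
        # Standard one-step Q-learning update.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = env_step(s, a)   # one real interaction
            real_steps += 1
            q_update(s, a, r, s2, done)    # update from real experience
            model[(s, a)] = (r, s2, done)  # record it (assumes a deterministic environment)
            # Planning: n simulated updates replayed from the model, no real steps.
            for _ in range(n_planning):
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(ps, pa, pr, ps2, pdone)
            s = s2
    return q, real_steps


q, real_steps = dyna_q(corridor_step, start_state=0, actions=[-1, 1])
```

Setting `n_planning=0` reduces the sketch to plain Q-learning, which makes it easy to compare how many real steps each variant needs on the same task. Because planning updates come entirely from `model`, any error stored there would be propagated into `q`, which is the failure mode Question 1 describes.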