5 questions to test your understanding
An agent trained with model-based RL achieves near-perfect performance in simulation but fails catastrophically when deployed in the real environment. What is the most likely cause?
Why is Dyna-Q more sample-efficient than pure Q-learning on the same task?
True or false: Model-based RL is generally preferable to model-free RL because it learns from fewer real environment interactions.
True or false: In model-based RL, the agent can improve its policy by planning over simulated trajectories generated from a learned world model, without additional real-world interactions during the planning phase.
What is the central tension in model-based reinforcement learning, and how do modern approaches try to manage it?
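If it helps to check your reasoning on the Dyna-Q and planning questions above, the mechanics can be sketched in a few lines. This is a minimal tabular sketch, not a definitive implementation: the 6-state chain environment (`step`, `N_STATES`), the function name `dyna_q`, and all hyperparameter values are assumptions made for illustration and do not come from the original material.

```python
import random

N_STATES = 6          # hypothetical chain: states 0..5, state 5 is terminal
ACTIONS = (0, 1)      # 0 = left, 1 = right


def step(state, action):
    """Real environment (assumed): deterministic chain, reward 1 on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def dyna_q(episodes, planning_steps, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}        # learned model: (state, action) -> (next_state, reward, done)
    real_steps = 0    # counts interactions with the real environment only

    def update(s, a, r, s2, done):
        # One-step Q-learning backup, used for both real and simulated transitions.
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])

            s2, r, done = step(s, a)          # one real interaction
            real_steps += 1
            update(s, a, r, s2, done)         # direct RL update
            model[(s, a)] = (s2, r, done)     # model learning

            # Planning: replay transitions from the model, no real interaction.
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q, model, real_steps
```

Calling `dyna_q(episodes=30, planning_steps=0)` reduces this to plain Q-learning, while a positive `planning_steps` reuses each stored transition many times. Comparing the returned `real_steps` and Q-values across the two settings is one way to explore the sample-efficiency question. Because this chain is deterministic, the learned model here is exact; in a stochastic or high-dimensional environment the model would carry errors, which is the situation the first and last questions ask about.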