Questions — Introduction to Reinforcement Learning

Question 1 Multiple Choice

A robot learning to navigate a maze always chooses the action with the highest known reward (purely greedy strategy). It finds a path yielding +5 reward and consistently follows it. The true optimal path yields +20 but was never explored. This scenario best illustrates:

A. A successful application of reinforcement learning: the robot found a working policy.

B. The exploration-exploitation tradeoff: excessive exploitation causes the agent to get stuck in a locally optimal but globally suboptimal policy.

C. A failure of the discount factor: the agent valued immediate rewards too highly.

D. A model-based failure: the agent needs to learn the transition model first.
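A minimal sketch for experimenting with this scenario. It assumes the maze reduces to a choice between two paths with fixed (deterministic) rewards of +5 and +20, and that the agent keeps a sample-average reward estimate per path. `run_agent` and its parameters are hypothetical. Setting `epsilon=0.0` gives the purely greedy agent described above. A small `epsilon` adds random exploration (the epsilon-greedy rule).

```python
import random

def run_agent(true_rewards, epsilon, steps, seed=0):
    """Sample-average agent on a deterministic two-path 'maze' (a bandit).

    true_rewards: hypothetical fixed reward for each path.
    epsilon: probability of a uniformly random action (0.0 = purely greedy).
    Returns the estimated value of each path after `steps` steps.
    """
    rng = random.Random(seed)
    n = len(true_rewards)
    q = [0.0] * n      # estimated reward of each path
    counts = [0] * n   # times each path was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                   # explore: random path
        else:
            a = max(range(n), key=lambda i: q[i])  # exploit: ties go to lowest index
        counts[a] += 1
        q[a] += (true_rewards[a] - q[a]) / counts[a]  # incremental sample average
    return q

paths = [5.0, 20.0]  # hypothetical: path 0 pays +5, path 1 pays +20
greedy = run_agent(paths, epsilon=0.0, steps=1000)
eps = run_agent(paths, epsilon=0.1, steps=1000)
print("greedy:", greedy)
print("epsilon-greedy:", eps)
```

With `epsilon=0.0` the estimates end at `[5.0, 0.0]` because the +20 path is never tried. The epsilon-greedy run almost surely samples it within 1000 steps and then prefers it. Reducing the maze to a two-armed bandit ignores states and sequential structure, so this only illustrates the action-selection part of the problem.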

Question 2 Multiple Choice

How does reinforcement learning differ most fundamentally from supervised learning?

A. RL requires neural networks, while supervised learning can use simpler models.

B. In RL, the agent learns from interaction, receiving only reward signals rather than labeled 'correct answer' examples; supervised learning trains on labeled input-output pairs that specify the correct output for each input.

C. RL only applies to sequential decision tasks in games, while supervised learning handles real-world problems.

D. RL always requires more data than supervised learning to achieve good performance.

Question 3 True / False

In reinforcement learning, a discount factor γ close to 1 causes the agent to value distant future rewards nearly as much as immediate ones, making it more far-sighted in its decision-making.

True

False
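The effect of γ can be checked numerically with the discounted return G = r_0 + γ·r_1 + γ²·r_2 + ... . The sketch below uses a hypothetical `discounted_return` helper and made-up reward sequences to compare +1 received immediately against +10 received after 20 steps.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence."""
    g = 0.0
    # Backward recursion: G_k = r_k + gamma * G_{k+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g

immediate = [1.0]              # hypothetical: +1 right now
delayed = [0.0] * 20 + [10.0]  # hypothetical: +10 after 20 steps

for gamma in (0.5, 0.99):
    print(f"gamma={gamma}: immediate={discounted_return(immediate, gamma):.4f}, "
          f"delayed={discounted_return(delayed, gamma):.6f}")
```

At γ = 0.99 the delayed +10 is worth about 8.18 (10 × 0.99^20) and outweighs the immediate +1. At γ = 0.5 it is worth less than 0.0001 (10 × 0.5^20), so the immediate reward dominates.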

Question 4 True / False

Model-free reinforcement learning methods are generally superior to model-based methods because they avoid making assumptions about the environment's transition dynamics.

True

False

Question 5 Short Answer

Why is the exploration-exploitation tradeoff a fundamental challenge in reinforcement learning, and what makes it difficult to resolve optimally?
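One widely used heuristic for the simplest setting, the multi-armed bandit (a single-state RL problem), is the UCB1 rule. It picks the action that maximizes its estimated value plus an exploration bonus that shrinks as the action is tried more often. The sketch below assumes Bernoulli rewards with made-up success probabilities; `ucb1` is a hypothetical helper. It illustrates the tension rather than resolving it for general RL, where states and delayed rewards make the problem considerably harder.

```python
import math
import random

def ucb1(means, steps, seed=0):
    """UCB1 on a Bernoulli bandit; returns how often each arm was pulled.

    means: hypothetical success probability of each arm (rewards in {0, 1}).
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    values = [0.0] * n  # sample-average reward estimate per arm
    for t in range(1, steps + 1):
        if t <= n:
            a = t - 1  # pull every arm once to initialize estimates
        else:
            total = t - 1  # plays so far
            # Estimated value plus a bonus that is large for rarely tried arms
            a = max(range(n), key=lambda i: values[i]
                    + math.sqrt(2 * math.log(total) / counts[i]))
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
    return counts

counts = ucb1([0.2, 0.8], steps=2000)
print(counts)
```

Even once the 0.8 arm looks clearly better, UCB1 keeps occasionally pulling the 0.2 arm because the log term slowly grows its bonus. Those pulls cost reward, but they guard against having misjudged the arm from a few noisy early samples.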
