Questions: Introduction to Reinforcement Learning

5 questions to test your understanding

Question 1 Multiple Choice

A robot learning to navigate a maze always chooses the action with the highest known reward (purely greedy strategy). It finds a path yielding +5 reward and consistently follows it. The true optimal path yields +20 but was never explored. This scenario best illustrates:

A. A successful application of reinforcement learning: the robot found a working policy.
B. The exploration-exploitation tradeoff: excessive exploitation causes the agent to get stuck in a locally optimal but globally suboptimal policy.
C. A failure of the discount factor: the agent valued immediate rewards too highly.
D. A model-based failure: the agent needs to learn the transition model first.
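
The maze scenario in Question 1 can be reproduced as a small experiment. The sketch below is only an illustration under stated assumptions: the two-path setup, the deterministic rewards, the tie-breaking rule, and the names `PATH_REWARDS` and `run` are hypothetical and not part of the quiz. It compares a purely greedy agent (`epsilon=0.0`) with one that occasionally picks a random path.

```python
import random

# Hypothetical setup: path 0 is the +5 route, path 1 is the +20 route.
PATH_REWARDS = [5.0, 20.0]


def run(epsilon, steps=200, seed=0):
    """Follow the path with the highest estimated reward, picking a random
    path with probability epsilon. Returns (estimates, total_reward)."""
    rng = random.Random(seed)
    estimates = [0.0, 0.0]  # the agent's reward estimate for each path
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: choose a path uniformly at random.
            action = rng.randrange(len(PATH_REWARDS))
        else:
            # Exploit: choose the best-known path (ties go to the lowest index).
            action = max(range(len(PATH_REWARDS)), key=lambda a: estimates[a])
        reward = PATH_REWARDS[action]
        # Rewards are deterministic here, so a single sample is the exact value.
        estimates[action] = reward
        total += reward
    return estimates, total


greedy_estimates, greedy_total = run(epsilon=0.0)
explore_estimates, explore_total = run(epsilon=0.1)
print("greedy:        ", greedy_estimates, greedy_total)
print("epsilon-greedy:", explore_estimates, explore_total)
```

With `epsilon=0.0` the agent tries path 0 first, records +5, and never learns what path 1 is worth. Comparing the two printed lines is a quick way to test your answer.
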
Question 2 Multiple Choice

How does reinforcement learning differ most fundamentally from supervised learning?

A. RL requires neural networks, while supervised learning can use simpler models.
B. In RL, the agent learns from interaction, receiving reward signals without labeled 'correct answer' examples, while supervised learning trains on labeled input-output pairs provided by a human teacher.
C. RL only applies to sequential decision tasks in games, while supervised learning handles real-world problems.
D. RL always requires more data than supervised learning to achieve good performance.
Question 3 True / False

In reinforcement learning, a discount factor γ close to 1 causes the agent to value distant future rewards nearly as much as immediate ones, making it more far-sighted in its decision-making.

T. True
F. False
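
One way to check the statement in Question 3 is to compute a discounted return by hand. The sketch below assumes the standard definition G = r_0 + γ·r_1 + γ²·r_2 + ... over a finite reward sequence. The function name `discounted_return` and the example rewards are illustrative choices, not part of the quiz.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Hypothetical episode: a small reward now, a large one four steps later.
rewards = [1, 0, 0, 0, 10]

for gamma in (0.0, 0.5, 0.99):
    print(gamma, discounted_return(rewards, gamma))
```

Watch how much of the delayed +10 survives in the total as γ moves from 0 toward 1.
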
Question 4 True / False

Model-free reinforcement learning methods are generally superior to model-based methods because they avoid making assumptions about the environment's transition dynamics.

T. True
F. False
Question 5 Short Answer

Why is the exploration-exploitation tradeoff a fundamental challenge in reinforcement learning, and what makes it difficult to resolve optimally?
