Questions: Reinforcement Learning for Robot Control

1 question to test your understanding

Question 1: Multiple Choice

A robot learns to grasp objects using deep Q-learning. The learned Q-network estimates Q(s,a) = expected total discounted future reward for taking action a in state s. The robot grasps a fragile object and applies too much force, breaking it. How should the reward function be modified to prevent this failure in the future?

A. Give a large negative reward when an object breaks, so the Q-network learns to avoid broken states
B. Give a negative reward proportional to grasping force, to penalize excessive force even before breakage occurs
C. Reduce the discount factor γ so the network focuses only on immediate rewards, ignoring long-term consequences
D. Increase the learning rate so the network updates faster and learns from fewer examples
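
To make the quantities in this question concrete, here is a minimal Python sketch. It computes the discounted return that Q(s,a) estimates in expectation, then evaluates one hypothetical grasp episode under the reward schemes described in options A and B. The function names, the reward magnitudes (-10.0 and 1.0), the force weight of 0.1, and the episode data are all illustrative assumptions, not values from any particular robot or library.

```python
from typing import List, Tuple

# One step of a hypothetical grasp episode: (applied force, object broke, grasp succeeded).
Step = Tuple[float, bool, bool]


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Return sum over t of gamma**t * rewards[t] for a single episode."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def breakage_penalty_reward(step: Step) -> float:
    """Option A style (assumed values): penalty only at the moment of breakage."""
    _force, broke, grasped = step
    if broke:
        return -10.0
    return 1.0 if grasped else 0.0


def force_penalty_reward(step: Step, force_weight: float = 0.1) -> float:
    """Option B style (assumed values): penalty proportional to applied force at every step."""
    force, _broke, grasped = step
    base = 1.0 if grasped else 0.0
    return base - force_weight * force


# Hypothetical episode: force ramps up until the object breaks on the third step.
episode: List[Step] = [
    (1.0, False, False),
    (3.0, False, False),
    (8.0, True, False),
]

rewards_a = [breakage_penalty_reward(s) for s in episode]
rewards_b = [force_penalty_reward(s) for s in episode]

for gamma in (0.9, 0.0):
    print(f"gamma={gamma}: "
          f"return A = {discounted_return(rewards_a, gamma):.3f}, "
          f"return B = {discounted_return(rewards_b, gamma):.3f}")
```

Under scheme A the reward signal is zero until the final step, while under scheme B every step carries a signal that grows with force. Setting gamma to 0.0, the extreme case of option C, makes the return from the first step depend only on the first reward, so a penalty that arrives later does not appear in it.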