Questions — Reinforcement Learning for Robot Control

Question 1 Multiple Choice

A robot learns to grasp objects using deep Q-learning. The learned Q-network estimates Q(s, a), the expected total discounted future reward for taking action a in state s. The robot grasps a fragile object and applies too much force, breaking it. How should the reward function be modified to prevent this failure in the future?

A. Give a large negative reward when an object breaks, so the Q-network learns to avoid broken states

B. Give a negative reward proportional to grasping force, to penalize excessive force even before breakage occurs

C. Reduce the discount factor γ so the network focuses only on immediate rewards, ignoring long-term consequences

D. Increase the learning rate so the network updates faster and learns from fewer examples
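
As a rough illustration of the quantity that Q(s, a) estimates in the question above, the following Python sketch computes the discounted return of a single reward sequence for several values of γ. The function name, the reward values, and the γ values are assumptions chosen for illustration only; they are not part of the question and do not describe any particular robot or library.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t] for one reward sequence."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: three steps with zero reward, then a penalty
# of -10 on the final step (all values are illustrative assumptions).
rewards = [0.0, 0.0, 0.0, -10.0]

for gamma in (0.99, 0.5, 0.0):
    print(f"gamma={gamma}: return={discounted_return(rewards, gamma):.5f}")
```

In this sketch the weight of the final-step reward in the return seen from the first step is γ³, so its influence on the estimate shrinks as γ decreases.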
