Questions: Exploration vs. Exploitation Tradeoff

5 questions to test your understanding

Question 1 Multiple Choice

An ε-greedy agent (ε=0.1) has tried each of 10 slot machines exactly 1,000 times and has well-estimated average payouts for all of them. A critic says the agent is now exploring too wastefully. What is the most accurate diagnosis of ε-greedy's problem in this situation?

A. ε-greedy continues exploring all machines equally regardless of how confident the agent is, wasting 10% of pulls on machines already known to be inferior
B. ε-greedy explores too little at this stage: after 1,000 trials per machine the agent should increase ε to refine its estimates further
C. ε-greedy is optimal here because 10% is the statistically correct exploration rate for 10 machines with 1,000 trials each
D. ε-greedy should be replaced with pure exploitation (ε=0) only after the total number of trials exceeds the square root of the number of machines times 1,000
Question 2 Multiple Choice

A Thompson sampling agent for a 5-arm bandit problem has tried arm 3 only twice and has a very wide posterior distribution for its reward probability. What mechanically causes the agent to explore arm 3 frequently despite no explicit exploration rule?

A. The wide posterior produces high-variance samples, so arm 3 frequently generates the highest sampled value among all arms and gets selected
B. Thompson sampling adds an explicit exploration bonus to arms with few trials, similar to UCB's confidence interval
C. Thompson sampling always selects the arm with the lowest observed average reward to gather maximally diverse data
D. The posterior distribution for arm 3 has a higher mean than arms tried more often, making it preferentially selected
Question 3 True / False

An agent that always exploits the action with the highest observed average reward — with no exploration — can perform suboptimally even if its current best estimate happens to be accurate.

T. True
F. False
Question 4 True / False

UCB (Upper Confidence Bound) methods explore by randomly selecting a non-greedy action with a fixed probability, similar to ε-greedy but with a smaller and more carefully tuned exploration rate.

T. True
F. False
Question 5 Short Answer

Why is exploration not simply 'wasted effort'? Explain what exploration actually achieves and how its value depends on the situation.

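
For readers who want to experiment with the three strategies named in the questions, here is a minimal sketch on a simulated Bernoulli bandit. Everything in it is an assumption made for illustration: the function names (epsilon_greedy, ucb1, thompson, run) are hypothetical, rewards are taken to be 0 or 1, Thompson sampling uses a uniform Beta(1, 1) prior, UCB uses the common UCB1 bonus sqrt(2 ln t / n), and ε is fixed at 0.1 as in Question 1. It is a study aid under those assumptions, not a reference implementation.

```python
import math
import random


def epsilon_greedy(counts, sums, epsilon, rng):
    """With probability epsilon pick a uniformly random arm, else the best observed mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    # Untried arms are given a mean of 0.0 (an arbitrary choice for this sketch).
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=means.__getitem__)


def ucb1(counts, sums, t):
    """Pick the arm maximizing mean + sqrt(2 ln t / n); untried arms are pulled first."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    scores = [s / c + math.sqrt(2 * math.log(t) / c) for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=scores.__getitem__)


def thompson(counts, sums, rng):
    """Draw one sample per arm from Beta(1 + successes, 1 + failures); pick the largest."""
    samples = [rng.betavariate(1 + s, 1 + c - s) for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=samples.__getitem__)


def run(policy, probs, steps, seed=0):
    """Simulate one policy ('eps', 'ucb', or 'ts') and return the pull count per arm."""
    rng = random.Random(seed)
    counts = [0] * len(probs)  # pulls per arm
    sums = [0] * len(probs)    # total reward (number of successes) per arm
    for t in range(1, steps + 1):
        if policy == "eps":
            arm = epsilon_greedy(counts, sums, 0.1, rng)
        elif policy == "ucb":
            arm = ucb1(counts, sums, t)
        else:
            arm = thompson(counts, sums, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    true_probs = [0.2, 0.5, 0.55]  # hypothetical payout probabilities
    for name in ("eps", "ucb", "ts"):
        print(name, run(name, true_probs, steps=5000))
```

Comparing the printed pull counts across the three policies shows how each one distributes its pulls between the best-looking arm and the others as evidence accumulates.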