Questions: Online Learning and Regret Bounds

4 questions to test your understanding

Score: 0 / 4
Question 1 Multiple Choice

After T = 10,000 rounds, an online learner has cumulative regret of 500 compared to the best expert. Is this good performance?

ANo — 500 regret means the learner made 500 more mistakes than the best expert
BYes — regret of 500 over 10,000 rounds means the average per-round regret is 0.05, and since optimal regret scales as sqrt(T) ≈ 100, the learner is performing within a constant factor of optimal
CIt depends entirely on the number of experts — with few experts, 500 is bad; with many, it is good
DRegret of 500 is always considered large regardless of T
Question 2 True / False

Online learning with adversarial data sequences is strictly harder than PAC learning with i.i.d. data — any problem learnable online is also PAC-learnable, but not vice versa.

TTrue
FFalse
Question 3 True / False

O(sqrt(T)) regret is achievable for online convex optimization even against an adversary, but O(1) regret (bounded regret independent of T) is generally not achievable.

TTrue
FFalse
Question 4 Short Answer

Explain why regret, rather than absolute loss, is the natural performance measure in online learning, and what it means for a learner to achieve sublinear regret.

Think about your answer, then reveal below.