After T = 10,000 rounds, an online learner has cumulative regret of 500 compared to the best expert. Is this good performance?
ANo — 500 regret means the learner made 500 more mistakes than the best expert
BYes — regret of 500 over 10,000 rounds means the average per-round regret is 0.05, and since optimal regret scales as sqrt(T) ≈ 100, the learner is performing within a constant factor of optimal
CIt depends entirely on the number of experts — with few experts, 500 is bad; with many, it is good
DRegret of 500 is always considered large regardless of T
The benchmark is O(sqrt(T)) regret, which for T = 10,000 is about 100 (times a constant depending on the number of experts and the loss range). Regret of 500 is within a constant factor of this optimal rate. More importantly, the average regret per round is 500/10,000 = 0.05, which goes to zero — meaning the learner is converging to the performance of the best expert. In online learning, sublinear regret (o(T)) is the goal, and sqrt(T) is the achievable rate. Linear regret (proportional to T) would mean the learner is systematically worse and not improving.
Question 2 True / False
Online learning with adversarial data sequences is strictly harder than PAC learning with i.i.d. data — any problem learnable online is also PAC-learnable, but not vice versa.
TTrue
FFalse
Answer: False
The relationship is more nuanced than a simple hierarchy. Online learning and PAC learning are different frameworks with different guarantees. Some problems are learnable in both settings, some in one but not the other. Online learning does not require distributional assumptions (the adversary can choose any sequence), which makes it harder in one sense. But online learning only competes with the best fixed hypothesis in hindsight, while PAC learning guarantees low absolute error. For binary classification, online learnability (finite Littlestone dimension) is strictly more restrictive than PAC learnability (finite VC dimension) — there exist classes that are PAC-learnable but not online-learnable. So the relationship runs the other direction for classification.
Question 3 True / False
O(sqrt(T)) regret is achievable for online convex optimization even against an adversary, but O(1) regret (bounded regret independent of T) is generally not achievable.
TTrue
FFalse
Answer: True
The sqrt(T) regret rate is both achievable and optimal (up to constants) for most online learning problems with adversarial sequences. Achieving O(1) regret would mean the total excess loss is bounded regardless of how long the game continues — this is too much to ask against an adversary who can always present difficult examples. The sqrt(T) rate means the average regret per round is O(1/sqrt(T)), which goes to zero but the total grows without bound. Sublinear regret (o(T)) is the achievability boundary: it means the learner is 'no-regret' in the sense that the per-round average vanishes, even though the cumulative regret grows.
Question 4 Short Answer
Explain why regret, rather than absolute loss, is the natural performance measure in online learning, and what it means for a learner to achieve sublinear regret.
Think about your answer, then reveal below.
Model answer: In online learning, an adversary chooses the loss sequence, so no learner can guarantee low absolute loss — the adversary can make every outcome costly. Regret measures performance relative to the best fixed strategy in hindsight, which is a fair benchmark because it asks: 'how much worse did you do compared to the best you could have done knowing everything in advance?' Sublinear regret (regret growing slower than T, typically as sqrt(T)) means the learner's average per-round performance converges to the best fixed strategy's. With regret R(T) = O(sqrt(T)), the average regret R(T)/T = O(1/sqrt(T)) -> 0, so the learner eventually matches the best expert's average loss rate. This is the strongest guarantee possible without distributional assumptions — you cannot guarantee being as good as the best expert on every round, but you can guarantee your average converges to theirs.
The regret framework also connects to game theory (minimax strategies), optimization (online gradient descent achieves sqrt(T) regret for convex losses), and statistics (prediction with expert advice). This universality makes it a foundational concept across theoretical computer science.