5 questions to test your understanding
A learning algorithm is given 500 training examples drawn i.i.d. from an unknown distribution and outputs a hypothesis with 3% test error. A colleague claims this proves the concept class is PAC-learnable. What is wrong with this claim?
In the PAC framework, why does the sample complexity bound depend on 1/epsilon and 1/delta rather than on epsilon and delta directly?
True or false: PAC learning requires that the learning algorithm succeed for any distribution over the input space, including adversarially chosen ones.
True or false: A concept class that requires exponential time to learn but only polynomial samples is considered PAC-learnable.
Explain the role of the 'probably' and 'approximately' components in PAC learning and why both relaxations are necessary.
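
For reference while working through the questions on sample complexity, here is a minimal sketch of one standard bound, not the only one. It assumes a finite hypothesis class, the realizable setting (some hypothesis in the class has zero error), and a learner that returns any hypothesis consistent with the training data. Under those assumptions, m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples are sufficient. The function name and parameters are illustrative and do not come from the questions above.

```python
import math


def finite_class_sample_bound(hypothesis_count, epsilon, delta):
    """Sufficient sample size for a consistent learner over a finite class.

    Assumes the realizable setting. Returns the smallest integer m with
    m >= (ln(hypothesis_count) + ln(1 / delta)) / epsilon.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


if __name__ == "__main__":
    # Compare how the bound reacts to tightening epsilon versus delta.
    base = finite_class_sample_bound(1000, epsilon=0.1, delta=0.05)
    tighter_accuracy = finite_class_sample_bound(1000, epsilon=0.05, delta=0.05)
    tighter_confidence = finite_class_sample_bound(1000, epsilon=0.1, delta=0.005)
    print(base, tighter_accuracy, tighter_confidence)
```

Halving epsilon roughly doubles the bound, while dividing delta by ten adds only a modest number of examples, because delta enters through a logarithm.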