Why does PAC learning focus on learnability of the hypothesis class and not the target function?
The definition of PAC learning is roughly
An algorithm is a PAC learning algorithm if, given enough data, for any target function it asymptotically does as well as it possibly could given the functions it's capable of representing.
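More precisely, my understanding of the usual (agnostic) statement is something like the following, in my own notation, where $A$ is the algorithm, $S$ an i.i.d. sample of size $m$ drawn from a distribution $\mathcal D$, $L_{\mathcal D}$ the true risk, and $\mathcal H$ the hypothesis class:

$$
m \geq m_{\mathcal H}(\epsilon,\delta) \;\implies\; P_{S\sim\mathcal D^m}\Big( L_{\mathcal D}(A(S)) \leq \inf_{h'\in\mathcal H} L_{\mathcal D}(h') + \epsilon \Big) \geq 1-\delta .
$$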
This definition seems sort of unambitious. In reality, I care more about approximating the target function well in an absolute sense, not just approximating it as well as my hypothesis class can muster. By some kind of no-free-lunch principle, it's probably not possible to do this for all target functions, so we need to be satisfied with algorithms that learn well on some restricted class of target functions. If I were to take a shot at defining a notion of learning, it would look like this:
An algorithm learns a class $\mathcal V$ of random variables in $\mathcal X\times\mathbb R$ iff for any $(X, Y)\in\mathcal V$ and any $\delta, \epsilon > 0$, given enough i.i.d. samples of $(X, Y)$, the algorithm returns a function $h$ such that $P(|h(X)-Y|\leq\epsilon)\geq 1-\delta$.
A class $\mathcal V$ of random variables is learnable iff there exists an algorithm that learns it.
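To spell out why the standard definition feels unambitious to me, using the usual approximation/estimation split (again my notation): the PAC guarantee only controls the first term below and says nothing about the second,

$$
L_{\mathcal D}(h) \;=\; \underbrace{L_{\mathcal D}(h) - \inf_{h'\in\mathcal H} L_{\mathcal D}(h')}_{\text{estimation error, bounded by PAC}} \;+\; \underbrace{\inf_{h'\in\mathcal H} L_{\mathcal D}(h')}_{\text{approximation error}},
$$

whereas my definition asks for a bound on the left-hand side directly.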
I have two questions:
- Why the emphasis on bounds for approximating the best predictor in the hypothesis class, rather than the target function/distribution, when surely it's the latter we care about?
- Is something like my definition also studied?