What is PAC learning?

I have seen the following description of it, but I really cannot understand it:

In this framework, the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions. The goal is that, with high probability (the "probably" part), the selected function will have low generalization error.

But we aim for that in every machine learning situation, and we try to keep the generalization error low precisely to avoid over-fitting. So why do we call this PAC learning in particular?

I also have not grasped the meaning of the math. Can anyone help?

Topic pac-learning machine-learning

Category Data Science


PAC stands for Probably Approximately Correct. It was a very active research area in computer science, looking for proofs of learnability for certain hypothesis sets.

The usual hypothesis sets were not distributions (as in statistics) but rather logical formulas such as DNFs, CNFs, DFAs, etc.

In order to prove learnability, you need to show that for every sample distribution, for every concept $h$ in the hypothesis class $H$, for every $\lambda$ (the "probably" part) and for every $\epsilon$ (the "approximately" part), you can find a hypothesis $h^*$ such that, with probability at least $1-\lambda$, the disagreement between $h$ and $h^*$ is smaller than $\epsilon$.
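Written compactly in the notation above (a sketch; $D$ denotes the sample distribution and $S$ a training sample of size $m$ drawn from it), the requirement is:

$$
\Pr_{S \sim D^m}\Big[\; \Pr_{x \sim D}\big[h^*(x) \ne h(x)\big] \le \epsilon \;\Big] \;\ge\; 1 - \lambda .
$$

The inner probability is the "approximately correct" part (generalization error at most $\epsilon$), and the outer probability is the "probably" part (this should hold for all but a $\lambda$ fraction of training samples).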

This bar for learning is very high, and some of the research proved negative results for many simple hypothesis sets.

When you try to learn to differentiate between cats and dogs, you don't have a mathematical definition of the hypothesis set. Most publications run an algorithm on a dataset and show their results. Almost nobody proves that the algorithms are correct (even under some assumptions). I find the lack of proofs quite sad.

The math behind PAC is essentially probability. Say you have two dice (one representing the classifier, the other the concept) and you want to know how well they agree. So you toss them together $m$ times and check the level of disagreement, $\epsilon$. The higher $m$ and the lower $\epsilon$, the more confident you are that the agreement is due to real correlation and not to chance; the residual doubt is $\lambda$. Given this intuition, PAC learning usually works in the other direction: once you state the desired $\lambda$ and $\epsilon$, the algorithm provides the number of samples $m$ needed to reach them.
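For instance, for a finite hypothesis class in the realizable case, the textbook bound is $m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\lambda}\right)$. Here is a minimal Python sketch of that bound; the function name and the example numbers are just for illustration:

```python
import math

def pac_sample_size(hypothesis_count: int, epsilon: float, lam: float) -> int:
    """Number of samples sufficient to PAC-learn a finite hypothesis class
    (realizable case): m >= (1/epsilon) * (ln|H| + ln(1/lambda))."""
    m = (math.log(hypothesis_count) + math.log(1.0 / lam)) / epsilon
    return math.ceil(m)

# Example: |H| = 1000 hypotheses, error at most 0.05 with probability at least 0.99
print(pac_sample_size(1000, epsilon=0.05, lam=0.01))  # -> 231
```

Note how the sample size grows only logarithmically in the number of hypotheses and in $1/\lambda$, but linearly in $1/\epsilon$: demanding higher accuracy is much more expensive than demanding higher confidence.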
