PAC stands for Probably Approximately Correct.
It was a very common research area in computer science looking for proof of learnability of certain hypothesis sets.
The usual hypothesis sets were not distributions (like in statistics) but more logical formulas like DNF, CNF, DFA, etc.
In order to prove learnability, you need to show that for every sample distribution, for every concept $h$ in the hypothesis class $H$,  for every $\lambda$ (the probability part) and for every $\epsilon$ (the approximately part) you can find an hypothesis $h^*$ such that with probability at least $1- \lambda$ the disagreement between $h$ and $h^*$ will be smaller than $\epsilon$.
This bar for learning is very high, and some of the research proved negative results about many simple hypothesis sets.
When you try to learn to differentiate between cats and dogs, you don't have a mathematical definition of hypothesis set. Most publications run an algorithm on a dataset and show their result. Almost nobody proves that the algorithms are correct (even given some assumptions). I find the lack of proofs quite sad.
The math behind PAC is actually probability. Let's say that you have two cubes (classifier, concept) and you want to know how correlated they are. So, you toss them together $m$ times and you check the level of disagreement, epsilon. The higher the $m$ and the lower the $\epsilon$, the more confident you are that this is due to correlation and not due to error, $\lambda$. Given this intuition, PAC learning usually works in the other direction. Once you state the desired lambda and epsilon, the algorithm provides the number of samples $m$ needed to reach them.