Generalization bound (single hypothesis) in "Foundations of Machine Learning"

I have a question about Corollary $2.2$ (Generalization bound: single hypothesis) in the book "Foundations of Machine Learning" by Mohri et al. ($2012$).

Equation $2.17$ seems to only hold in the case $\hat{R}_S(h) \le R(h)$ of equation $2.16$, because of the absolute value operator. Why isn't this stated in the corollary? Am I missing something important?

Thank you very much for reading this question.

Tags: pac-learning, machine-learning

Category: Data Science


You are right. The relaxed inequality $$R(h) \le \hat{R}_S(h) + \epsilon$$ can be replaced with the complete (two-sided) inequality $$\left|\hat{R}_S(h) - R(h)\right| \le \epsilon.$$ In fact, the authors use this complete inequality in the follow-up examples in the book. Likewise, in Theorem $2.13$ they state the relaxed inequality but prove the complete one.

We could say that the relaxed inequality is written for the sake of readability and/or convention.

On the relation of inequalities

Let us denote the events $$A := \left\{\hat{R}_S(h) - R(h) \le \epsilon\right\}, \qquad B := \left\{\hat{R}_S(h) - R(h) \ge -\epsilon\right\},$$ so that $$\left\{\left|\hat{R}_S(h) - R(h)\right| \le \epsilon\right\} = A \cap B.$$ Equation $(2.16)$ states: $$\begin{align*} & {\Bbb P}\left(\left|\hat{R}_S(h) - R(h)\right| \ge \epsilon\right) \le \delta \\ & \Rightarrow {\Bbb P}\left(\left|\hat{R}_S(h) - R(h)\right| \le \epsilon\right) \ge 1 - \delta \\ & \Rightarrow {\Bbb P}(A \cap B) \ge 1 - \delta. \end{align*}$$ Since $B \supseteq A \cap B$, we have ${\Bbb P}(B) \ge {\Bbb P}(A \cap B)$, hence $$\begin{align*} & {\Bbb P}(B) \ge {\Bbb P}(A \cap B) \ge 1 - \delta \\ & \Rightarrow {\Bbb P}\left(\hat{R}_S(h) - R(h) \ge -\epsilon\right) \ge 1 - \delta, \end{align*}$$ which is equivalent to $$R(h) \le \hat{R}_S(h) + \epsilon$$ with probability at least $1-\delta$, i.e. equation $(2.17)$.
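The derivation can also be checked numerically. Below is a small simulation (not from the book) under the assumed setting of Corollary 2.2: a fixed hypothesis $h$ with true risk $R(h) = p$, so the empirical risk $\hat{R}_S(h)$ is a mean of $m$ Bernoulli$(p)$ draws. It estimates how often the two-sided event $A \cap B$ and the one-sided event $B$ occur, with $\epsilon$ chosen as in equation $(2.17)$; the values for $p$, $m$, and $\delta$ are arbitrary illustrative choices.

```python
import math
import random

random.seed(0)

p = 0.3          # true risk R(h), chosen arbitrarily for illustration
m = 200          # sample size
delta = 0.05
# epsilon as in eq. (2.17): sqrt(log(2/delta) / (2m))
eps = math.sqrt(math.log(2 / delta) / (2 * m))

trials = 10_000
both = 0   # count of the two-sided event A and B: |R_S(h) - R(h)| <= eps
b = 0      # count of the one-sided event B: R(h) <= R_S(h) + eps

for _ in range(trials):
    # empirical risk: mean of m Bernoulli(p) error indicators
    emp = sum(random.random() < p for _ in range(m)) / m
    A = emp - p <= eps
    B = emp - p >= -eps
    both += A and B
    b += B

print(f"P(A and B) ~= {both / trials:.4f}  (bound: >= 1 - delta = {1 - delta})")
print(f"P(B)       ~= {b / trials:.4f}")
# B contains A ∩ B, so it can only occur at least as often
assert b >= both
```

Both estimated frequencies come out at least $1 - \delta$, and the one-sided event is at least as frequent as the two-sided one, matching ${\Bbb P}(B) \ge {\Bbb P}(A \cap B) \ge 1 - \delta$.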
