A question on realizable sample complexity

I came across the following exercise, and I just can't seem to crack it:

Let $l$ be some loss function such that $l \leq 1$. Let $H$ be some hypothesis class, and let $A$ be a learning algorithm. Show that:

$m^{\text{stat, r}}_H (\epsilon) = O\left(m^{\text{stat, r}}_H (\epsilon/2, 1/2)\cdot \log(1/\epsilon) + \frac{\log(1/\epsilon)}{\epsilon^2}\right)$

Where $m^{\text{stat, r}}_H (\epsilon)$ is the minimal number $m$ such that for any realizable distribution over training examples $D$ we have that: $$\mathbb{E}_{S \sim D^m}\left[ l_D(A(S)) \right]\leq \epsilon$$

And where $m^{\text{stat, r}}_H (\epsilon, \delta)$ is the minimal number $m$ such that for any realizable distribution over training examples $D$ we have that: $$P_{S \sim D^m}\left( l_D(A(S)) \geq \epsilon \right) \leq \delta$$
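(For intuition: controlling the expected loss immediately controls the tail, by Markov's inequality,
$$P_{S \sim D^m}\left( l_D(A(S)) \geq \epsilon \right) \leq \frac{\mathbb{E}_{S \sim D^m}\left[ l_D(A(S)) \right]}{\epsilon};$$
the exercise asks for the reverse direction: deducing a bound on the expected loss from a tail bound at the single confidence level $\delta = 1/2$.)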

Thanks a lot in advance!

Tags: vc-theory, theory, pac-learning, machine-learning


We want to prove:

If $H$ is PAC learnable, then $\forall \epsilon,\ \exists C,\ \forall m \geq m_2 := C\log(1/\epsilon)\left(m_1 + 1/\epsilon^2\right),\ E[L] \leq \epsilon \mbox{ (a)}$

where $m_1 := m(\epsilon/2, 1/2)$. Here $m(\cdot)$ and $m(\cdot,\cdot)$ denote the two sample complexities defined in the question, and $E[L]$ and $P(\cdot)$ abbreviate $\mathbb{E}_{S \sim D^m}\left[ l_D(A(S)) \right]$ and $P_{S \sim D^m}(\cdot)$ for an arbitrary realizable distribution $D$.

Since $l \leq 1$, we have $E[L] \leq 1$, so the claim is trivial for $\epsilon \geq 1$. Assume from now on that $\epsilon \in (0, 1)$.

First, derive a sufficient condition for $(a)$:

$\begin{align*} E[L] &= \int_{l \geq \epsilon / 2} l\,dP + \int_{l < \epsilon / 2} l\,dP \leq \int_{l \geq \epsilon / 2} dP + \int_{l < \epsilon / 2} \frac{\epsilon}{2}\, dP\\ &= P(L \geq \epsilon/2) + \frac{\epsilon}{2}\, P(L < \epsilon/2) = \left(1 - \frac{\epsilon}{2}\right)P(L \geq \epsilon/2) + \frac{\epsilon}{2}, \end{align*}$

and $\left(1 - \frac{\epsilon}{2}\right)P(L \geq \epsilon/2) + \frac{\epsilon}{2} \leq \epsilon \Leftrightarrow P(L \geq \epsilon/2) \leq \frac{\epsilon}{2 - \epsilon}$.

Therefore, if

$\forall \epsilon,\ \forall m \geq m_3 := m(\epsilon/2,\ \epsilon/(2 - \epsilon)),\ P(L \geq \epsilon/2) \leq \epsilon/(2 - \epsilon) \mbox{ (b)}$ holds,

then $\forall \epsilon,\ \forall m \geq m_3,\ E[L] \leq \epsilon \mbox{ (c)}$ holds too. Note that (b) holds automatically, by the definition of $m_3 = m(\epsilon/2,\ \epsilon/(2-\epsilon))$.
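Explicitly, the step from (b) to (c) is just the bound derived above:

$$E[L] \leq \left(1 - \frac{\epsilon}{2}\right)\frac{\epsilon}{2 - \epsilon} + \frac{\epsilon}{2} = \frac{2 - \epsilon}{2}\cdot\frac{\epsilon}{2 - \epsilon} + \frac{\epsilon}{2} = \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$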

It remains to prove $(b) \Rightarrow (a)$, i.e. to show that $m_2 \geq m_3$ for a suitable constant $C$:

By the fundamental theorem of statistical learning, applied to a PAC learnable $H$ with VC dimension $d$, we have:

$\begin{align*} (\epsilon/2, 1/2)\mbox{-learnable } H \mbox{ with } m_1 &\Rightarrow \exists C_1 > 0,\ m_1 \geq C_1\frac{d + \log(2)}{\epsilon/2}\\ &\Rightarrow \log(1/\epsilon)\left(m_1 + 1/\epsilon^2\right) \geq \frac{\log(1/\epsilon)\left(C_1 d + C_1\log(2) + 1/(2\epsilon)\right)}{\epsilon/2}, \end{align*}$

where the second step uses $1/\epsilon^2 = \frac{1/(2\epsilon)}{\epsilon/2}$ and $\log(1/\epsilon) > 0$ (since $\epsilon < 1$).
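For reference, the realizable-case bounds from the fundamental theorem used here are, with constants $C_1, C' > 0$ that do not depend on $\epsilon'$ or $\delta$,

$$C_1\,\frac{d + \log(1/\delta)}{\epsilon'} \;\leq\; m(\epsilon', \delta) \;\leq\; C'\,\frac{d\log(1/\epsilon') + \log(1/\delta)}{\epsilon'};$$

the lower bound is applied above with $(\epsilon', \delta) = (\epsilon/2,\ 1/2)$, and the upper bound is applied below with $(\epsilon', \delta) = (\epsilon/2,\ \epsilon/(2-\epsilon))$.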

Now we use the following inequality (justified briefly below, with a numerical sanity check):

$\forall d \geq 0,\ \forall C_1 > 0,\ \exists C_2 > 0,\ \forall x \geq 2: \quad f(x) = \frac{\log(x)\left(C_1 d + C_1\log(2) + x/2\right)}{d\log(2x) + \log(2x - 1)} \geq C_2.$

(For $d > 0$ the ratio tends to $0$ as $x \to 1^+$, so we restrict to $x \geq 2$, i.e. $\epsilon \in (0, 1/2]$, which is the regime that matters for the asymptotic $O(\cdot)$ statement. On $[2, \infty)$, $f$ is continuous, positive, and tends to $\infty$ as $x \to \infty$, so it is indeed bounded below by some $C_2 = C_2(d, C_1) > 0$.)
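As a quick sanity check (not a proof), here is a minimal NumPy sketch that evaluates $f$ on a log-spaced grid of $x = 1/\epsilon$ values for a few illustrative choices of $d$ and $C_1$:

```python
import numpy as np

def f(x, d, c1):
    # f(x) = log(x) * (C1*d + C1*log(2) + x/2) / (d*log(2x) + log(2x - 1))
    num = np.log(x) * (c1 * d + c1 * np.log(2) + x / 2)
    den = d * np.log(2 * x) + np.log(2 * x - 1)
    return num / den

# Log-spaced grid of x = 1/eps values, i.e. eps ranging over [1e-6, 1/2]
x = np.logspace(np.log10(2.0), 6, 200_000)

for d in [0, 1, 5, 50]:          # illustrative VC dimensions
    for c1 in [0.5, 1.0, 4.0]:   # illustrative constants C_1
        print(f"d={d:2d}, C1={c1:3.1f}: min f(x) over [2, 1e6] ~ {f(x, d, c1).min():.4f}")
```

A log-spaced grid is used because $x = 1/\epsilon$ spans several orders of magnitude; the printed values are the minima of $f$ over the sampled grid, one per $(d, C_1)$ pair.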

Setting $x = 1/\epsilon$ (so $x \geq 2$, $\log(2x) = \log(2/\epsilon)$ and $\log(2x - 1) = \log((2-\epsilon)/\epsilon)$), we continue the chain as:

$\begin{align*} \ldots \;\overset{\exists C_2}{\geq}\; C_2\,\frac{d\log(2/\epsilon) + \log((2-\epsilon)/\epsilon)}{\epsilon/2} \;\overset{\exists C_3}{\geq}\; \frac{1}{C_3}\, m_3, \end{align*}$

where the last inequality holds because, by the upper bound of the fundamental theorem, $(\epsilon/2,\ \epsilon/(2-\epsilon))$-learnability is achieved with $m_3 \leq C'\,\frac{d\log(2/\epsilon) + \log((2-\epsilon)/\epsilon)}{\epsilon/2}$ samples for an absolute constant $C' > 0$, so we may take $C_3 := C'/C_2$.

By setting $m_2 := C_3\log(1/\epsilon)\left(m_1 + 1/\epsilon^2\right)$, we have $m_2 \geq m_3$; note that $C_3$ depends only on $d$ and on the constants of the fundamental theorem, not on $\epsilon$. Thus

$\begin{align*} &\forall \epsilon,\ \forall m \geq m_3,\ P(L \geq \epsilon/2) \leq \epsilon/(2 - \epsilon)\\ &\Rightarrow \forall \epsilon,\ \exists C,\ \forall m \geq m_2 := C\log(1/\epsilon)\left(m_1 + 1/\epsilon^2\right) \geq m_3,\ P(L \geq \epsilon/2) \leq \epsilon/(2 - \epsilon)\\ &\Rightarrow \forall \epsilon,\ \exists C,\ \forall m \geq m_2 := C\log(1/\epsilon)\left(m_1 + 1/\epsilon^2\right),\ E[L] \leq \epsilon. \end{align*}$
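In the notation of the question, this says exactly that $m^{\text{stat, r}}_H (\epsilon) \leq m_2 = C\log(1/\epsilon)\left(m^{\text{stat, r}}_H (\epsilon/2, 1/2) + 1/\epsilon^2\right)$, i.e.

$$m^{\text{stat, r}}_H (\epsilon) = O\left(m^{\text{stat, r}}_H (\epsilon/2, 1/2)\cdot \log(1/\epsilon) + \frac{\log(1/\epsilon)}{\epsilon^2}\right).$$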

This completes the proof.
