Finding the tightest (smallest) triangle that fits all points

I need to design an algorithm that, given a set of points in the Euclidean plane, returns the tightest (smallest) origin-centered upright equilateral triangle that contains all of the given points, so that if I then input some new point, the algorithm returns $+$ if the point is inside the triangle and $-$ if not. Someone has suggested that I go over all the possible points and find the point with …
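A minimal sketch of how I would implement this, assuming NumPy and using the edge normals $u, w, v$ that appear in the triangle question further down this page (the function names here are just illustrative):

```python
import numpy as np

# Outward edge normals of an origin-centered upright equilateral triangle,
# taken from the related homework question below (an assumption about orientation).
U = np.array([np.sqrt(3) / 2, -0.5])
W = np.array([-np.sqrt(3) / 2, -0.5])
V = np.array([0.0, 1.0])
NORMALS = np.stack([U, W, V])          # shape (3, 2)

def fit_tightest_triangle(points):
    """Smallest r such that every point x satisfies x . n <= r for all three normals n."""
    points = np.asarray(points, dtype=float)
    return float(np.max(points @ NORMALS.T))

def classify(point, r):
    """Return '+' if the point lies inside the triangle of size r, '-' otherwise."""
    return '+' if np.all(np.asarray(point, dtype=float) @ NORMALS.T <= r) else '-'

# Example usage
r = fit_tightest_triangle([(0.1, 0.2), (-0.3, 0.0), (0.0, 0.5)])
print(r, classify((0.0, 0.1), r), classify((5.0, 5.0), r))
```

Fitting is a single pass over the points, so it runs in $O(n)$ time.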
Category: Data Science

PAC Learnability - Notation

The following is from the textbook Understanding Machine Learning: From Theory to Algorithms. Definition of PAC learnability: A hypothesis class $\mathcal H$ is PAC learnable if there exist a function $m_{\mathcal H} : (0, 1)^2 \rightarrow \mathbb{N}$ and a learning algorithm with the following property: for every $\epsilon, \delta \in (0, 1)$, for every distribution $D$ over $X$, and for every labeling function $f : X \rightarrow \{0,1\}$, if the realizable assumption holds with respect to $\mathcal H, D, f$, then when running the learning …
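For reference, the conclusion that the quoted definition trails off into is, as I recall it from that book (so worth checking against your copy): when the algorithm is run on $m \ge m_{\mathcal H}(\epsilon, \delta)$ i.i.d. examples drawn from $D$ and labeled by $f$, it returns a hypothesis $h$ such that $$\Pr_{S \sim D^m}\!\left[\, L_{(D,f)}(h) \le \epsilon \,\right] \ge 1-\delta .$$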
Category: Data Science

Learner Algorithm Time & Sample Complexity

Let $X=\mathbb{R}^{2}$. Let $u=\left(\frac{\sqrt{3}}{2},-\frac{1}{2}\right)$, $w=\left(-\frac{\sqrt{3}}{2},-\frac{1}{2}\right)$, $v=\left(0,1\right)$ and $C=H=\left\{h\left(r\right)=\left\{\left(x_{1},x_{2}\right) \mid \left(x_{1},x_{2}\right)\cdot u\le r,\ \left(x_{1},x_{2}\right)\cdot w\le r,\ \left(x_{1},x_{2}\right)\cdot v\le r\right\}\right\}$ for $r>0$, the set of all origin-centered upright equilateral triangles. Describe a learning algorithm $L$ that learns $C$ using $H$. State the time and sample complexity of your algorithm and prove it. I was faced with this question in a homework assignment and I'm a bit confused. My solution is: Let $D$ be our dataset. Learner algorithm: maxDistance …
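For what it's worth, a sketch of the bound one might aim for (my own reasoning, not part of the assignment): the class is parameterized by a single threshold $r$, and the natural ERM learner returns the smallest consistent triangle, $$\hat r = \max_{(x,+) \in S} \max\{x\cdot u,\ x\cdot w,\ x\cdot v\},$$ which takes $O(m)$ time on $m$ examples. Because the triangles are nested in $r$, an argument analogous to the axis-aligned-rectangle example suggests that in the realizable case $$m(\epsilon,\delta) = O\!\left(\frac{1}{\epsilon}\log\frac{1}{\delta}\right)$$ examples suffice, though the constants and the full proof would need to be worked out.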
Category: Data Science

VC Dimension of a Countably Infinite Class

I know that there are many examples of classes whose VC dimension is finite/infinite even though the size of the class is uncountably infinite. However, I could not work out whether the VC dimension of a countably infinite class is always finite. (I feel that its size will be "smaller" than the size of the power set of an arbitrarily large set.) Any help on this is appreciated.
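One concrete family that seems relevant here (my own example, so worth double-checking): the class of indicators of finite subsets of the naturals, $$\mathcal H = \{\mathbb 1_A : A \subseteq \mathbb N,\ A \text{ finite}\},$$ is countably infinite, yet it shatters every finite subset of $\mathbb N$, since any labeling of a finite set is realized by the indicator of its positively labeled points; so a countable class can have infinite VC dimension.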
Category: Data Science

Where does the "deep learning needs big data" rule come from

When reading about deep learning I often come across the rule that deep learning is only effective when you have large amounts of data at your disposal. These statements are generally accompanied by a figure; the example I have in mind (taken from https://hackernoon.com/%EF%B8%8F-big-challenge-in-deep-learning-training-data-31a88b97b282 ) is attributed to a 'famous slide from Andrew Ng'. Does anyone know what this figure is actually based upon? Is there any research that backs up this claim?
Category: Data Science

Proving that a Hypothesis Class is not PAC-Learnable

I was wondering how one can show that a class of classifiers $H$ is not PAC-learnable (in the realizable case) without using VC dimensions in the argument. I know how to show PAC-learnability through the PAC requirements, but I'm not sure how to show that a class is not PAC-learnable. Thanks
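For reference, one way to frame such a proof is via the logical negation of the definition (the exact quantifiers should be checked against your course's version): exhibit some $\epsilon, \delta \in (0,1)$ such that for every learning algorithm $A$ and every sample size $m$, there exist a distribution $D$ and a labeling function $f$ realizable by $H$ with $$\Pr_{S \sim D^m}\big[\, L_{(D,f)}(A(S)) > \epsilon \,\big] > \delta .$$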
Topic: pac-learning
Category: Data Science

Disproving or proving claim that if VCdim is "n" then it is possible that a set of smaller size is not shattered

Today in the lecture the lecturer said something I found peculiar, and it made me quite uncomfortable when I heard it: he claimed that if the maximal VC dimension of some hypothesis class is $n\in\mathbb N$, then it is possible that there is some $i<n$ such that no subset $C$ of size $i$ is shattered. Is his claim true? I thought that we can take some subset of size $i$, for every $i\in [n]$, of the set $C^*$ which satisfies …
Category: Data Science

Why does PAC learning focus on learnability of the hypothesis class and not the target function?

The definition of PAC learning is roughly: an algorithm is a PAC learning algorithm if, given enough data, for any target function, it asymptotically does as well as it possibly could given the functions it is capable of representing. This definition seems sort of unambitious. In reality, I care more about approximating the target function well in an absolute sense, not just approximating it as well as my hypothesis class can muster. By some kind of no-free-lunch principle, it's probably …
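As a point of reference, the "as well as it possibly could" clause is usually formalized (in the agnostic setting) as the requirement that, with probability at least $1-\delta$ over the sample, $$L_D\big(A(S)\big) \le \min_{h \in H} L_D(h) + \epsilon,$$ which is relative to the best hypothesis in $H$ rather than to the target function itself; that relativity seems to be exactly what the question is getting at.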
Topic: pac-learning
Category: Data Science

Why is a lower bound necessary in proofs of VC-dimensions for various examples of hypotheses?

In the book "Foundations of Machine Learning" there are examples of proving the VC dimensions for various hypotheses, e.g., for axis-aligned rectangles, convex polygons, sine functions, hyperplanes, etc. All proofs first derive a lower bound, and then show an upper bound. However, why not just derive the upper bound since the definition of VC dimension only cares about the "largest" set that can be shattered by hypothesis set $\mathcal{H}$? Since all examples ends up with a lower bound matching the …
Category: Data Science

A question on realizable sample complexity

I came across the following exercise, and I just can't seem to crack it: Let $l$ be some loss function such that $l \leq 1$. Let $H$ be some hypothesis class, and let $A$ be a learning algorithm. show that: $m^{\text{stat, r}}_H (\epsilon) = O\left(m^{\text{stat, r}}_H (\epsilon/2, 1/2)\cdot \log(1/\epsilon) + \frac{\log(1/\epsilon)}{\epsilon^2}\right)$ Where $m^{\text{stat, r}}_H (\epsilon)$ is the minimal number $m$ such that for any realizable distribution over training examples $D$ we have that: $$\mathbb{E}_{S \sim D^m}\left[ l_D(A(S)) \right]\leq \epsilon$$ And …
Category: Data Science

Are decision tree algorithms linear or nonlinear

Recently a friend of mine was asked in an interview whether decision tree algorithms are linear or nonlinear. I tried to look for answers to this question but couldn't find any satisfactory explanation. Can anyone answer this and explain the solution? Also, what are some other examples of nonlinear machine learning algorithms?
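A quick way to probe this empirically (a sketch using scikit-learn; the XOR-style dataset is just an illustrative choice): fit a tree and a linear model on labels that no single straight line can separate and compare.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# XOR-style data: the classes are not linearly separable.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)
lin = LogisticRegression().fit(X, y)

# The tree's axis-aligned splits carve out a piecewise-constant (nonlinear)
# decision boundary, while logistic regression is restricted to one hyperplane.
print("tree accuracy:  ", tree.score(X, y))
print("linear accuracy:", lin.score(X, y))
```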
Category: Data Science

A trick used in Rademacher complexity related Theorem

I am currently working through the proof of Theorem 3.1 in the book "Foundations of Machine Learning" (page 35, first edition), and there is a key trick used in the proof (equations 3.10 and 3.11): $$\begin{align*} &E_{S,S'}\left[\sup_{g \in \mathcal{G}}\frac{1}{m}\sum_{i=1}^{m} \left(g(z'_i)-g(z_i)\right)\right]=E_{\boldsymbol{\sigma},S,S'}\left[\sup_{g \in \mathcal{G}}\frac{1}{m}\sum_{i=1}^{m} \sigma_i\left(g(z'_i)-g(z_i)\right)\right] \\ &\text{where } {\Bbb P}(\sigma_i=1)={\Bbb P}(\sigma_i=-1)=\frac{1}{2}. \end{align*}$$ It is also shown on page 8 of the lecture PDF at this link: https://cs.nyu.edu/~mohri/mls/lecture_3.pdf This is possible because $z_i$ and $z'_i$ can be swapped. My question is, why can we …
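One way I would sketch the justification (worth checking against the book's own argument): for any fixed sign vector $\sigma \in \{-1,+1\}^m$, flipping $\sigma_i = -1$ amounts to exchanging the i.i.d. pair $(z_i, z'_i)$, which leaves the joint distribution of the sample unchanged, so $$E_{S,S'}\left[\sup_{g \in \mathcal G}\frac{1}{m}\sum_{i=1}^{m}\sigma_i\big(g(z'_i)-g(z_i)\big)\right] = E_{S,S'}\left[\sup_{g \in \mathcal G}\frac{1}{m}\sum_{i=1}^{m}\big(g(z'_i)-g(z_i)\big)\right]$$ for every fixed $\sigma$; averaging over a uniformly random $\sigma$ then gives equations 3.10 and 3.11.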
Category: Data Science

Generalization bound (single hypothesis) in "Foundations of Machine Learning"

I have a question about Corollary 2.2 (generalization bound, single hypothesis) in the book "Foundations of Machine Learning", Mohri et al., 2012. Equation 2.17 seems to hold only when $\hat{R}_S(h)<R(h)$ in equation 2.16, because of the absolute value. Why is this not stated in the corollary? Am I missing something important? Thank you very much for reading this question.
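For context, the two statements being referred to are, as far as I remember them (please verify against the book): equation 2.16 is the two-sided Hoeffding bound and equation 2.17 its one-sided consequence, $$\Pr\!\left[\,\big|R(h)-\hat R_S(h)\big| \le \sqrt{\tfrac{\log(2/\delta)}{2m}}\,\right] \ge 1-\delta, \qquad R(h) \le \hat R_S(h) + \sqrt{\tfrac{\log(2/\delta)}{2m}} .$$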
Category: Data Science

Meaning of Instance Space and Concept Class, (PAC Learnable)

I'm studying probably approximately correct (PAC) learning, and I don't understand what an instance space and a concept are. I have seen that Wikipedia https://en.wikipedia.org/wiki/Probably_approximately_correct_learning provides various examples, but it still feels rather abstract. Could you provide an intuitive definition and some tangible examples?
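To make the terms concrete, one standard toy setup (my own illustration, not taken from the Wikipedia page): the instance space is the set of all possible inputs, e.g. $X = \mathbb R^2$, the points of the plane; a concept is a single subset of $X$, or equivalently a function $c : X \to \{0,1\}$, e.g. one particular axis-aligned rectangle labeling the points inside it as $1$; and the concept class $C$ is the family of all such rectangles.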
Topic: pac-learning
Category: Data Science

Intuition behind Occam's Learner Algorithm using VC-Dimension

So I'm learning about Occam's learner algorithm and PAC learning, where, for a given hypothesis space $H$, if we want a model/hypothesis $h$ with true error $\mathrm{error}_D(h) \leq \epsilon$, with probability $(1-\delta)$ for a given $\delta$, we need to train it on $m$ examples, with $m$ defined as: $$ m > \frac{1}{2\epsilon^2}\left(\log(|H|)+\log\left(\frac{1}{\delta}\right)\right)$$ Now, I'm looking for some way to explain the terms of the equation in very simple terms to gain some …
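A quick worked example, plugging illustrative numbers into the bound exactly as quoted above and taking natural logs (the specific values are arbitrary): with $|H| = 1000$, $\epsilon = 0.1$, and $\delta = 0.05$, $$ m > \frac{1}{2(0.1)^2}\big(\log(1000) + \log(20)\big) \approx 50\,(6.91 + 3.00) \approx 495,$$ so roughly 500 examples; note that $m$ grows only logarithmically in $|H|$ and $1/\delta$ but quadratically in $1/\epsilon$.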
Category: Data Science

Generalization Error Definition

I was reading about the PAC framework and came across the definition of generalization error. The book defines it as follows: given a hypothesis $h \in H$, a target concept $c \in C$, and an underlying distribution $D$, the generalization error or risk of $h$ is defined by $$R(h) = \Pr_{x \sim D}\big[h(x) \neq c(x)\big].$$ The generalization error of a hypothesis is not directly accessible to the learner since both the distribution $D$ and the target concept $c$ are unknown. However, the learner can measure the empirical error of a …
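A tiny simulation of the distinction (my own illustration; the distribution $D$, concept $c$, and hypothesis $h$ below are arbitrary choices): the empirical error is computable from a sample, while the generalization error is only knowable here because we simulated $D$ and $c$ ourselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def c(x):   # target concept (unknown to a real learner)
    return (x > 0.5).astype(int)

def h(x):   # some fixed hypothesis
    return (x > 0.6).astype(int)

# Empirical error: average disagreement on an i.i.d. sample from D = Uniform(0, 1).
sample = rng.uniform(0, 1, size=200)
empirical_error = np.mean(h(sample) != c(sample))

# Generalization error, approximated by a very large sample from the same D
# (for this D, c and h its exact value is P(0.5 < x <= 0.6) = 0.1).
population = rng.uniform(0, 1, size=10**6)
generalization_error = np.mean(h(population) != c(population))

print(empirical_error, generalization_error)
```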
Category: Data Science

What is PAC learning?

I have seen the explanation here, but I really cannot grasp it. In this framework, the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions. The goal is that, with high probability (the "probably" part), the selected function will have low generalization error. Actually we do that in every machine learning situation, and we do the latter part to avoid over-fitting. Why do we call it PAC-learning? I also have not gotten the …
Category: Data Science
