Vapnik Chervonenkis dimension of a classifier from the Wikipedia page

Question

IntegrateThis

July 5, 2021, 19:37

The Wikipedia page on the Vapnik-Chervonenkis (VC) dimension defines shattering for a classification model as follows:

A classification model $f$ with some parameter vector $\theta$ is said to shatter a set of data points $(x_1, \ldots, x_n)$ if, for all assignments of labels to those points, there exists some $\theta$ such that the model $f$ makes no errors when evaluating that set of data points.

I am trying to understand the second example. Suppose I have a model $f$ with only one parameter $\theta$, and two points $\{a,b\}$ where $a,b \in \mathbb{R}$ and $a < b$. Then there are four possible labellings: $\{a = 1, b = 1\}, \{a = 0, b = 1\}, \{a = 1, b = 0\}, \{a = 0, b = 0\}$.

Is the author saying that I have to choose one value of $\theta$ that correctly classifies $a, b$ in all of these scenarios? Or rather that I can choose a different $\theta$ for each possible labelling?

For instance, in the case $t < a < b$ where $a, b$ are both labelled $1$, simply classifying points above $t$ as positive would correctly classify both examples. However, this scheme would fail for the case where $a = 1, b = 0$.

Any insights appreciated.

Topic vc-theory classification

Category Data Science

Sharov · Accepted Answer · July 5, 2021, 19:37

Here is a good definition of the VC dimension: https://www.cs.hmc.edu/~yjw/teaching/cs158/lectures/21_VCDimension.pdf

Quotation from link above:

To show that hypothesis class has VC-dimension d in input space $\chi$, consider this adversarial "shattering game":

We choose d points in $\chi$ positioned however we want;

Adversary labels these d points;

We choose a hypothesis $h \in H$ that separates the points;

The VC-dimension of $H$ in $\chi$ is the maximum $d$ we can choose so that we always succeed.

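This means that for each labeling you may pick a hypothesis that classifies the points correctly, and a different labeling may use a different hypothesis (a different $\theta$). You do not need one $\theta$ that works for every labeling. For a threshold classifier that labels points above the threshold as $1$, two points $a < b$ can never receive the labeling $\{a = 1, b = 0\}$, whatever threshold is chosen, so no set of two points is shattered and the VC dimension is 1.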
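To make the order of the quantifiers concrete, here is a minimal sketch that plays the shattering game by brute force for a one-parameter threshold classifier. It assumes the classifier outputs $1$ when $x > \theta$ and $0$ otherwise. The function names (`threshold_classifier`, `candidate_thresholds`, `find_theta`, `is_shattered`) are made up for this illustration and do not come from the linked lecture or from Wikipedia. For each labeling, the search is allowed to return a different $\theta$.

```python
from itertools import product


def threshold_classifier(theta, x):
    """Assumed model: label 1 if x lies strictly above theta, else 0."""
    return 1 if x > theta else 0


def candidate_thresholds(points):
    """Thresholds worth trying: one below all points, one between each
    adjacent pair, and one above all points. Any other theta behaves
    like one of these on the given points."""
    xs = sorted(points)
    mids = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]
    return [xs[0] - 1.0] + mids + [xs[-1] + 1.0]


def find_theta(points, labels):
    """Return some theta that reproduces `labels` on `points`, or None."""
    for theta in candidate_thresholds(points):
        if all(threshold_classifier(theta, x) == y
               for x, y in zip(points, labels)):
            return theta
    return None


def is_shattered(points):
    """True if every labeling is realised by some (possibly different) theta."""
    return all(find_theta(points, labels) is not None
               for labels in product([0, 1], repeat=len(points)))


if __name__ == "__main__":
    a, b = 1.0, 2.0  # hypothetical points with a < b
    for labels in product([0, 1], repeat=2):
        print(labels, "->", find_theta([a, b], labels))
    print("single point shattered:", is_shattered([a]))
    print("two points shattered:", is_shattered([a, b]))
```

With these example points, three of the four labelings each get their own threshold, while the labeling $(1, 0)$ yields `None`. That one failure is enough for the two points not to be shattered, whereas a single point is shattered.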
