What is the exact definition of VC dimension?
I'm studying machine learning from Andrew Ng's Stanford lectures and just came across the theory of VC dimension. According to the lectures and what I understood, the definition of VC dimension can be given as follows:
If you can find a set of $n$ points that can be shattered by the classifier (i.e., it classifies all $2^n$ possible labelings correctly), and you cannot find any set of $n+1$ points that can be shattered (i.e., for any set of $n+1$ points there is at least one labeling that the classifier cannot separate correctly), then the VC dimension is $n$.
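To check my understanding of "shattering", I wrote a small brute-force sketch. I'm assuming scikit-learn's linear SVM with a large $C$ as the separability test, and the three non-collinear points are just a configuration I picked:

```python
import itertools

import numpy as np
from sklearn.svm import SVC

# Three non-collinear points in the plane (an example configuration I chose).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Enumerate all 2^3 = 8 labelings and test each one for linear separability.
for labels in itertools.product([0, 1], repeat=len(points)):
    y = np.array(labels)
    if len(set(labels)) == 1:
        # Only one class present: trivially separable by any line.
        separable = True
    else:
        # A nearly hard-margin linear SVM (large C) finds a separating line
        # whenever one exists, so perfect training accuracy means separable.
        clf = SVC(kernel="linear", C=1e6).fit(points, y)
        separable = clf.score(points, y) == 1.0
    print(labels, "separable" if separable else "NOT separable")
```

All 8 labelings come out separable, so this particular set of 3 points is shattered.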
The professor also took an example and explained this nicely:
Let $H = \{\text{linear classifiers in 2 dimensions}\}$.
Then any 3 points can be correctly classified by $H$ with a separating hyperplane, as shown in the following figure.
And that's why the VC dimension of $H$ is 3: for any 4 points in the 2D plane, a linear classifier cannot shatter all labelings of the points. For example,
For this set of points, no separating hyperplane can be drawn that classifies it correctly. So the VC dimension is 3.
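For instance (assuming the classic counterexample, the four corners of the unit square with the XOR labeling), a quick check with the same scikit-learn setup confirms that no line achieves perfect accuracy:

```python
import numpy as np
from sklearn.svm import SVC

# Four corners of the unit square with the XOR labeling (0, 1, 1, 0):
# the classic 4-point configuration that no line can separate.
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
xor_labels = np.array([0, 1, 1, 0])

clf = SVC(kernel="linear", C=1e6).fit(square, xor_labels)
print("training accuracy:", clf.score(square, xor_labels))  # < 1.0: not separable
```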
I get the idea up to here. But what if we have the following type of pattern?
Or the pattern where three points coincide? Here, too, we cannot draw a separating hyperplane between the 3 points, yet this pattern is not considered in the definition of the VC dimension. Why? The same point is also discussed in the lectures I'm watching (here, at 16:24), but the professor does not mention the exact reason behind this.
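To make the kind of pattern I mean concrete (I'm assuming three collinear points with the alternating labeling, checked with the same scikit-learn setup as above):

```python
import numpy as np
from sklearn.svm import SVC

# Three collinear points with the alternating labeling (0, 1, 0): no line
# can put the middle point on the opposite side of the two outer points.
collinear = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
labels = np.array([0, 1, 0])

clf = SVC(kernel="linear", C=1e6).fit(collinear, labels)
print("training accuracy:", clf.score(collinear, labels))  # < 1.0: not shattered
```

So this particular set of 3 points cannot be shattered, yet the VC dimension is still said to be 3.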
Any intuitive example or explanation would be appreciated. Thanks!
Tags: vc-theory, classification, machine-learning