Is a data set considered to be linearly separable if it can only be separated by multiple hyperplanes?
For example, on the linear separability Wikipedia article, the following example is given:
They say "The following example would need two straight lines and thus is not linearly separable".
On the other hand, in Bishop's 'Pattern Recognition and Machine Learning' book, he says "Data sets whose classes can be separated exactly by linear decision surfaces are said to be linearly separable".
Under Bishop's definition of linear separability, I think the Wikipedia example would be linearly separable, even though the author of this Wikipedia article says otherwise. This is because Bishop says that we can use multiple linear decision surfaces (hyperplanes) to separate the classes, and it's still considered to be linearly separable data. Bishop implies this by referring to linear decision surfaces, plural, not singular.
Logically, I agree with Bishop. After all, the classes in the Wikipedia example are being separated by linear decision surfaces. So how can one then turn around and say the data set isn't linearly separable? Well, perhaps you could enforce the rule that a data set is only linearly separable if we can separate $N$ classes with $N-1$ decision surfaces. But why would you define linear separability in this way?
So, is the Wikipedia example linearly separable or not?
Topic linearly-separable classification dataset machine-learning
Category Data Science