Would all classification models perform similarly in a theoretical and ideal scenario?

Imagine that we have infinite computational power, an infinite amount of data, and an infinite amount of time to wait for a model to learn. In such a scenario, we want to perform binary classification on some data.

My question is: would all classification models (we can leave out linear models because they won't be able to learn non-linear boundaries) perform similarly? In other words, is the set of problems solvable (in principle) by each (non-linear) classification algorithm the same? You can assume an arbitrary number of layers and neurons in a neural network, an arbitrary number of trees with arbitrary depths in a random forest, and so on.

I know that this question may not be of use in a realistic, practical world like the one we live in, but I want to know if, in theory, there are any specific obstacles that some models would have that others wouldn't.



First, this ideal scenario is flawed: if an infinite amount of labelled data and an infinite amount of time are available, then it would be pointless to build a classifier, because for any input instance one can sooner or later find its true label in the labelled data.
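A minimal sketch of this idea in Python (mine, purely illustrative, not part of any real system): with exhaustive labelled data, "classification" degenerates into retrieving the stored label, so no learning takes place.

```python
# Toy illustration: if every possible input already appears in the labelled
# data, a "classifier" is just a lookup table.
labelled_data = {
    (0.0, 1.0): "A",
    (1.0, 0.0): "B",
    # ...imagine every possible input listed here...
}

def classify(x):
    # Prediction is mere retrieval of the stored true label;
    # nothing is generalized to unseen inputs.
    return labelled_data[tuple(x)]

print(classify((0.0, 1.0)))  # "A"
```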

The second problem in this question relates to a common confusion about what makes an ML model useful: its ability to generalize. The goal of an ML model is not to be as accurate as possible on every single instance; it is to extract the general patterns across all the instances. This implies simplifying the data, i.e. ignoring minor variations in order to focus on the big trends. A model which doesn't generalize is just a collection of instances; it doesn't really learn anything.

Why does this matter? Because the more complex a model is, the less it generalizes. For example, a decision tree with infinite depth and no minimum number of instances per leaf becomes a collection of instances. It can still predict a new instance, but it is very likely to overfit and to make more errors than a reduced tree (see the sketch below). So if one pushes a model to its maximum level of detail, at some point its performance will start to decrease.
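A minimal sketch of this effect in Python using scikit-learn (the dataset and parameter values are illustrative assumptions, not from the question): an unconstrained tree memorizes noisy training data, while a reduced tree generalizes better.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Noisy binary classification data (flip_y adds label noise).
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: grows until every leaf is pure, i.e. it memorizes.
full_tree = DecisionTreeClassifier(max_depth=None, min_samples_leaf=1,
                                   random_state=0).fit(X_train, y_train)

# Reduced tree: limited depth and minimum leaf size force generalization.
small_tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=20,
                                    random_state=0).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("reduced", small_tree)]:
    print(name,
          "train:", accuracy_score(y_train, tree.predict(X_train)),
          "test:", accuracy_score(y_test, tree.predict(X_test)))
# Typically the full tree scores ~1.0 on the training set but worse than
# the reduced tree on the test set, because it has fitted the label noise.
```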

To some extent, one can see ML training as finding the optimal balance between representing details and generalizing. Different types of models do this in different ways, so I don't think there can be any ideal conditions under which they all perform the same.
