Is there any way to explicitly measure the complexity of a Machine Learning Model in Python

I'm interested in model debugging, and one of the recommended practices is to compare your model with a less complex one and check whether the performance of the more complex model is substantially better than that of the simpler one.

This raises a couple of questions:

Suppose you have an ensemble model and a linear model for a classification task. It seems natural to think that the ensemble model is more complex than the linear model.

  1. What would be a model-agnostic way to numerically measure a model's complexity, so that two or more models can be compared in those terms?

  2. Is there any python/R implementation that can help with such a task?

Topic: model-selection, python, predictive-modeling, r, machine-learning

Category: Data Science


One option is the Bayesian information criterion (BIC), a model selection criterion that rewards model fit, as measured by the maximized likelihood, while penalizing the number of parameters.

One implementation of BIC is in the RegscorePy package.
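
If you prefer to avoid an extra dependency, BIC can also be computed by hand under the usual Gaussian-error assumption, BIC = n·ln(RSS/n) + k·ln(n). Here is a minimal sketch (synthetic data and scikit-learn's LinearRegression are my own illustrative choices, not the RegscorePy API):

```python
# Minimal sketch: BIC computed by hand for a regression model,
# assuming Gaussian errors (BIC = n*ln(RSS/n) + k*ln(n)).
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, purely for illustration.
X = np.random.rand(100, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + np.random.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

n = len(y)                    # number of observations
k = X.shape[1] + 1            # estimated parameters (coefficients + intercept)
rss = np.sum(residuals ** 2)  # residual sum of squares

bic = n * np.log(rss / n) + k * np.log(n)
print(f"BIC: {bic:.2f}")
```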


1. But what would be a way to numerically measure the model's complexity in order to compare two or more models in such terms?

You can use the VC dimension to measure the complexity of a model in a numerical format. See Vapnik–Chervonenkis dimension on Wikipedia.

2. Is there any python implementation that can help with such a task?

There is already a Stack Exchange thread that explains how to compute the VC dimension: How to calculate VC-dimension?
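
Keep in mind that the VC dimension is only known in closed form for a few hypothesis classes; for a linear classifier on d features it is d + 1, while for trees and neural networks only bounds are known. A minimal illustration (scikit-learn and synthetic data are my own choices):

```python
# Illustrative sketch: the VC dimension of a linear classifier on d features is d + 1.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

vc_dim_linear = X.shape[1] + 1  # VC dimension of a linear classifier in R^d
print(f"VC dimension of this linear classifier: {vc_dim_linear}")
```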


I have not heard of any model-agnostic way to measure model complexity. There are several strategies, but they are model dependent.

You can tackle the problem separately for different families of models:

  • For linear models you can count the number of nonzero parameters the model is using, i.e. the number of features used for the prediction.

  • For decision trees you can count the maximum depth that the tree reaches.

  • For Neural Networks you can count the number of parameters that your NN is optimizing.

  • For ensemble methods (random forest, gradient boosting) you can aggregate the complexities of the different weak learners used in the model.

As for Python implementations, there are several, depending on which model you want to measure; as the sketch below shows, some of these quantities are really easy to obtain.
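
A rough sketch using scikit-learn estimators (the attributes coef_, get_depth() and tree_.node_count are scikit-learn's; the data is synthetic):

```python
# Rough sketch of the model-specific measures listed above, using scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Linear model: number of nonzero coefficients (features actually used).
linear = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
print("nonzero coefficients:", np.count_nonzero(linear.coef_))

# Decision tree: maximum depth reached.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("tree depth:", tree.get_depth())

# Ensemble: aggregate over the weak learners, e.g. total number of tree nodes.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
total_nodes = sum(est.tree_.node_count for est in forest.estimators_)
print("forest estimators:", len(forest.estimators_), "total nodes:", total_nodes)
```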

It is intuitively hard to compare complexity across different model families: which is more complex, a linear regression with 4 coefficients or a decision tree with max_depth=3?

On the topic of deep learning complexity, Hinton, Vinyals, and Dean published the paper Distilling the Knowledge in a Neural Network, in which they discuss reducing the complexity of a neural network.


As mentioned by other answers here, when we talk about model complexity we are usually thinking about the number of parameters the model learns. When someone talks about comparing to a less complex model, they often mean comparing to an intuitively less complex model (either a model in the same class, e.g. a neural network with fewer neurons, or a model from a simpler class, e.g. a linear model rather than a random forest).

One way to think about model complexity between very different models is Kolmogorov Complexity, and you can approximate this by looking at the amount of space occupied by your saved (e.g. pickled) models. In the example you gave, the ensemble would occupy more disk space than the linear model, unless the ensemble was simpler than the linear model (e.g. an ensemble of two linear models with 10 learned coefficients each versus a linear model with 200 learned coefficients).
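
A rough sketch of that comparison (synthetic data and scikit-learn models chosen purely for illustration; pickled size is only a loose proxy):

```python
# Crude approximation of the "saved model size" idea: compare the size in
# bytes of the pickled models as a loose proxy for Kolmogorov complexity.
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

linear = LogisticRegression(max_iter=1000).fit(X, y)
ensemble = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print("linear model:  ", len(pickle.dumps(linear)), "bytes")
print("ensemble model:", len(pickle.dumps(ensemble)), "bytes")
```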


As you probably know, "complexity" is a loaded term in computer science. Normally, complexity is measured in "big-O notation" and has to do with how solutions scale in time as the number of inputs grows. For example, this post discusses the computational complexity of convolutional layers.

In deep learning, however, competing neural network architectures are generally applying the same algorithm (back-propagation) to the same types of problems (e.g., ImageNet classification); the only difference is the architecture. Further, most architectures use similar computational elements (e.g., convolutional layers and linear layers). Thus, it is a convention to use the number of parameters as a stand-in for complexity. It is true that this is only an approximation: two networks may have the same number of parameters but require different numbers of operations. But it is generally a good approximation, given that different architectures typically share the similarities noted above yet can differ in size by several orders of magnitude.

As a reference, consider Figure 1 in the EfficientNet Paper. They use the number of trainable parameters as a stand-in for "model size" and note that the number of parameters is more-or-less linearly correlated with runtime.

As for a Python function that counts the number of trainable parameters, this will depend on whether you are using Keras, TensorFlow, PyTorch, etc. In Keras, this is one line: model.count_params(). In PyTorch, you can calculate it from model.parameters(), as discussed here.
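
For example, a minimal PyTorch sketch (the small architecture below is made up purely for illustration; Keras offers the equivalent one-liner model.count_params()):

```python
# Sketch: counting trainable parameters of a PyTorch model.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_trainable}")
```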


It's perhaps a bit naive, but the first idea that comes to mind is to simply count the number of parameters that have to be estimated during training: the more values need to be estimated, the more complex the model is, since the hypothesis space is larger. For example, a linear model needs only $n+1$ parameters (with $n$ the number of features), whereas the number of parameters in an ensemble model is the sum of the numbers of parameters of all its base learners, so it's likely to be higher. This idea could be refined to take into account the range of values of a parameter.
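
A quick sketch of that count (the scikit-learn models and the use of tree-node counts as a proxy for a learner's parameters are my own illustrative choices):

```python
# Illustrative parameter count: n + 1 for a linear model versus the sum over
# base learners for an ensemble (tree nodes used as a proxy for parameters).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

linear = LinearRegression().fit(X, y)
n_linear_params = linear.coef_.size + 1  # n coefficients + intercept

ensemble = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)
n_ensemble_params = sum(est.tree_.node_count for est in ensemble.estimators_.ravel())

print("linear model parameters:  ", n_linear_params)
print("ensemble 'parameters':    ", n_ensemble_params)
```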

As a very rough approximation, one could simply calculate the size of the object that represents the model in Python (assuming the representation of the model is space-efficient, which might not always be the case).
