Good classifiers for data with many labels

I am wondering whether there are good methods other than deep artificial neural networks for classifying data with many (say, 100) labels. Are there any suggestions? For example, logistic regression does not seem to fit, since in its basic form it only supports two labels, doesn't it?

Topic: methods, classification

Category: Data Science


Multinomial logistic regression, supported by scikit-learn in Python and the VGAM package in R, extends binary logistic regression in much the same way that softmax in deep learning extends the sigmoid.

Instead of treating the response variable as binomial, as logistic regression does, multinomial logistic regression treats it as multinomial (stunning, I know). The regression coefficients are then fitted by the usual method of maximum likelihood estimation (equivalent to minimizing the cross-entropy loss), and the model returns a probability for each category, just as logistic regression does.
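As a rough illustration, here is a minimal sketch of fitting a multinomial logistic regression on synthetic data with 100 classes using scikit-learn; the dataset sizes and parameters are arbitrary assumptions for illustration, not recommendations.

```python
# Minimal sketch: multinomial logistic regression on synthetic 100-class data.
# Dataset shape and hyperparameters are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data with 100 classes (make_classification requires
# n_classes * n_clusters_per_class <= 2**n_informative).
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=10,
    n_classes=100, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With the default lbfgs solver, recent scikit-learn versions fit a
# multinomial (softmax) model automatically when y has more than two classes.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # shape (n_samples, 100): one probability per class
print(proba.shape, clf.score(X_test, y_test))
```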

The usual machine learning methods also have multiclass extensions: SVM, random forest, k-nearest neighbors, shallow neural networks, and so on. You certainly don't have to jump straight to deep learning.
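As a hedged sketch, the snippet below tries a few of those multiclass-capable estimators on the same kind of synthetic 100-class data; the defaults are untuned and which model wins depends entirely on your actual data.

```python
# Sketch: a few multiclass-capable scikit-learn estimators side by side.
# Synthetic data and default hyperparameters are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=10,
    n_classes=100, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "SVM (RBF kernel)": SVC(),
    "k-nearest neighbors": KNeighborsClassifier(),
    "shallow neural network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```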


There are many possible candidates for multiclass classification. Logit (as multinomial logit) is one option among many.

Which type of classifier works best is usually a matter of the problem at hand and depends on many different things, including the amount and nature of the data as well as class balance. To the best of my knowledge, there is no classifier of which one can say that it generally works well with many classes. However, you may try a random forest or gradient boosting, as tree-based estimators are easy to apply and tend to be rather robust.
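For instance, here is a minimal sketch with tree-based ensembles on synthetic 100-class data; the data and the untuned defaults are assumptions purely for illustration.

```python
# Sketch: tree-based ensembles on a synthetic 100-class problem (untuned defaults).
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=10,
    n_classes=100, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (RandomForestClassifier(n_estimators=200, random_state=0),
              # HistGradientBoostingClassifier handles multiclass targets natively;
              # older scikit-learn versions required an experimental enable-import.
              HistGradientBoostingClassifier(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```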

The scikit-learn documentation provides a helpful overview of the multiclass classifiers and strategies it supports: https://scikit-learn.org/stable/modules/multiclass.html
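Among the strategies covered on that page, one-vs-rest lets you reuse any binary classifier for a many-class problem. Here is a minimal sketch (synthetic data and an untuned LinearSVC, purely as an illustrative assumption):

```python
# Sketch: wrapping a binary classifier in a one-vs-rest strategy,
# one of the options described on the scikit-learn multiclass page.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=10,
    n_classes=100, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One binary LinearSVC is fitted per class (100 binary problems in total).
ovr = OneVsRestClassifier(LinearSVC(max_iter=5000))
ovr.fit(X_train, y_train)
print(len(ovr.estimators_), ovr.score(X_test, y_test))
```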
