Why do we need "MultiOutputClassifier" if we can get the same results without it?

I am learning about multi-label, multi-class classification.

It is for cases like this:

Year   Actor     Budget      |   Genre
------------------------------------------------
2004   Tom C.    40,000,000  |   Action, Drama
2016   Mel G.    54,000,000  |   Comedy, Action, Family
2021   Eva K.    3,000,000   |   Comedy, Romance

I saw an example using MultiOutputClassifier, but I do not see the value of this classifier, as models still work without it, without any problem.

Here is the example. You will see that line (1), which is without MultiOutputClassifier, gives results similar to line (2), which is with MultiOutputClassifier.

So in that case, why would anyone use MultiOutputClassifier?

from sklearn.datasets import make_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import shuffle
import numpy as np
import datetime



def time_to_sec(dt):
    # Convert a timedelta to seconds (equivalent to dt.total_seconds()).
    ms = (dt.days * 24 * 60 * 60 + dt.seconds) * 1000 + dt.microseconds / 1000.0
    return ms / 1000.0


X, y1 = make_classification(n_samples=100000, n_features=100, n_informative=30, n_classes=3, random_state=1)
y2 = shuffle(y1, random_state=1)
y3 = shuffle(y1, random_state=2)
Y = np.vstack((y1, y2, y3)).T
n_samples, n_features = X.shape  # 100000, 100
n_outputs = Y.shape[1] # 3
n_classes = 3


forest = RandomForestClassifier(random_state=1)
Tx = datetime.datetime.now()
forest.fit(X, Y).predict(X)  # ------------------------------- (1)
Ty = datetime.datetime.now()
Sec1 = time_to_sec(Ty-Tx)

multi_target_forest = MultiOutputClassifier(forest, n_jobs=-1)
Tx = datetime.datetime.now()
multi_target_forest.fit(X, Y).predict(X)  # ------------------ (2)
Ty = datetime.datetime.now()
Sec2 = time_to_sec(Ty-Tx)

print("Time spent for line (1) = " + str(Sec1))
print("Time spent for line (2) = " + str(Sec2))

Topic: multi-output, multilabel-classification, random-forest, python

Category: Data Science


The docs say:

This strategy consists of fitting one classifier per target. This is a simple strategy for extending classifiers that do not natively support multi-target classification.

RandomForestClassifier happens to support multi-target classification natively, which is why your line (1) works at all. The value of the convenience wrapper sklearn.multioutput.MultiOutputClassifier is that it lets you use classifiers with and without native multi-target support interchangeably, for example in the same pipeline. One example of a classifier that does not support multiple targets is xgboost. More generally, sklearn wrappers like this one let you combine other libraries' algorithms with multi-target problems.
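To make the difference concrete, here is a minimal sketch using sklearn's GradientBoostingClassifier, which (to my knowledge) only accepts a 1-D target: fitting it directly on a 2-D Y raises an error, while wrapping it in MultiOutputClassifier fits one clone per target column.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.utils import shuffle

# Small synthetic multi-target problem: two target columns per sample.
X, y1 = make_classification(n_samples=200, n_features=10,
                            n_informative=5, n_classes=3, random_state=1)
y2 = shuffle(y1, random_state=1)
Y = np.vstack((y1, y2)).T  # shape (200, 2)

gb = GradientBoostingClassifier(random_state=1)
# gb.fit(X, Y) would raise an error here, because gradient boosting
# in sklearn expects a 1-D y, not a (n_samples, n_outputs) matrix.

# The wrapper fits one independent GradientBoostingClassifier per column.
multi_gb = MultiOutputClassifier(gb).fit(X, Y)
pred = multi_gb.predict(X)
print(pred.shape)  # one prediction per target column: (200, 2)
```

RandomForestClassifier accepts the 2-D Y directly, so the wrapper buys you nothing there except a uniform interface (and per-target parallelism via n_jobs); for estimators like the one above, it is what makes multi-target training possible at all.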
