Why do we need "MultiOutputClassifier" if we can get the same results without it?
I am learning about multi-label, multi-class classification.
This is the case where each sample can have several labels, like this:
Year  Actor   Budget      | Genre
------------------------------------------------
2004  Tom C.  40,000,000  | Action, Drama
2016  Mel G.  54,000,000  | Comedy, Action, Family
2021  Eva K.   3,000,000  | Comedy, Romance
I saw an example that uses MultiOutputClassifier, but I do not see the value of this classifier, since the model still works without it, with no problems.
Here is the example. The line marked (1), which does not use MultiOutputClassifier, gives results similar to the line marked (2), which does.
So in that case, why would anyone use MultiOutputClassifier?
from sklearn.datasets import make_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import shuffle
import numpy as np
import datetime

def time_to_sec(dt):
    # Convert a timedelta to seconds (as a float)
    return dt.total_seconds()

# Build one 3-class target, then shuffle it to get two more target columns
X, y1 = make_classification(n_samples=100000, n_features=100, n_informative=30, n_classes=3, random_state=1)
y2 = shuffle(y1, random_state=1)
y3 = shuffle(y1, random_state=2)
Y = np.vstack((y1, y2, y3)).T

n_samples, n_features = X.shape  # 100000, 100
n_outputs = Y.shape[1]  # 3
n_classes = 3

forest = RandomForestClassifier(random_state=1)

# Fit the forest directly on the 2-D target matrix
Tx = datetime.datetime.now()
forest.fit(X, Y).predict(X)  # ------------------------------- (1)
Ty = datetime.datetime.now()
Sec1 = time_to_sec(Ty - Tx)

# Wrap the same forest in MultiOutputClassifier
multi_target_forest = MultiOutputClassifier(forest, n_jobs=-1)
Tx = datetime.datetime.now()
multi_target_forest.fit(X, Y).predict(X)  # ------------------ (2)
Ty = datetime.datetime.now()
Sec2 = time_to_sec(Ty - Tx)

print("Time spent for line (1) = " + str(Sec1))
print("Time spent for line (2) = " + str(Sec2))
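The example above only measures time, so here is a minimal sketch of how the "similar results" claim could be checked directly. It is not part of the original example: the smaller dataset, `n_estimators=20`, and the variable names `native`, `wrapped` and `agreement` are assumptions chosen only to keep it quick to run. It compares the two prediction arrays label by label and prints how many separate forests the wrapper fits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.utils import shuffle

# Assumption: a much smaller dataset than the 100000-sample one, for speed only
X, y1 = make_classification(n_samples=2000, n_features=20, n_informative=10,
                            n_classes=3, random_state=1)
Y = np.vstack((y1, shuffle(y1, random_state=1), shuffle(y1, random_state=2))).T

# (1) One forest fitted directly on the 2-D target matrix
native = RandomForestClassifier(n_estimators=20, random_state=1).fit(X, Y)

# (2) The wrapper, which fits one clone of the forest per target column
wrapped = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=20, random_state=1)
).fit(X, Y)

pred_native = native.predict(X)
pred_wrapped = wrapped.predict(X)

# Fraction of individual labels on which the two approaches agree
agreement = np.mean(pred_native == pred_wrapped)

print("shape without wrapper:", pred_native.shape)
print("shape with wrapper:", pred_wrapped.shape)
print("label agreement:", agreement)
print("forests fitted by wrapper:", len(wrapped.estimators_))
```

Both calls return an array with one column per target, so the outputs can be compared element-wise. These are predictions on the training data, so the agreement figure says little about generalization.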
Topic multi-output multilabel-classification random-forest python
Category Data Science