Why Should There Be Multiple Columns in Train Labels for One Model?
I am going through a notebook from the well-known Kaggle competition on Favorita sales forecasting.
One thing puzzles me: after the data is split into training and test sets, y_train
has two columns, unit_sales and transactions. Both are predicted and eventually compared with the ground truth.
But why would someone pass both columns to a single model.fit() call instead of building two models, one per column?
Or is that what sklearn does internally anyway, i.e. training two models with one fit call?
If not, it seems to me that one model for both targets would give suboptimal results: the model could be confused by having two distinct labels for each data point and would not know which value to aim for in its weight updates.
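To make the question concrete, here is a minimal sketch of the comparison I have in mind. It assumes a plain LinearRegression and synthetic data; the notebook's actual model and features may differ, and the two target columns below are only stand-ins for unit_sales and transactions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not the Favorita dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Two made-up targets standing in for unit_sales and transactions.
Y = np.column_stack([
    X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200),
    X @ np.array([0.3, 0.0, 4.0]) + rng.normal(scale=0.1, size=200),
])

# One fit call with a two-column y.
joint = LinearRegression().fit(X, Y)

# Two separate models, one per target column.
separate = [LinearRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]

joint_pred = joint.predict(X)
separate_pred = np.column_stack([m.predict(X) for m in separate])

# The joint model stores one row of coefficients per target.
print(joint.coef_.shape)

# Compare the predictions of the two approaches.
same_predictions = np.allclose(joint_pred, separate_pred)
print(same_predictions)
```

For ordinary least squares the two approaches should agree, since each target column gets its own row of coefficients. This need not hold for other estimators: sklearn's decision trees, for example, build a single tree for all outputs, while wrapping an estimator in MultiOutputRegressor explicitly fits one model per target.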
Please correct me if I have any misunderstanding of the scenario.
Topic goodness-of-fit loss-function supervised-learning scikit-learn
Category Data Science