Comparing the performance of regression models on multi-output regression tasks

I have a sample time-series dataset of shape (23, 14291): a pivot table of 24-hour counts for a number of users. After pre-processing, the dataset has shape (23, 200). I filtered out the columns/features that don't have a time-series nature in order to keep only the meaningful ones, either with PCA (keeping the components that explain most of the data variance) or with the correlation matrix (excluding highly correlated columns/features).
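For reference, a minimal sketch of that kind of filtering (the DataFrame name df and the thresholds are my assumptions, not the exact pipeline used):

import numpy as np
from sklearn.decomposition import PCA

# df is assumed to hold the pivot table (rows = hours/users, columns = features)
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df.drop(columns=to_drop)      # drop one of each highly correlated pair

pca = PCA(n_components=0.95)               # keep components explaining ~95% of the variance
X_pca = pca.fit_transform(df_reduced)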

I used MultiOutputRegressor() with the available baseline regressors to predict all columns over a certain time range, in order to experiment with the quality of multi-output regression. Then I plotted the evaluation of the predictions over some metrics as follows:
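Roughly, the experiment looks like the sketch below (the exact list of base regressors and the X_train/X_test/y_train/y_test split are my assumptions):

from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_squared_error, r2_score

base_models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, est in base_models.items():
    model = MultiOutputRegressor(est).fit(X_train, y_train)   # one regressor per target column
    y_pred = model.predict(X_test)
    results[name] = {
        "MSE": mean_squared_error(y_test, y_pred),   # uniform average over all outputs
        "R2": r2_score(y_test, y_pred),
    }

# rank models, e.g. by MSE (lower is better)
for name, metrics in sorted(results.items(), key=lambda kv: kv[1]["MSE"]):
    print(name, metrics)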

Questions:

How can I rank the models and interpret their performance when even linear regression has the minimum MSE but a negative R2_score?

I can't figure out why most of the regression algorithms have a negative R2_score while their MSE is acceptable, so how should I interpret or explain their behaviour or performance? I have already researched when R2_score < 0:

I'm a bit confused by this answer:

they say that its range is [0, 1], and they are wrong, as it can indeed be negative; although for it to be significantly negative, the model has to be intentionally bad, and the maximum is indeed 1.0.
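For context, a tiny made-up example of how this can happen: R2 = 1 - SS_res/SS_tot compares the model against the trivial baseline of always predicting the mean of y_true, so when the targets have very little variance, even a small MSE can give a strongly negative R2:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([1.00, 1.01, 0.99, 1.00])   # almost constant target
y_pred = np.array([1.05, 0.95, 1.05, 0.95])   # small absolute errors

print(mean_squared_error(y_true, y_pred))     # ~0.003, looks "acceptable"
print(r2_score(y_true, y_pred))               # ~ -60, far worse than predicting the mean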

Does this mean that, despite the error metrics indicating fair predictions, most of the regression algorithms actually perform badly because of the negative R2_score?

Could the imbalanced nature of the data be the reason for this behaviour of the models, as they suggest here?

Note 1: I know that having the minimum error and the maximum R2_score indicates the best performance.

Note 2: There is a workaround here comparing RandomForestRegressor() with MultiOutputRegressor(RandomForestRegressor()), which ends up showing an equal score of 0.84 for both:

from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# multi-output wrapper: fits a separate RandomForestRegressor per target column
regr_multirf = MultiOutputRegressor(RandomForestRegressor(n_estimators=100, max_depth=30, random_state=0)).fit(X_train, y_train)

# single forest: RandomForestRegressor supports multi-output targets natively
regr_rf = RandomForestRegressor(n_estimators=100, max_depth=30, random_state=2).fit(X_train, y_train)
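To reproduce the score comparison from that example, the held-out evaluation would look roughly like this (X_test/y_test are assumed to come from the same train/test split):

# .score returns R2 on the test set; in the linked example both come out around 0.84
print(regr_multirf.score(X_test, y_test))
print(regr_rf.score(X_test, y_test))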

Tags: model-evaluations, multi-output, r-squared, regression, machine-learning

Category: Data Science
