Accessing regression coefficients when using MultiOutputRegressor

I am working on a multioutput (nr. targets: 2) regression task. The original data has a huge dimensionality (p >> n, i.e. there are far more predictors than observations), hence for the baseline models I chose to experiment with Lasso regression, wrapped in sklearn's MultiOutputRegressor. After optimizing the hyperparameters of the Lasso baseline, I wanted to look into model explainability by retrieving the coef_ of the wrapped Lasso regression model(s), but this doesn't seem to be possible. I'm now wondering how I could inspect the model's coefficients to better understand the predictions it makes.

My idea was to have GridSearchCV return the estimator with the best hyperparameters by setting refit=True, and then to access its estimator attribute, which yields MultiOutputRegressor(Lasso()), as intended. MultiOutputRegressor also has an estimator attribute, and accessing it returns Lasso(). Finally, Lasso has a coef_ attribute that returns the coefficients of the regressor. According to the sklearn documentation, the shape of the array returned by coef_ is either (n_features,) or (n_targets, n_features), so multioutput regression coefficients seem to be supported.

Sample data and code:

from numpy import logspace
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)

search = GridSearchCV(
    MultiOutputRegressor(Lasso()), 
    param_grid={'estimator__alpha': logspace(-1,1,3)}, 
    scoring='neg_mean_squared_error', 
    cv=10, 
    n_jobs=-1, 
    refit=True
)

best_model = search.fit(X, y)

print(best_model)

print(best_model.estimator.estimator.coef_)

Topic lasso linear-regression regression machine-learning

Category Data Science


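Instead of the estimator attribute, you should use the best_estimator_ attribute: estimator only holds the unfitted template that was passed to the constructor, whereas best_estimator_ is the model refit with the best hyperparameters. From there, you can access the fitted underlying estimators of the MultiOutputRegressor (one per target) through its estimators_ attribute. You can then access the coefficients as follows: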

coefficients = [estimator.coef_ for estimator in best_model.best_estimator_.estimators_]

# [array([-0.        , 30.91353913, -0.        , 76.42321339, 93.22724698,
#         -0.        ,  0.        , 86.41714933, 12.34299398, -0.        ]),
#  array([ 0.        , 88.99494183,  0.        ,  8.93482644, 26.63584122,
#         -0.        , -0.        ,  3.19035541, 33.95384004,  0.        ])]
