Accessing regression coefficients when using MultiOutputRegressor
I am working on a multi-output regression task with 2 targets. The original data is very high-dimensional (p ≫ n, i.e. there are far more predictors than observations), so for the baseline models I chose to experiment with Lasso regression, wrapped in sklearn's MultiOutputRegressor. After optimizing the hyperparameters of the Lasso baseline, I wanted to look into model explainability by retrieving the coef_ of the wrapped Lasso regression model(s), but this doesn't seem to be possible. I'm now wondering how I can inspect the model's coefficients to better understand the predictions it makes.
My idea was to return the estimator with the best hyperparameters from GridSearchCV by setting refit=True, then access its estimator attribute, which yields MultiOutputRegressor(Lasso()), as intended. MultiOutputRegressor also has an estimator attribute, and accessing it returns Lasso(). Finally, Lasso has a coef_ attribute that returns the coefficients of the regressor. According to the sklearn documentation, the shape of the coef_ array is either (n_features,) or (n_targets, n_features), so multi-output regression coefficients seem to be supported.
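To illustrate the two documented shapes, here is a minimal sketch that fits a bare Lasso (no MultiOutputRegressor wrapper) once on a single target and once on both targets. The smaller sample size and alpha=0.1 are arbitrary choices for the illustration, not values from my actual setup.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Toy data with two targets (sizes chosen only for illustration).
X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

# One target: y has shape (n_samples,), so coef_ is one-dimensional.
single = Lasso(alpha=0.1).fit(X, y[:, 0])
print(single.coef_.shape)  # (10,)

# Two targets: y has shape (n_samples, 2), so coef_ has one row per target.
multi = Lasso(alpha=0.1).fit(X, y)
print(multi.coef_.shape)   # (2, 10)
```

Note that coef_ only exists after fit has been called on that particular Lasso instance.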
Sample data and code:
from numpy import logspace
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.linear_model import Lasso
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
search = GridSearchCV(
    MultiOutputRegressor(Lasso()),
    param_grid={'estimator__alpha': logspace(-1, 1, 3)},
    scoring='neg_mean_squared_error',
    cv=10,
    n_jobs=-1,
    refit=True
)
best_model = search.fit(X, y)
print(best_model)
print(best_model.estimator.estimator.coef_)
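For comparison, here is a hedged sketch of one way the fitted coefficients can be reached on the same sample data. It relies on two documented attributes: GridSearchCV.best_estimator_ (the refitted copy, available when refit=True) and MultiOutputRegressor.estimators_ (the list of per-target fitted regressors). The variable names fitted and coefs are only illustrative, and n_jobs is left at its default here for simplicity.

```python
import numpy as np
from numpy import logspace
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

search = GridSearchCV(
    MultiOutputRegressor(Lasso()),
    param_grid={'estimator__alpha': logspace(-1, 1, 3)},
    scoring='neg_mean_squared_error',
    cv=10,
    refit=True,
)
search.fit(X, y)

# search.estimator is the unfitted template that was passed in;
# the copy refitted with the best hyperparameters is best_estimator_.
fitted = search.best_estimator_

# MultiOutputRegressor fits one clone of Lasso per target and keeps
# the fitted clones in estimators_, in target order.
coefs = np.vstack([est.coef_ for est in fitted.estimators_])
print(search.best_params_)
print(coefs.shape)  # (2, 10): one row of coefficients per target
```

Each row of coefs then corresponds to one target, in the same column order as y.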
Topic lasso linear-regression regression machine-learning
Category Data Science