Why do I get a ValueError for an SVR model with RFE, but only when using a pipeline?

I am running five different regression models to find the best predicting model for one variable. I am using a Leave-One-Out approach and using RFE to find the best predicting features.

Four of the five models are running fine, but I am running into issues with the SVR. This is my code below:

from numpy import absolute, mean, std
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.feature_selection import RFECV
from sklearn.pipeline import Pipeline

# encode Gender as a binary 0/1 column (assign back rather than replacing in place)
dataset['Gender'] = dataset['Gender'].map({'M': 1, 'F': 0})

# select predictors and dependent 
X = dataset.iloc[:,12:]
y = dataset.iloc[:,2]

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)

First I run LOOCV with all features. This runs fine:

## LOOCV with all features
# find number of samples
n = X.shape[0]
# create loocv procedure
cv = LeaveOneOut()
# create model
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
# evaluate model
scores = cross_val_score(regressor, X, y, scoring='neg_mean_squared_error', cv=cv)
# force positive
#scores = absolute(scores)

# report performance
print('MSE: %.3f (%.3f)' % (mean(scores), std(scores)))

Next, I want to include RFECV to find the best predicting features for the model. This runs fine for my other regression models.

This is the part of the code where I get the error:

# automatically select the number of features with RFE

# create pipeline
rfe = RFECV(estimator=SVR(kernel = 'rbf'))
model = SVR(kernel = 'rbf')
pipeline = Pipeline(steps=[('s',rfe),('m',model)])
# find number of samples
n = X.shape[0]
# create loocv procedure
cv = LeaveOneOut()
# evaluate model
scores = cross_val_score(pipeline, X, y, scoring='neg_mean_squared_error', cv=cv)
# report performance
print('MSE: %.3f (%.3f)' % (mean(scores), std(scores)))

The error I receive is:

ValueError: when `importance_getter=='auto'`, the underlying estimator SVR should have `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to feature selector or call fit before calling transform.

I am not sure what this error means.



RFE works by fitting its estimator, eliminating the least important feature(s), and then repeating on the features that remain. Importance is read from the fitted model, by default from either coef_ or feature_importances_ (as noted in the error message). SVR with a nonlinear kernel such as rbf has neither attribute (coef_ is only available with kernel='linear'), and indeed it does not really come with built-in feature importances. That is also why the error only shows up in your second snippet: the first one never uses RFECV. See also https://stats.stackexchange.com/q/265656/232706
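To make the attribute check concrete, here is a minimal sketch on synthetic data (the make_regression dataset and its sizes are assumptions standing in for the asker's data). It checks which kernels expose coef_ and then runs RFECV with a linear-kernel SVR, which is one possible workaround rather than the only one:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

# Synthetic stand-in for the real dataset (assumption, not the asker's data).
X, y = make_regression(n_samples=40, n_features=6, n_informative=3,
                       noise=1.0, random_state=0)

# coef_ is only defined for the linear kernel; with rbf the attribute
# lookup raises AttributeError, so hasattr reports it as missing.
rbf_has_coef = hasattr(SVR(kernel="rbf").fit(X, y), "coef_")
linear_has_coef = hasattr(SVR(kernel="linear").fit(X, y), "coef_")
print("rbf has coef_:", rbf_has_coef)
print("linear has coef_:", linear_has_coef)

# With a linear kernel, RFECV can rank features through coef_.
selector = RFECV(estimator=SVR(kernel="linear"), cv=5)
selector.fit(X, y)
print("selected feature mask:", selector.support_)
```

A selector built this way could replace the 's' step of the pipeline in the question while the final 'm' step stays an rbf SVR, with the caveat that features ranked by a linear model are not guaranteed to be the best ones for a nonlinear model.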

When the estimator passed to RFE is itself a pipeline, you would in any case need to tell RFE where to find the coefficients; see the second paragraph of the docs for importance_getter:

Also accepts a string that specifies an attribute name/path for extracting feature importance (implemented with attrgetter). For example, give regressor_.coef_ in case of TransformedTargetRegressor or named_steps.clf.feature_importances_ in case of sklearn.pipeline.Pipeline with its last step named clf.
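As an illustration of the string form, the following sketch passes a pipeline as the RFECV estimator and points importance_getter at the coefficients of its last step. The step names scale and clf, the linear kernel, and the synthetic data are assumptions made for the example:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real dataset (assumption).
X, y = make_regression(n_samples=40, n_features=6, n_informative=3,
                       noise=1.0, random_state=0)

# The estimator handed to RFECV is a pipeline, so coef_ lives on the
# last step ("clf") rather than on the pipeline object itself.
inner = Pipeline(steps=[("scale", StandardScaler()),
                        ("clf", SVR(kernel="linear"))])

selector = RFECV(estimator=inner,
                 importance_getter="named_steps.clf.coef_",
                 cv=5)
selector.fit(X, y)
print("number of features kept:", selector.n_features_)
print("selected feature mask:", selector.support_)
```

Putting the scaler inside the pipeline also means it is refit on each training fold instead of on the full dataset.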

Finally, if you really want to use SVR, have a look at the third paragraph of the docs for importance_getter:

If callable, overrides the default feature importance getter. The callable is passed with the fitted estimator and it should return importance for each feature.

You could write a callable that uses some model-agnostic importance measure. Note, however, that the callable receives only the fitted estimator and not the data, so permutation importance will not work here. See also https://stats.stackexchange.com/q/191402/232706
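For reference, the callable form looks roughly like this. The function name abs_coef_importance is hypothetical, and the sketch uses a linear-kernel SVR on synthetic data only because that gives the callable something to read from the fitted estimator alone; it does not solve the rbf case:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.svm import SVR


def abs_coef_importance(fitted_estimator):
    """Hypothetical importance getter: one non-negative score per feature.

    RFE calls this with the fitted estimator only (no X or y), so it can
    use nothing beyond what the estimator stores after fitting.
    """
    return np.abs(fitted_estimator.coef_).ravel()


# Synthetic stand-in for the real dataset (assumption).
X, y = make_regression(n_samples=40, n_features=6, n_informative=3,
                       noise=1.0, random_state=0)

selector = RFE(estimator=SVR(kernel="linear"),
               n_features_to_select=3,
               importance_getter=abs_coef_importance)
selector.fit(X, y)
print("selected feature mask:", selector.support_)
```

Because the feature set shrinks at every elimination step, a callable that closed over the full X would no longer match the columns the estimator was fitted on, which is another reason data-dependent measures are awkward here.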
