Trouble performing feature selection using Boruta and support vector regression

I was trying to select the most important features of a data set using Boruta in Python. I split the data into training and test sets, fit an SVM regressor on the training data, and then used Boruta to measure feature importance. The code is as follows:

from sklearn.svm import SVR

svclassifier = SVR(kernel='rbf', C=1e4, gamma=0.1)
svm_model = svclassifier.fit(x_train, y_train)

from boruta import BorutaPy

feat_selector = BorutaPy(svclassifier, n_estimators='auto', verbose=2, random_state=1)
feat_selector.fit(x_train, y_train)
feat_selector.support_
feat_selector.ranking_
X_filtered = feat_selector.transform(x_train)

But I get this error: KeyError: 'max_depth'.

What might be causing this error?

Does Boruta work with any kind of model, i.e. linear models, tree-based models, neural nets, etc.?

Without the full code there are only a few things to check, but I would try the following:

Note that you are passing svclassifier to BorutaPy; according to your code, you should pass the fitted object, that is, svm_model:

from sklearn.svm import SVR

svclassifier = SVR(kernel='rbf', C=1e4, gamma=0.1)
svm_model = svclassifier.fit(x_train, y_train)

from boruta import BorutaPy

feat_selector = BorutaPy(svm_model, n_estimators='auto', verbose=2, random_state=1)
feat_selector.fit(x_train, y_train)
feat_selector.support_
feat_selector.ranking_
X_filtered = feat_selector.transform(x_train)

EDIT:

After reading the implementation of Boruta, it turns out it is built around tree-based models. Its docstring states:

 Parameters
    ----------
    estimator : object
        A supervised learning estimator, with a 'fit' method that returns the
        feature_importances_ attribute. Important features must correspond to
        high absolute values in the feature_importances_.

SVR does not have the feature_importances_ attribute, so Boruta can only take tree-based models such as DecisionTree, RandomForest, XGBoost, etc. This also explains the error: with n_estimators='auto', BorutaPy reads the estimator's max_depth parameter to decide how many trees to grow, and since SVR has no such parameter you get KeyError: 'max_depth'.
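For example, here is a minimal sketch of the same pipeline with a RandomForestRegressor instead (reusing x_train and y_train from the question; BorutaPy's examples pass plain numpy arrays, so convert first if you have pandas objects):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from boruta import BorutaPy

# A tree-based estimator that exposes feature_importances_ after fitting
rf = RandomForestRegressor(n_jobs=-1, max_depth=5, random_state=1)

feat_selector = BorutaPy(rf, n_estimators='auto', verbose=2, random_state=1)

# BorutaPy works on numpy arrays; convert if x_train/y_train are DataFrames
feat_selector.fit(np.asarray(x_train), np.asarray(y_train))

print(feat_selector.support_)   # boolean mask of confirmed features
print(feat_selector.ranking_)   # rank 1 = confirmed important
X_filtered = feat_selector.transform(np.asarray(x_train))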

If you want to use SVM anyway, I would recommend switching the feature selection algorithm to permutation importance, which computes importance in a quite similar way, based on repeated random permutation; but in this case you will have to provide a metric to measure the decrease in performance when a feature is shuffled.
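For instance, here is a minimal sketch using scikit-learn's permutation_importance with the SVR from the question (the x_test/y_test split and the R² scoring choice are my assumptions):

from sklearn.inspection import permutation_importance
from sklearn.svm import SVR

svm_model = SVR(kernel='rbf', C=1e4, gamma=0.1).fit(x_train, y_train)

# Shuffle each feature n_repeats times and measure the drop in the score
result = permutation_importance(
    svm_model, x_test, y_test,
    scoring='r2',      # the metric whose decrease defines importance
    n_repeats=10,
    random_state=1,
)
print(result.importances_mean)  # mean score drop per feature

Features whose permutation barely changes the score can then be dropped, which mirrors what Boruta does with feature_importances_ for tree models.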
