BorutaShap implementation

I want to use BorutaShap for feature selection in my model. My train_x is a numpy.ndarray and I want to pass it to a BorutaShap instance. When I call fit, I get the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'columns'

Below is my code:

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

num_trans = Pipeline(steps=[('impute', SimpleImputer(strategy='mean')),
                            ('scale', StandardScaler())])
cat_trans = Pipeline(steps=[('impute', SimpleImputer(strategy='most_frequent')),
                            ('encode', OneHotEncoder(handle_unknown='ignore'))])

preproc = ColumnTransformer(transformers=[('cat', cat_trans, cat_cols),
                                          ('num', num_trans, num_cols)])

X = preproc.fit_transform(train_data1)
X_final = preproc.transform(test_data1)

from xgboost import XGBRegressor
xgbr_model = XGBRegressor(random_state = 69, tree_method = 'gpu_hist')

from sklearn.model_selection import train_test_split, cross_val_score
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2,
                                                    random_state=69)

from BorutaShap import BorutaShap
Feature_Selector = BorutaShap(model=xgbr_model,
                              importance_measure='shap', 
                              classification=False)

Feature_Selector.fit(train_x, train_y, n_trials=10, random_state=69)

Any help will be appreciated!



BorutaShap expects input that has a columns attribute, which a NumPy array does not have. Try converting your data to a pandas DataFrame.
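A minimal reproduction, assuming only NumPy and pandas (the array is random stand-in data, not the question's): accessing .columns on a bare ndarray raises exactly this AttributeError, while the same values wrapped in a DataFrame do have a columns attribute.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's train_x: a plain 2-D NumPy array of random values
train_x = np.random.rand(5, 3)

# A bare ndarray has no column labels, so this attribute access fails
try:
    train_x.columns
    error_message = None
except AttributeError as err:
    error_message = str(err)
print(error_message)

# Wrapping the same values in a DataFrame provides a .columns attribute
train_df = pd.DataFrame(train_x)
print(train_df.columns.tolist())  # → [0, 1, 2]
```

Note that the default labels are the integers 0, 1, 2, which matters for the next point.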

If you do, you will likely also find that BorutaShap expects the column labels to be strings. pd.DataFrame(array) labels the columns with integers (0, 1, 2, ...) by default, so convert them:

import pandas as pd
data = pd.DataFrame(data)
data.columns = [str(i) for i in data.columns] 
# or use the actual feature names if you have them
# data.columns = ['feature 1', 'feature 2', ...]
