BorutaShap implementation

I want to use BorutaShap for feature selection in my model. I have my train_x as a numpy.ndarray and I want to pass it to the BorutaShap instance. When I try to fit, I get the error:

    AttributeError: 'numpy.ndarray' object has no attribute 'columns'

Below is my code:

    num_trans = Pipeline(steps=[('impute', SimpleImputer(strategy='mean')),
                                ('scale', StandardScaler())])
    cat_trans = Pipeline(steps=[('impute', SimpleImputer(strategy='most_frequent')),
                                ('encode', OneHotEncoder(handle_unknown='ignore'))])
    from sklearn.compose import ColumnTransformer
    preproc = ColumnTransformer(transformers=[('cat', …
Category: Data Science
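A likely fix, not stated in the excerpt itself: BorutaShap reads feature names from X.columns, so the preprocessed numpy.ndarray has to be wrapped in a pandas DataFrame before fitting. A minimal sketch, assuming a RandomForestClassifier and made-up data in place of the real pipeline output:

    import numpy as np
    import pandas as pd
    from BorutaShap import BorutaShap
    from sklearn.ensemble import RandomForestClassifier

    # Stand-in for preproc.fit_transform(train_x), which returns an ndarray.
    train_x = np.random.rand(100, 5)
    train_y = np.random.randint(0, 2, 100)

    # BorutaShap expects a DataFrame so it can read .columns; wrapping the
    # array with explicit (here hypothetical) feature names avoids the
    # AttributeError above.
    X_df = pd.DataFrame(train_x, columns=[f'f{i}' for i in range(train_x.shape[1])])

    selector = BorutaShap(model=RandomForestClassifier(),
                          importance_measure='shap',
                          classification=True)
    selector.fit(X=X_df, y=train_y, n_trials=50)

If the ColumnTransformer includes a OneHotEncoder, preproc.get_feature_names_out() can supply the real post-transform column names instead of generated ones.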

Trouble performing feature selection using boruta and support vector regression

I was trying to select the most important features of a data set using Boruta in Python. I split the data into training and test sets, fit an SVM regressor to the data, and then used Boruta to measure feature importance. The code is as follows:

    from sklearn.svm import SVR
    svclassifier = SVR(kernel='rbf', C=1e4, gamma=0.1)
    svm_model = svclassifier.fit(x_train, y_train)

    from boruta import BorutaPy
    feat_selector = BorutaPy(svclassifier, n_estimators='auto', verbose=2, random_state=1)
    feat_selector.fit(x_train, y_train)
    feat_selector.support_
    feat_selector.ranking_
    X_filtered = feat_selector.transform(x_train)

But I get this …
Category: Data Science
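The probable cause, a common gotcha rather than something visible in the truncated error: BorutaPy requires an estimator that exposes feature_importances_ after fitting, which tree ensembles provide but SVR does not. A minimal sketch with a RandomForestRegressor substituted in, using made-up stand-ins for x_train and y_train:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from boruta import BorutaPy

    # Hypothetical stand-ins for the question's x_train / y_train.
    x_train = np.random.rand(200, 10)
    y_train = np.random.rand(200)

    # Unlike SVR, a random forest exposes feature_importances_, which
    # BorutaPy needs to rank features against its shadow features.
    rf = RandomForestRegressor(n_jobs=-1, max_depth=5)

    feat_selector = BorutaPy(rf, n_estimators='auto', verbose=2, random_state=1)
    feat_selector.fit(x_train, y_train)  # BorutaPy expects numpy arrays

    print(feat_selector.support_)   # boolean mask of accepted features
    print(feat_selector.ranking_)   # 1 = accepted; larger = rejected sooner
    X_filtered = feat_selector.transform(x_train)

If the SVR itself must be kept, some wrapper that derives importances for it (for example via permutation importance) would be needed, since BorutaPy cannot use the SVR directly.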

From logistic regression to XGBoost - selecting features to run the model with

I have been asked to look at XGBoost (as implemented in R, with a maximum of around 50 features) as an alternative to an existing logistic regression model, not developed by me, that was created from a very large set of credit-risk data containing a few thousand predictors. The documentation surrounding the logistic regression is very well prepared, and as such, track has been kept of the reasons for excluding each variable. Among those are: automated data …
Category: Data Science
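As an illustration of the importance-based selection being considered (the question targets the R package, but the idea is the same in the Python API), here is a hypothetical sketch that fits an XGBoost model and keeps the top 50 features, mirroring the 50-feature cap mentioned above; the data and parameters are made up:

    import numpy as np
    import xgboost as xgb
    from sklearn.feature_selection import SelectFromModel

    # Made-up stand-in for a wide credit-risk data set.
    X = np.random.rand(500, 200)
    y = np.random.randint(0, 2, 500)

    model = xgb.XGBClassifier(n_estimators=100, max_depth=4, eval_metric='logloss')
    model.fit(X, y)

    # threshold=-np.inf makes max_features the only selection criterion,
    # so this keeps the 50 features with the highest importance scores.
    selector = SelectFromModel(model, prefit=True, max_features=50, threshold=-np.inf)
    X_reduced = selector.transform(X)

Note that, unlike the documented logistic regression, this yields a ranking rather than a per-variable reason for exclusion, so the audit trail the question describes would still need to be kept by hand.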

The Merits of Feature Reduction Routines

I am interested in learning what routine others use (if any) for feature reduction/selection. For example, if my data has several thousand features, I typically try two to four things right away, depending on circumstances. Zero variance / near-zero variance: using the R package caret's nzv, I find that a very small percentage of features are zero variance and a few more are near zero variance; then, using nzv$PercentUnique, I may remove the bottom quartile of features depending on the range of PercentUnique values. Correlation to find multicollinearity …
Category: Data Science
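For comparison, a rough Python equivalent of the first two steps described above (dropping zero-variance features, then filtering one member of each highly correlated pair) might look like the following; the thresholds and data are illustrative assumptions, not anything from the question:

    import numpy as np
    import pandas as pd
    from sklearn.feature_selection import VarianceThreshold

    # Made-up wide data set, with one zero-variance column to demonstrate.
    df = pd.DataFrame(np.random.rand(100, 50),
                      columns=[f'f{i}' for i in range(50)])
    df['constant'] = 1.0

    # Step 1: drop zero-variance features (analogous to caret's nzv).
    vt = VarianceThreshold(threshold=0.0)
    df = df[df.columns[vt.fit(df).get_support()]]

    # Step 2: for each highly correlated pair (|r| > 0.9), drop one member.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    df = df.drop(columns=[c for c in upper.columns if (upper[c] > 0.9).any()])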

About

Geeks Mental is a community that publishes articles and tutorials about the web, Android, data science, new techniques, and Linux security.