I want to use BorutaShap for feature selection in my model. I have my train_x as a numpy.ndarray and I want to pass it to the BorutaShap instance. When I try to fit, I get the error:

```
AttributeError: 'numpy.ndarray' object has no attribute 'columns'
```

Below is my code:

```python
num_trans = Pipeline(steps = [('impute', SimpleImputer(strategy = 'mean')),
                              ('scale', StandardScaler())])
cat_trans = Pipeline(steps = [('impute', SimpleImputer(strategy = 'most_frequent')),
                              ('encode', OneHotEncoder(handle_unknown = 'ignore'))])

from sklearn.compose import ColumnTransformer
preproc = ColumnTransformer(transformers = [('cat', …
```
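A likely cause: BorutaShap reads `X.columns` internally, which a plain ndarray does not have. A minimal sketch of the workaround, assuming the fix is simply to wrap the array in a pandas DataFrame before fitting (`feature_names` here is a hypothetical stand-in; in practice the names could come from the fitted ColumnTransformer via `get_feature_names_out()`):

```python
import numpy as np
import pandas as pd

# Stand-in for the preprocessed training array produced by the pipeline.
train_x = np.random.rand(10, 3)

# Hypothetical column names; BorutaShap only needs *some* labels so that
# .columns exists on the object it receives.
feature_names = [f"f{i}" for i in range(train_x.shape[1])]
train_x_df = pd.DataFrame(train_x, columns=feature_names)

print(train_x_df.columns.tolist())  # ['f0', 'f1', 'f2']
```

Passing `train_x_df` (instead of the raw ndarray) to the selector's `fit` should avoid the AttributeError.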
I was trying to select the most important features of a data set using Boruta in Python. I split the data into a training and a test set, fitted an SVM regressor to the data, and then used Boruta to measure feature importance. The code is as follows:

```python
from sklearn.svm import SVR
svclassifier = SVR(kernel='rbf', C=1e4, gamma=0.1)
svm_model = svclassifier.fit(x_train, y_train)

from boruta import BorutaPy
feat_selector = BorutaPy(svclassifier, n_estimators='auto', verbose=2, random_state=1)
feat_selector.fit(x_train, y_train)
feat_selector.support_
feat_selector.ranking_
X_filtered = feat_selector.transform(x_train)
```

But I get this …
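One probable issue with the setup above: BorutaPy ranks features by reading the wrapped estimator's `feature_importances_` attribute after each fit, and a kernel SVR never exposes one; a tree ensemble such as `RandomForestRegressor` does. A small sketch of the compatibility check (the two classes below are hypothetical stand-ins, not the real sklearn estimators):

```python
class SVRLike:
    """Stand-in for a kernel model like SVR: no per-feature importances."""

class ForestLike:
    """Stand-in for a tree ensemble: exposes feature_importances_."""
    feature_importances_ = [0.5, 0.3, 0.2]

def boruta_compatible(estimator):
    # BorutaPy needs this attribute to rank features against shadow features.
    return hasattr(estimator, "feature_importances_")

print(boruta_compatible(SVRLike()))     # False
print(boruta_compatible(ForestLike()))  # True
```

So swapping the SVR for an importance-exposing regressor inside `BorutaPy(...)` is the usual remedy, with the SVR kept for the final model if desired.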
I have been asked to look at XGBoost (as implemented in R, with a maximum of around 50 features) as an alternative to an existing logistic regression model, not developed by me, built from a very large set of credit-risk data containing a few thousand predictors. The documentation surrounding the logistic regression is very well prepared, and a record has been kept of the reason each variable was excluded. Among those reasons are: automated data …
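The exclusion audit trail described above can be kept mechanically during screening. A minimal sketch, with hypothetical names (`exclusions`, `screen`, the thresholds) chosen purely for illustration:

```python
# Record why each predictor is dropped, so the variable-selection
# documentation can be regenerated rather than maintained by hand.
exclusions = {}

def screen(name, values, min_unique=2):
    """Apply simple exclusion rules; return True if the predictor survives."""
    if all(v is None for v in values):
        exclusions[name] = "all values missing"
        return False
    if len(set(values)) < min_unique:
        exclusions[name] = "zero variance"
        return False
    return True

screen("income", [100, 200, 300])    # kept
screen("constant_flag", [1, 1, 1])   # dropped
print(exclusions)  # {'constant_flag': 'zero variance'}
```

The same pattern extends to the other rules in the list (correlation cuts, missing-rate cuts, and so on), each appending its own reason string.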
I am interested in learning what routine others use (if any) for feature reduction/selection. For example, if my data has several thousand features, I typically try two to four things right away, depending on the circumstances. Zero variance/near-zero variance: using the R package caret's nzv, I find a very small percentage of features have zero variance and a few more have near-zero variance. Then, using nzv$PercentUnique, I may remove the bottom quartile of features depending on the range of the PercentUnique values. Correlation: to find multicollinearity …
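For comparison, the first two filters in that routine can be sketched in Python with NumPy (this is an assumed analogue of the caret workflow, not a translation of it; the 0.95 correlation cut is an arbitrary illustrative threshold):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Add a column that duplicates column 0 (scaled) and a constant column.
X = np.column_stack([X, X[:, 0] * 2.0, np.ones(100)])

# 1) Drop zero-variance columns (the constant one).
X = X[:, X.var(axis=0) > 0.0]

# 2) Drop one column from each highly correlated pair.
corr = np.abs(np.corrcoef(X, rowvar=False))
n = corr.shape[0]
drop = set()
for i in range(n):
    for j in range(i + 1, n):
        if i not in drop and j not in drop and corr[i, j] > 0.95:
            drop.add(j)
X = X[:, [c for c in range(n) if c not in drop]]

print(X.shape)  # (100, 3)
```

A near-zero-variance rule in the caret sense would additionally look at the frequency ratio of the two most common values and the percent of unique values, rather than variance alone.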