In my project I have >900 features, and I thought I would use the Recursive Feature Elimination (RFE) algorithm to reduce the dimensionality of my problem (in order to improve accuracy). But I can't figure out how to choose the RFE parameters (the estimator and the number of features to select). Should I use model selection techniques in this case as well? Do you have any advice?
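One common way to avoid hand-picking `n_features_to_select` is to let cross-validation choose it via RFECV. A minimal sketch, assuming a synthetic stand-in for the >900-feature data and a logistic-regression estimator (both assumptions; any estimator exposing `coef_` or `feature_importances_` works):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for a wide dataset (hypothetical shapes).
X, y = make_classification(n_samples=500, n_features=900, n_informative=30,
                           random_state=0)

# RFECV cross-validates every candidate subset size, so the number of
# features to keep does not have to be fixed in advance.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=50,                 # drop 50 features per iteration to keep runtime manageable
    cv=StratifiedKFold(5),
    scoring="accuracy",
    n_jobs=-1,
)
selector.fit(X, y)
print("optimal number of features:", selector.n_features_)
```

A larger `step` trades some granularity for speed, which matters with hundreds of features; the estimator inside RFE can also differ from the final model, as long as its fitted weights give a meaningful feature ranking.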
I am new to machine learning and have just learned about feature selection. In my project, I have a dataset in which 89% of the samples belong to the majority class and 11% to the minority class. I also have 24 features. I opted to use Recursive Feature Elimination with Cross-Validation (RFECV in the scikit-learn package) to find the optimal number of features in the dataset. I also set the 'scoring' parameter to 'f1', since I am dealing with an imbalanced dataset. Furthermore, the estimator I …
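For reference, a minimal sketch of that setup on synthetic data; the RandomForestClassifier is a placeholder, since the question is cut off before naming its estimator:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the 24-feature, 89/11 imbalanced dataset.
X, y = make_classification(n_samples=1000, n_features=24, n_informative=8,
                           weights=[0.89], random_state=0)

rfecv = RFECV(
    estimator=RandomForestClassifier(random_state=0),  # assumed estimator
    step=1,
    cv=StratifiedKFold(5),  # stratified folds preserve the class ratio
    scoring="f1",           # F1 of the minority (positive) class
)
rfecv.fit(X, y)
print("optimal number of features:", rfecv.n_features_)
```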
I am running five different regression models to find the best model for predicting one variable. I am using a leave-one-out approach and RFE to find the best predicting features. Four of the five models run fine, but I am running into issues with the SVR. This is my code below:

from numpy import absolute, mean, std
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.metrics import r2_score, …
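The snippet is truncated before the SVR is built, but a frequent cause of RFE failing with SVR specifically (while other models run fine) is the kernel: RFE needs `coef_` or `feature_importances_`, and SVR exposes `coef_` only with `kernel='linear'`; the default RBF kernel makes RFE raise. A minimal sketch under that assumption:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_regression(n_samples=100, n_features=20, random_state=0)

# RFE ranks features by an importance signal (coef_ or feature_importances_).
# SVR exposes coef_ only for kernel="linear"; with the default RBF kernel
# there is no such attribute and RFE fails.
rfe = RFE(estimator=SVR(kernel="linear"), n_features_to_select=5)
rfe.fit(X, y)
print("kept features:", rfe.support_)
```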
I am trying a regression model on a dataset which has categorical and numerical variables along with NaN values. I want to use Pipelines for imputation and encoding purposes. Now I have a few conditions which must be satisfied in building the model, which are as follows (see the sketch after this list):
1.) Use of Pipelines is a must for imputation and encoding (one-hot encoding) purposes.
2.) Imputation should be done AFTER the train test split.
3.) For feature selection (should be done AFTER train …
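A minimal sketch of one way to satisfy those conditions: preprocessing lives inside the pipeline, so fitting on the training split alone keeps imputation and encoding statistics out of the test data. The toy frame and all column names are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny synthetic frame with NaNs in a numeric and a categorical column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 200),
    "income": rng.normal(50_000, 10_000, 200),
    "city": rng.choice(["A", "B", "C"], 200),
})
df.loc[::17, "age"] = np.nan
df.loc[::23, "city"] = np.nan
y = 0.5 * df["income"] + 100 * df["age"].fillna(40) + rng.normal(0, 100, 200)

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["age", "income"]),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["city"]),
])

model = Pipeline([
    ("prep", preprocess),                                   # imputation + one-hot encoding
    ("select", RFE(LinearRegression(), n_features_to_select=3)),  # feature selection
    ("reg", LinearRegression()),
])

# Because the whole pipeline is fit on the training split only, imputation
# statistics and encoder categories are learned AFTER the split.
X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=0)
model.fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```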
I built an XGBoost model and ran RFECV over 250 features. After an hour or so, I plotted the grid_scores_. The scores for all numbers of features are within 0.02 of each other, as is clearly visible on the y-axis. To me, this plot indicates that it doesn't matter whether my model has 1 feature or 250, which has a lot of scary ramifications. Has anyone ever come across this in their work? If so, what did you determine? How is this even possible?
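A curve that flat often means the features are highly redundant or a handful of strong predictors dominate, so a boosted-tree model scores about the same at every subset size. For reference, a minimal sketch of the setup on synthetic data; note that newer scikit-learn versions replace grid_scores_ with cv_results_:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from xgboost import XGBClassifier

# Synthetic stand-in: 250 features, only a few of them informative.
X, y = make_classification(n_samples=1000, n_features=250, n_informative=10,
                           random_state=0)

rfecv = RFECV(XGBClassifier(n_estimators=100), step=5, cv=5, scoring="accuracy")
rfecv.fit(X, y)

# grid_scores_ was deprecated in scikit-learn 1.0 and later removed;
# the per-subset-size scores now live in cv_results_, ordered from the
# smallest candidate subset to the full feature set.
scores = rfecv.cv_results_["mean_test_score"]
plt.plot(scores)
plt.xlabel("RFE iteration (subset grows by `step` features)")
plt.ylabel("mean CV accuracy")
plt.show()
```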
I am trying to use RFE with artificial neural nets, but I am getting the error "'Sequential' object has no attribute '_get_tags'". Here is my code snippet. Any help would be appreciated.

model_2 = Sequential([
    Dense(9, activation='linear'),
    Dense(200, activation=tf.keras.layers.LeakyReLU(alpha=0.3)),
    Dense(1, activation='linear'),
])
adam = keras.optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999,
                             epsilon=None, decay=0.0, amsgrad=False)
model_2.compile(optimizer=adam, loss='mse', metrics=['accuracy'])
rfe = RFE(model_2, n_features_to_select=5)
pipeline = Pipeline(steps=[('s', rfe), ('m', model_2)])
hist_2 = pipeline.fit(X2_train.iloc[:, 10:20].values,
                      y2_train.iloc[:, 1].to_numpy(),
                      m__batch_size=10, m__epochs=4000,
                      m__validation_data=(X2_test.iloc[:, 10:20].values,
                                          y2_test.iloc[:, 1]))
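The error arises because RFE expects a scikit-learn estimator (implementing get_params/_get_tags and exposing `coef_` or `feature_importances_` after fitting), and a Keras Sequential is not one. Below is a minimal sketch of the same selector-plus-model pipeline with a scikit-learn native stand-in (RandomForestRegressor, purely illustrative); keeping the neural net itself would require a scikit-learn wrapper such as scikeras's KerasRegressor, plus an explicit `importance_getter` for RFE:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

# Stand-in estimator: implements the scikit-learn API and exposes
# feature_importances_, which RFE needs to rank features for elimination.
selector = RFE(RandomForestRegressor(random_state=0), n_features_to_select=5)

pipeline = Pipeline([("s", selector), ("m", RandomForestRegressor(random_state=0))])
pipeline.fit(X, y)
print("kept features:", selector.support_)
```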
On a set of 9 features, I applied the Recursive Feature Elimination (RFE) algorithm with an SVM estimator, following the approach from (1). When I requested a subset of size 1, RFE returned feature X. However, when I trained an SVM on each feature individually, I found another feature, Y, that yielded higher accuracy than the SVM trained on X. I thought that RFE finds the features with the highest accuracy. Is my understanding of RFE wrong? (1): Gene selection for cancer …
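RFE, as defined in (1), is a greedy backward search: at each step it drops the feature with the smallest squared weight in the multivariate SVM, so the last survivor is the feature most useful in the context of the others, not necessarily the one with the best univariate accuracy. A minimal sketch on synthetic data contrasting the two criteria:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=9, n_informative=4,
                           random_state=0)
svm = SVC(kernel="linear")

# RFE eliminates features backwards by SVM weight magnitude; the survivor
# is the last feature standing, not the best single predictor by design.
rfe = RFE(svm, n_features_to_select=1).fit(X, y)
rfe_pick = rfe.support_.argmax()

# Univariate comparison: train the SVM on each feature alone.
uni_scores = [cross_val_score(svm, X[:, [j]], y, cv=5).mean()
              for j in range(X.shape[1])]
print("RFE keeps feature:", rfe_pick,
      "| best univariate feature:", int(np.argmax(uni_scores)))
```

The two printed indices can differ, which matches the X-versus-Y observation in the question without RFE being wrong.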