Why is XGBClassifier in Python outputting different feature importance values with the same data across different repetitions?

I am fitting an XGBClassifier to a small dataset (32 subjects) and find that if I loop through the code 10 times, the feature importances (gain) assigned to the features in the model vary slightly. I am using the same hyperparameter values in each iteration, and have subsample and colsample set to the default of 1 to prevent any random variation between executions. I am using the scikit-learn feature_importances_ attribute to extract the values from the fitted model. Any …
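For illustration, a minimal sketch (on synthetic data, not the asker's) of pinning down the remaining sources of randomness; fixing random_state and running single-threaded is a common first step even when subsample and colsample are 1:

    import numpy as np
    from xgboost import XGBClassifier

    X = np.random.RandomState(0).rand(32, 5)
    y = np.random.RandomState(1).randint(0, 2, 32)

    importances = []
    for _ in range(10):
        model = XGBClassifier(
            n_estimators=100,
            random_state=42,  # fixes any internal sampling
            n_jobs=1,         # single thread avoids nondeterministic reductions
        )
        model.fit(X, y)
        importances.append(model.feature_importances_)

    print(np.ptp(importances, axis=0))  # per-feature spread across the 10 runs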
Category: Data Science

Feature importance of a linear regression

What is the simplest, easiest-to-explain feature importance calculation for linear regression? I know I can use SHAP to compute feature importances, but I find it difficult to explain to stakeholders, and the raw coefficient is not a good measure of feature importance since it depends on the scale of the feature. Some have suggested (standard deviation of feature)*(feature coefficient) as a good measure of feature importance.
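A minimal sketch of that suggestion (synthetic data, made-up column names); scaling each coefficient by its feature's standard deviation puts the features on a comparable footing:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    X = pd.DataFrame({"age": rng.normal(40, 10, 200),
                      "income": rng.normal(50000, 15000, 200)})
    y = 0.5 * X["age"] + 0.0001 * X["income"] + rng.normal(0, 1, 200)

    model = LinearRegression().fit(X, y)
    importance = np.abs(model.coef_) * X.std(axis=0)  # |coef| * feature std
    print(importance.sort_values(ascending=False))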
Category: Data Science

Interpreting the variance of feature importance outputs with each random forest run using the same parameters

I noticed that I am getting different feature importance results with each random forest run, even though they use the same parameters. Now, I know that a random forest samples observations (and features) randomly, which causes the importance levels to vary. This is especially evident for the less important variables. My question is: how does one interpret the variance in random forest results when running the model multiple times? I know that one can reduce the instability level of results …
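One way to make that variance concrete (a sketch on synthetic data) is to refit with different seeds and inspect each feature's mean importance and spread:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=300, n_features=8, random_state=0)

    runs = np.array([
        RandomForestClassifier(n_estimators=200, random_state=seed)
        .fit(X, y).feature_importances_
        for seed in range(10)
    ])
    print("mean:", runs.mean(axis=0))
    print("std: ", runs.std(axis=0))  # a large std means an unstable ranking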
Category: Data Science

How to get feature significance for tabular data in machine learning?

I'm using fastai to train a network on tabular data (https://docs.fast.ai/tutorial.tabular.html). I have a table with 5 columns; each of these is a specific attribute that describes a galaxy and helps to classify it into one of two types: elliptical or spiral. My question is: is it possible to get a measure of which of these attributes is most helpful/least helpful for the training? I mean some kind of ranking.
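One common, model-agnostic option for such a ranking is permutation importance. A sketch with a scikit-learn stand-in model and hypothetical galaxy columns (not the asker's actual data or learner):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    cols = ["ellipticity", "brightness", "radius", "color", "concentration"]
    X = pd.DataFrame(rng.rand(500, 5), columns=cols)
    y = (X["ellipticity"] + 0.3 * X["color"] > 0.8).astype(int)  # toy labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

    result = permutation_importance(model, X_te, y_te, n_repeats=20,
                                    random_state=0)
    for name, score in sorted(zip(cols, result.importances_mean),
                              key=lambda t: -t[1]):
        print(f"{name}: {score:.3f}")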
Category: Data Science

XGBoost model has features whose feature importance equals zero

I ran into this problem: an XGBoost model (.pickle file, constructed under v0.7.post3) with 100 features in it; but I found that 55 features in the model (model.feature_importances_) show 0 feature importance (like the matrix below). Additionally, when I transformed the pickle file to PMML (to launch online), only 45 features were in the PMML file (those with importance > 0, apparently). So, my question is: why do features with importance equal to 0 end up in an XGBoost model? And why do they remain in the …
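Zero gain simply means the booster never split on that feature, yet the feature stays in the model's declared feature list. A small sketch reproducing this on synthetic data:

    import numpy as np
    from xgboost import XGBClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(200, 6)
    X[:, 5] = 0.0                    # constant column: can never be split on
    y = (X[:, 0] > 0.5).astype(int)

    model = XGBClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(model.feature_importances_)  # the last entry is 0.0
    # get_score() only reports features actually used in splits:
    print(model.get_booster().get_score(importance_type="gain"))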
Category: Data Science

Searching for a machine learning algorithm for a regression problem with many features

I have a machine learning problem with about 160 features and 400 cases, and I want to find the best predictors for a continuous outcome. The dataset contains variables of psychotherapists and clients, and I want to predict therapy outcome. I used lasso regression in nested 20-fold cross-validation and could identify about 20 top predictors (model fit of about 0.97 NRMSE). (I decided not to create a separate holdout dataset because I have too few cases.) However, I thought I could improve …
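For reference, a minimal nested-CV sketch with LassoCV; the 400 x 160 shape mirrors the description, but the data here is synthetic:

    import numpy as np
    from sklearn.linear_model import LassoCV
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.RandomState(0)
    X = rng.randn(400, 160)
    y = X[:, :20] @ rng.randn(20) + rng.randn(400)  # 20 true predictors

    inner = KFold(n_splits=5, shuffle=True, random_state=0)   # picks alpha
    outer = KFold(n_splits=20, shuffle=True, random_state=0)  # scores the pipeline
    model = make_pipeline(StandardScaler(), LassoCV(cv=inner, max_iter=10000))

    scores = cross_val_score(model, X, y, cv=outer, scoring="r2")
    print(scores.mean(), scores.std())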
Category: Data Science

How does missing an important feature affect the feature importance of the remaining features in the model?

I am creating a linear regression model for energy usage in a food processing plant. Unfortunately, I don't have historical data for one of the critical features (I know it is important from experience). If I go ahead with the modelling excluding this feature, what will be the impact on my model's performance and, especially, on the feature importances? Can I trust the feature importances in the absence of this feature, or would the model over-attribute the importance …
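This is classic omitted-variable bias: any observed feature correlated with the missing driver will soak up part of its importance. A toy simulation (made-up variables) showing the effect:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    hidden = rng.randn(1000)                         # the critical, unrecorded feature
    observed = 0.8 * hidden + 0.6 * rng.randn(1000)  # a correlated proxy
    other = rng.randn(1000)
    y = 3.0 * hidden + 1.0 * other + rng.randn(1000)

    full = LinearRegression().fit(np.column_stack([hidden, observed, other]), y)
    reduced = LinearRegression().fit(np.column_stack([observed, other]), y)
    print("with hidden feature:   ", full.coef_)     # roughly [3, 0, 1]
    print("without hidden feature:", reduced.coef_)  # 'observed' is inflated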
Category: Data Science

Differences between Feature Importance and SHAP variable importance graph

I have run an XGBClassifier using the following fields:
- predictive features = ['sbp','tobacco','ldl','adiposity','obesity','alcohol','age']
- binary target = 'Target'
I have produced a feature importance plot. I understand that, generally speaking, importance provides a score that indicates how useful or valuable each feature was in the construction of the boosted decision trees within the model: the more an attribute is used to make key decisions within the decision trees, the higher its relative importance. From the list of 7 predictive …
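A hedged sketch of putting the two rankings side by side; it assumes a fitted XGBClassifier model, a DataFrame X with the columns above, and the shap package:

    import numpy as np
    import shap

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    mean_abs_shap = np.abs(shap_values).mean(axis=0)
    for col, gain, s in zip(X.columns, model.feature_importances_, mean_abs_shap):
        print(f"{col}: gain={gain:.3f}  mean|SHAP|={s:.3f}")
    # The two rankings can legitimately disagree: gain reflects training-time
    # split quality, while SHAP reflects contributions to individual predictions.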
Category: Data Science

Aggregation of feature importance

I have more of a conceptual question I was hoping to get some feedback on. I am trying to run a boosted regression ML model to identify a subset of important predictors for a clinical condition. The dataset includes over 100,000 rows and close to 1,000 predictors. Now, the etiology of the disease we are trying to predict is largely unknown; thus, we likely don't have data on many important predictors for the condition. That is to say, as a …
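One way to make such a ranking more trustworthy is to aggregate importances over bootstrap refits and keep the features that score highly consistently; a sketch (the boosted model here is a scikit-learn stand-in, and X, y are assumed to be numpy arrays):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def bootstrap_importances(X, y, n_boot=25, seed=0):
        # Mean and spread of feature importances across bootstrap refits.
        rng = np.random.RandomState(seed)
        out = []
        for _ in range(n_boot):
            idx = rng.randint(0, len(X), len(X))  # resample rows with replacement
            model = GradientBoostingRegressor(random_state=0).fit(X[idx], y[idx])
            out.append(model.feature_importances_)
        return np.mean(out, axis=0), np.std(out, axis=0)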
Category: Data Science

Understanding which variables impact your variable of interest the most (correlation, linear regression) and correctly interpreting results

How do you ascertain which variables lead to the greatest increase in another variable of interest? Let's say you have a correlation matrix. You look at the row for the variable you are particularly curious about, retention, and see that income is the most correlated with it out of all the variables in the matrix. I would then expect, when I look at the highest-income cities in my dataset, to see them having the highest retention, but am not finding …
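A short sketch of that workflow (a hypothetical DataFrame df with income and retention columns); note that correlation describes the overall linear trend, so the very highest-income cities can still sit off-trend:

    import pandas as pd

    # Rank variables by (absolute) correlation with retention:
    corr = df.corr()["retention"].drop("retention")
    print(corr.abs().sort_values(ascending=False))

    # Then sanity-check the extremes directly:
    print(df.sort_values("income", ascending=False)[["income", "retention"]].head(10))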
Category: Data Science

Why does an unimportant feature have a big impact on R2 in XGBoost?

I am training an XGBoost model, xgbr, using xgb.XGBRegressor() with 13 features and one numeric target. The R2 on the test set is 0.935, which is good. I am checking the feature importances with:

    for col, score in zip(X_train.columns, xgbr.feature_importances_):
        print(col, score)

When I check the importance type via xgbr.importance_type, the result is gain. I have a feature, x1, whose importance seems to be 0.0068, not so high. x1 is a categorical feature with a cardinality of 5122, and I apply LabelEncoder before …
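One thing worth checking (a sketch that assumes the fitted xgbr from the question, trained on a DataFrame so the booster knows the column names): the different importance types can tell very different stories for a high-cardinality, label-encoded feature, which may be split on constantly yet gain little per split:

    booster = xgbr.get_booster()
    for imp_type in ("gain", "weight", "cover"):
        print(imp_type, booster.get_score(importance_type=imp_type).get("x1"))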
Category: Data Science

Feature importance by removing all other features?

For neural network feature importance, can I zero out all features except one in order to gauge that feature's importance? I know shuffling a feature is one approach. For example, leaving in the 4th feature:

    feature_4 = [
        [0., 0., 0., 1.15, 0.],
        [0., 0., 0., 1.76, 0.],
        [0., 0., 0., 2.31, 0.],
        [0., 0., 0., 0.94, 0.],
    ]
    _, probabilities = model.predict(feature_4)

The non-linear output of activation functions worries me, because the activation of the whole is not equal to the sum of the individual activations:

    >>> from scipy.special import expit  # aka sigmoid
    >>> expit(2.0)
    0.8807970779778823
    >>> expit(1.0) + expit(1.0)
    1.4621171572600098 …
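That worry is well founded; shuffling a single column while leaving the others at realistic values avoids the all-zeros baseline. A hedged sketch (the model, the score helper, and the array shapes are assumptions):

    import numpy as np

    def permutation_drop(model, X, y, col, score, n_repeats=10, seed=0):
        # score(model, X, y) is a hypothetical accuracy-like helper
        rng = np.random.RandomState(seed)
        base = score(model, X, y)
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, col])  # break only this feature's signal
            drops.append(base - score(model, Xp, y))
        return float(np.mean(drops))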
Category: Data Science

Feature Importance interpretation

I want to audit the results of regressions I ran and, hopefully, gain more insight into a treatment effect through sklearn's feature importance function (permutation_importance) or eli5's PermutationImportance. I know that these are generally used to narrow down the number of predictors in a model in an attempt to increase its accuracy (feature selection). My specific problem is that I do not want to use feature importance for feature selection, but for direct interpretation of the importance of the variables in …
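A minimal sketch of using permutation_importance purely for interpretation; it assumes a fitted estimator est and a held-out DataFrame X_val with target y_val:

    from sklearn.inspection import permutation_importance

    r = permutation_importance(est, X_val, y_val, n_repeats=30, random_state=0)
    for i in r.importances_mean.argsort()[::-1]:
        print(f"{X_val.columns[i]}: {r.importances_mean[i]:.3f} "
              f"+/- {r.importances_std[i]:.3f}")
    # Caveat: this measures predictive reliance, not a causal treatment effect.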
Category: Data Science

Revealing the causal structure in time-dependent data

We have a data table that accumulates the control and monitoring parameters of the High-Temperature Superconductor (HTS) production process, such that the rows represent the observations and the columns represent the parameters mentioned above. Due to the nature of the production process, there are time dependencies between the rows of our data sets; thus the columns are, indeed, time series. (Which boils our data down to time-dependent data.) Now the question arises: can we apply induced-causation methods, explained in …
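One first-pass screen for time-dependent columns (not a substitute for the induced-causation methods asked about) is pairwise Granger causality; a sketch assuming a DataFrame df with hypothetical parameter columns:

    from statsmodels.tsa.stattools import grangercausalitytests

    # Tests whether the 2nd column helps predict the 1st at lags 1..4.
    res = grangercausalitytests(df[["temperature", "current"]].dropna(),
                                maxlag=4, verbose=False)
    for lag, (tests, _) in res.items():
        print(lag, tests["ssr_ftest"][1])  # p-value at each lag
    # Granger causality captures predictive precedence, not true causation.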
Category: Data Science

Assess feature importance in Keras for one-hot-encoded categorical features

An important aspect of tuning a model is assessing feature importance. In Keras, how does one assess the importance of a categorical feature that is one-hot encoded? E.g. if a categorical feature is ice_cream_colour with a cardinality of 12, then I can assess the individual importances of ice_cream_colour_blue, ice_cream_colour_red, etc., but how do I do it for the entire ice_cream_colour feature? A naïve approach would be to sum all the individual importances, but this assumes that the relationship between the distinct feature importances is …
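An alternative to summing is to permute the whole group of one-hot columns together, so the categorical feature is scored as a single unit. A hedged sketch (assumes a Keras model compiled with an accuracy metric, numpy arrays X, y, and hypothetical column indices):

    import numpy as np

    def grouped_permutation_drop(model, X, y, group_cols, n_repeats=10, seed=0):
        rng = np.random.RandomState(seed)
        base = model.evaluate(X, y, verbose=0)[1]  # [loss, accuracy]
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            perm = rng.permutation(len(Xp))
            # Move whole rows of the group so each one-hot vector stays valid:
            Xp[:, group_cols] = Xp[perm][:, group_cols]
            drops.append(base - model.evaluate(Xp, y, verbose=0)[1])
        return float(np.mean(drops))

    # e.g. if the 12 ice_cream_colour_* columns occupy indices 3..14:
    # grouped_permutation_drop(model, X_val, y_val, list(range(3, 15)))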
Category: Data Science

What kind of model to use to find drivers when data is aggregated and not at the user level?

I have a website and have info from Google Analytics, so I can see the following "features":
- Page URL
- Country
- Device category (phone, desktop, etc.)
- Number of sessions
- Number of users: users who have initiated at least one session during the date range
- Avg. time on page
- Page views
- Bounce rate -- a probability calculated as single-page sessions divided by all sessions, or the percentage of all sessions on your site in which users viewed only a single page (e.g. …
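One hedged option for aggregated rows like these: treat each row as a weighted observation and regress the outcome of interest (e.g. bounce rate) on the candidate drivers, weighting by session count. The DataFrame and column names below are assumptions:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    X = pd.get_dummies(df[["country", "device_category"]])
    X["avg_time_on_page"] = df["avg_time_on_page"]

    model = LinearRegression().fit(X, df["bounce_rate"],
                                   sample_weight=df["sessions"])
    for name, coef in sorted(zip(X.columns, model.coef_),
                             key=lambda t: -abs(t[1])):
        print(name, round(coef, 4))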
Category: Data Science
