Why is XGBClassifier in Python outputting different feature importance values with the same data across different repetitions?

I am fitting an XGBClassifier to a small dataset (32 subjects) and find that if I loop through the code 10 times, the feature importances (gain) assigned to the features in the model vary slightly. I am using the same hyperparameter values in each iteration, and have subsample and colsample set to the default of 1 to prevent any random variation between executions. I am using the scikit-learn feature_importances_ attribute to extract the values from the fitted model. Any …
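For illustration, a minimal sketch (on synthetic data, not the asker's) of pinning down the remaining sources of randomness; fixing random_state and running single-threaded is a common first step even when subsample and colsample are 1:

    import numpy as np
    from xgboost import XGBClassifier

    X = np.random.RandomState(0).rand(32, 5)
    y = np.random.RandomState(1).randint(0, 2, 32)

    importances = []
    for _ in range(10):
        model = XGBClassifier(
            n_estimators=100,
            random_state=42,  # fixes any internal sampling
            n_jobs=1,         # single thread avoids nondeterministic reductions
        )
        model.fit(X, y)
        importances.append(model.feature_importances_)

    print(np.ptp(importances, axis=0))  # per-feature spread across the 10 runs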
Category: Data Science

Feature importance of a linear regression

What is the simplest, easiest-to-explain feature importance calculation for linear regression? I know I can use SHAP to compute feature importances, but I find it difficult to explain to stakeholders, and the raw coefficient is not a good measure of feature importance since it depends on the scale of the feature. Some have suggested (standard deviation of feature)*(feature coefficient) as a good measure of feature importance.
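A minimal sketch of that suggestion (synthetic data, made-up column names); scaling each coefficient by its feature's standard deviation puts the features on a comparable footing:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    X = pd.DataFrame({"age": rng.normal(40, 10, 200),
                      "income": rng.normal(50000, 15000, 200)})
    y = 0.5 * X["age"] + 0.0001 * X["income"] + rng.normal(0, 1, 200)

    model = LinearRegression().fit(X, y)
    importance = np.abs(model.coef_) * X.std(axis=0)  # |coef| * feature std
    print(importance.sort_values(ascending=False))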
Category: Data Science

Interpreting the variance of feature importance outputs with each random forest run using the same parameters

I noticed that I am getting different feature importance results with each random forest run, even though they use the same parameters. Now, I know that a random forest samples observations (and features) randomly, which causes the importance levels to vary. This is especially evident for the less important variables. My question is: how does one interpret the variance in random forest results when running the model multiple times? I know that one can reduce the instability level of results …
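One way to make that variance concrete (a sketch on synthetic data) is to refit with different seeds and inspect each feature's mean importance and spread:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=300, n_features=8, random_state=0)

    runs = np.array([
        RandomForestClassifier(n_estimators=200, random_state=seed)
        .fit(X, y).feature_importances_
        for seed in range(10)
    ])
    print("mean:", runs.mean(axis=0))
    print("std: ", runs.std(axis=0))  # a large std means an unstable ranking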
Category: Data Science

How to get feature significance for tabular data in machine learning?

I'm using fastai to train a network on tabular data (https://docs.fast.ai/tutorial.tabular.html). I have a table with 5 columns; each of these is a specific attribute that describes a galaxy and helps to classify it into one of two types: elliptical or spiral. My question is: is it possible to get a measure of which of these attributes is most helpful/least helpful for the training? I mean some kind of ranking.
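One common, model-agnostic option for such a ranking is permutation importance. A sketch with a scikit-learn stand-in model and hypothetical galaxy columns (not the asker's actual data or learner):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    cols = ["ellipticity", "brightness", "radius", "color", "concentration"]
    X = pd.DataFrame(rng.rand(500, 5), columns=cols)
    y = (X["ellipticity"] + 0.3 * X["color"] > 0.8).astype(int)  # toy labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

    result = permutation_importance(model, X_te, y_te, n_repeats=20,
                                    random_state=0)
    for name, score in sorted(zip(cols, result.importances_mean),
                              key=lambda t: -t[1]):
        print(f"{name}: {score:.3f}")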
Category: Data Science

XGBoost model has features whose feature importance equals zero

I ran into this problem: an XGBoost model (.pickle file, constructed under v0.7.post3) with 100 features in it; but I found that 55 features in the model (model.feature_importances_) show 0 feature importance (like the matrix below). Additionally, when I transformed the pickle file to PMML (to launch online), only 45 features were in the PMML file (those with importance > 0, apparently). So, my question is: why do features with importance equal to 0 end up in an XGBoost model? And why do they remain in the …
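Zero gain simply means the booster never split on that feature, yet the feature stays in the model's declared feature list. A small sketch reproducing this on synthetic data:

    import numpy as np
    from xgboost import XGBClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(200, 6)
    X[:, 5] = 0.0                    # constant column: can never be split on
    y = (X[:, 0] > 0.5).astype(int)

    model = XGBClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(model.feature_importances_)  # the last entry is 0.0
    # get_score() only reports features actually used in splits:
    print(model.get_booster().get_score(importance_type="gain"))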
Category: Data Science

Searching for a machine learning algorithm for a regression problem with many features

I have a machine learning problem with about 160 features and 400 cases, and I want to find the best predictors for a continuous outcome. The dataset contains variables of psychotherapists and clients, and I want to predict therapy outcome. I used lasso regression in nested 20-fold cross-validation and could identify about 20 top predictors (model fit of about 0.97 NRMSE). (I decided not to create a separate holdout dataset because I have too few cases.) However, I thought I could improve …
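For reference, a minimal nested-CV sketch with LassoCV; the 400 x 160 shape mirrors the description, but the data here is synthetic:

    import numpy as np
    from sklearn.linear_model import LassoCV
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.RandomState(0)
    X = rng.randn(400, 160)
    y = X[:, :20] @ rng.randn(20) + rng.randn(400)  # 20 true predictors

    inner = KFold(n_splits=5, shuffle=True, random_state=0)   # picks alpha
    outer = KFold(n_splits=20, shuffle=True, random_state=0)  # scores the pipeline
    model = make_pipeline(StandardScaler(), LassoCV(cv=inner, max_iter=10000))

    scores = cross_val_score(model, X, y, cv=outer, scoring="r2")
    print(scores.mean(), scores.std())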
Category: Data Science

How does missing an important feature affect the feature importance of the remaining features in the model?

I am creating a linear regression model for energy usage in a food processing plant. Unfortunately, I don't have historical data for one of the critical features (I know it is important from experience). If I go ahead with the modelling excluding this feature, what will be the impact on my model's performance and, especially, on the feature importances? Can I trust the feature importances in the absence of this feature, or would the model over-attribute the importance …
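This is classic omitted-variable bias: any observed feature correlated with the missing driver will soak up part of its importance. A toy simulation (made-up variables) showing the effect:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    hidden = rng.randn(1000)                         # the critical, unrecorded feature
    observed = 0.8 * hidden + 0.6 * rng.randn(1000)  # a correlated proxy
    other = rng.randn(1000)
    y = 3.0 * hidden + 1.0 * other + rng.randn(1000)

    full = LinearRegression().fit(np.column_stack([hidden, observed, other]), y)
    reduced = LinearRegression().fit(np.column_stack([observed, other]), y)
    print("with hidden feature:   ", full.coef_)     # roughly [3, 0, 1]
    print("without hidden feature:", reduced.coef_)  # 'observed' is inflated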
Category: Data Science

Differences between Feature Importance and SHAP variable importance graph

I have run an XGBClassifier using the following fields:
- predictive features = ['sbp','tobacco','ldl','adiposity','obesity','alcohol','age']
- binary target = 'Target'
I have produced a feature importance plot. I understand that, generally speaking, importance provides a score that indicates how useful or valuable each feature was in the construction of the boosted decision trees within the model: the more an attribute is used to make key decisions within the decision trees, the higher its relative importance. From the list of 7 predictive …
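A hedged sketch of putting the two rankings side by side; it assumes a fitted XGBClassifier model, a DataFrame X with the columns above, and the shap package:

    import numpy as np
    import shap

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    mean_abs_shap = np.abs(shap_values).mean(axis=0)
    for col, gain, s in zip(X.columns, model.feature_importances_, mean_abs_shap):
        print(f"{col}: gain={gain:.3f}  mean|SHAP|={s:.3f}")
    # The two rankings can legitimately disagree: gain reflects training-time
    # split quality, while SHAP reflects contributions to individual predictions.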
Category: Data Science

Aggregation of feature importance

I have more of a conceptual question I was hoping to get some feedback on. I am trying to run a boosted regression ML model to identify a subset of important predictors for a clinical condition. The dataset includes over 100,000 rows and close to 1,000 predictors. Now, the etiology of the disease we are trying to predict is largely unknown; thus, we likely don't have data on many important predictors for the condition. That is to say, as a …
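One way to make such a ranking more trustworthy is to aggregate importances over bootstrap refits and keep the features that score highly consistently; a sketch (the boosted model here is a scikit-learn stand-in, and X, y are assumed to be numpy arrays):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def bootstrap_importances(X, y, n_boot=25, seed=0):
        # Mean and spread of feature importances across bootstrap refits.
        rng = np.random.RandomState(seed)
        out = []
        for _ in range(n_boot):
            idx = rng.randint(0, len(X), len(X))  # resample rows with replacement
            model = GradientBoostingRegressor(random_state=0).fit(X[idx], y[idx])
            out.append(model.feature_importances_)
        return np.mean(out, axis=0), np.std(out, axis=0)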
Category: Data Science

Understanding which variables impact your variable of interest the most (correlation, linear regression) and correctly interpreting results

How do you ascertain which variables lead to the greatest increase in another variable of interest? Let's say you have a correlation matrix. You look at the row for the variable you are particularly curious about, retention, and see that income is the most correlated with it out of all the variables in the matrix. I would then expect, when I look at the highest-income cities in my dataset, to see them having the highest retention, but am not finding …
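A short sketch of that workflow (a hypothetical DataFrame df with income and retention columns); note that correlation describes the overall linear trend, so the very highest-income cities can still sit off-trend:

    import pandas as pd

    # Rank variables by (absolute) correlation with retention:
    corr = df.corr()["retention"].drop("retention")
    print(corr.abs().sort_values(ascending=False))

    # Then sanity-check the extremes directly:
    print(df.sort_values("income", ascending=False)[["income", "retention"]].head(10))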
Category: Data Science

Why does an unimportant feature have a big impact on R2 in XGBoost?

I am training an XGBoost model, xgbr, using xgb.XGBRegressor() with 13 features and one numeric target. The R2 on the test set is 0.935, which is good. I am checking the feature importances with:

    for col, score in zip(X_train.columns, xgbr.feature_importances_):
        print(col, score)

When I check the importance type via xgbr.importance_type, the result is gain. I have a feature, x1, whose importance seems to be 0.0068, not so high. x1 is a categorical feature with a cardinality of 5122, and I apply LabelEncoder before …
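One thing worth checking (a sketch that assumes the fitted xgbr from the question, trained on a DataFrame so the booster knows the column names): the different importance types can tell very different stories for a high-cardinality, label-encoded feature, which may be split on constantly yet gain little per split:

    booster = xgbr.get_booster()
    for imp_type in ("gain", "weight", "cover"):
        print(imp_type, booster.get_score(importance_type=imp_type).get("x1"))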
Category: Data Science

Feature importance by removing all other features?

For neural network feature importance, can I zero out all features except one in order to gauge that feature's importance? I know shuffling a feature is one approach. For example, leaving in the 4th feature:

    feature_4 = [
        [0., 0., 0., 1.15, 0.],
        [0., 0., 0., 1.76, 0.],
        [0., 0., 0., 2.31, 0.],
        [0., 0., 0., 0.94, 0.],
    ]
    _, probabilities = model.predict(feature_4)

The non-linear output of activation functions worries me, because the activation of the whole is not equal to the sum of the individual activations:

    >>> from scipy.special import expit  # aka sigmoid
    >>> expit(2.0)
    0.8807970779778823
    >>> expit(1.0) + expit(1.0)
    1.4621171572600098 …
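That worry is well founded; shuffling a single column while leaving the others at realistic values avoids the all-zeros baseline. A hedged sketch (the model, the score helper, and the array shapes are assumptions):

    import numpy as np

    def permutation_drop(model, X, y, col, score, n_repeats=10, seed=0):
        # score(model, X, y) is a hypothetical accuracy-like helper
        rng = np.random.RandomState(seed)
        base = score(model, X, y)
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, col])  # break only this feature's signal
            drops.append(base - score(model, Xp, y))
        return float(np.mean(drops))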
Category: Data Science

Feature Importance interpretation

I want to audit the results of regressions I ran and, hopefully, gain more insight into a treatment effect through sklearn's feature importance function (permutation_importance) or eli5's PermutationImportance. I know that these are generally used to narrow down the number of predictors in a model in an attempt to increase its accuracy (feature selection). My specific problem is that I do not want to use feature importance for feature selection, but for direct interpretation of the importance of the variables in …
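A minimal sketch of using permutation_importance purely for interpretation; it assumes a fitted estimator est and a held-out DataFrame X_val with target y_val:

    from sklearn.inspection import permutation_importance

    r = permutation_importance(est, X_val, y_val, n_repeats=30, random_state=0)
    for i in r.importances_mean.argsort()[::-1]:
        print(f"{X_val.columns[i]}: {r.importances_mean[i]:.3f} "
              f"+/- {r.importances_std[i]:.3f}")
    # Caveat: this measures predictive reliance, not a causal treatment effect.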
Category: Data Science

Revealing the causal structure in time-dependent data

We have a data table that accumulates the control and monitoring parameters of the High-Temperature Superconductor (HTS) production process, such that the rows represent the observations and the columns represent the parameters mentioned above. Due to the nature of the production process, there are time dependencies between the rows of our data sets; thus the columns are, indeed, time series. (Which boils our data down to time-dependent data.) Now the question arises: can we apply induced-causation methods, explained in …
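One first-pass screen for time-dependent columns (not a substitute for the induced-causation methods asked about) is pairwise Granger causality; a sketch assuming a DataFrame df with hypothetical parameter columns:

    from statsmodels.tsa.stattools import grangercausalitytests

    # Tests whether the 2nd column helps predict the 1st at lags 1..4.
    res = grangercausalitytests(df[["temperature", "current"]].dropna(),
                                maxlag=4, verbose=False)
    for lag, (tests, _) in res.items():
        print(lag, tests["ssr_ftest"][1])  # p-value at each lag
    # Granger causality captures predictive precedence, not true causation.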
Category: Data Science

Assess feature importance in Keras for one-hot-encoded categorical features

An important aspect of tuning a model is assessing feature importance. In Keras, how does one assess the importance of a categorical feature that is one-hot encoded? E.g. if a categorical feature is ice_cream_colour with a cardinality of 12, then I can assess the individual importances of ice_cream_colour_blue, ice_cream_colour_red, etc., but how do I do it for the entire ice_cream_colour feature? A naïve approach would be to sum all the individual importances, but this assumes that the relationship between the distinct feature importances is …
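An alternative to summing is to permute the whole group of one-hot columns together, so the categorical feature is scored as a single unit. A hedged sketch (assumes a Keras model compiled with an accuracy metric, numpy arrays X, y, and hypothetical column indices):

    import numpy as np

    def grouped_permutation_drop(model, X, y, group_cols, n_repeats=10, seed=0):
        rng = np.random.RandomState(seed)
        base = model.evaluate(X, y, verbose=0)[1]  # [loss, accuracy]
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            perm = rng.permutation(len(Xp))
            # Move whole rows of the group so each one-hot vector stays valid:
            Xp[:, group_cols] = Xp[perm][:, group_cols]
            drops.append(base - model.evaluate(Xp, y, verbose=0)[1])
        return float(np.mean(drops))

    # e.g. if the 12 ice_cream_colour_* columns occupy indices 3..14:
    # grouped_permutation_drop(model, X_val, y_val, list(range(3, 15)))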
Category: Data Science

What kind of model to use to find drivers when data is aggregated and not at the user level?

I have a website and have info from Google Analytics, so I can see the following "features":
- Page URL
- Country
- Device category (phone, desktop, etc.)
- Number of sessions
- Number of users: users who have initiated at least one session during the date range
- Avg. time on page
- Page views
- Bounce rate -- a probability calculated as single-page sessions divided by all sessions, or the percentage of all sessions on your site in which users viewed only a single page (e.g. …
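One hedged option for aggregated rows like these: treat each row as a weighted observation and regress the outcome of interest (e.g. bounce rate) on the candidate drivers, weighting by session count. The DataFrame and column names below are assumptions:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    X = pd.get_dummies(df[["country", "device_category"]])
    X["avg_time_on_page"] = df["avg_time_on_page"]

    model = LinearRegression().fit(X, df["bounce_rate"],
                                   sample_weight=df["sessions"])
    for name, coef in sorted(zip(X.columns, model.coef_),
                             key=lambda t: -abs(t[1])):
        print(name, round(coef, 4))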
Category: Data Science
