xgboost

Prediction issue with xgboost custom loss

phil

June 4, 2022 21:00

I have an issue with xgboost custom objectives: I cannot get consistent forecasts. In other words, the scale of my forecasts is not in line with the values I would like to predict. I tried many custom losses, but I always get the same issue. import numpy as np import pandas as pd import xgboost as xgb from sklearn.datasets import make_regression n_samples_train = 500 n_samples_test = 100 n_features = 200 X, y = make_regression(n_samples_train, n_features, noise=10) X_test, y_test …

Topic: prediction xgboost machine-learning

Category: Data Science

How do you do 1-vs-rest classifiers in XGBoost Library (Not Sklearn)?

Sebastian

June 4, 2022 18:02

I am working with a very large dataset that would benefit from using training continuation with the xgb_model parameter in xgb.train(). The label (Y) of the dataset has 4 classes and is highly imbalanced, so I would like to generate per-label PR curves to evaluate performance, and would thus need to treat each class as its own binary problem using a one-vs-rest classifier. After a lot of reading I haven't found an equivalent to sklearn's OneVsRestClassifier in …

Topic: xgboost multiclass-classification bigdata machine-learning

Category: Data Science
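
For the one-vs-rest question above, one building block is turning the 4-class label vector into one binary label vector per class; each vector could then serve as the label of a separate binary model. A minimal sketch in plain Python, assuming integer class labels 0 to 3. The helper name `one_vs_rest_labels` and the sample labels are hypothetical, and no XGBoost calls are shown:

```python
def one_vs_rest_labels(y, classes):
    """Return {class: binary label list}, where 1 marks samples of that class."""
    return {c: [1 if label == c else 0 for label in y] for c in classes}


# Hypothetical imbalanced 4-class labels, for illustration only.
y = [0, 0, 0, 1, 0, 2, 0, 3, 0, 1]
binary_targets = one_vs_rest_labels(y, classes=[0, 1, 2, 3])
```

Each entry of `binary_targets` has the same length as `y`, so per-class precision-recall curves can be computed against the matching binary vector.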

Is this XGBoost model tending to overfit?

Suvrodip Mukhopadhyay

June 4, 2022 17:38

Here is the list of hyperparameters that I used: params = { 'scale_pos_weight': [1.0], 'eta': [0.05, 0.1, 0.15, 0.9, 1.0], 'max_depth': [1, 2, 6, 10, 15, 20], 'gamma': [0.0, 0.4, 0.5, 0.7] } The dataset is imbalanced, so I used the scale_pos_weight parameter. After 5-fold cross-validation, the F1 score that I got is: 0.530726530426833

Topic: hyperparameter-tuning overfitting xgboost hyperparameter dataset

Category: Data Science
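
To make the size of the search in the question above concrete, the grid can be enumerated directly. This is only a counting sketch over the `params` dictionary quoted in the excerpt; the variable names `combinations` and `n_fits` are illustrative, and it assumes an exhaustive grid search with the 5 folds mentioned:

```python
from itertools import product

# Grid copied from the question excerpt.
params = {
    'scale_pos_weight': [1.0],
    'eta': [0.05, 0.1, 0.15, 0.9, 1.0],
    'max_depth': [1, 2, 6, 10, 15, 20],
    'gamma': [0.0, 0.4, 0.5, 0.7],
}

# Every combination of one value per hyperparameter.
combinations = [dict(zip(params, values)) for values in product(*params.values())]

# Assuming an exhaustive search with 5-fold cross-validation,
# each combination is fitted once per fold.
n_fits = len(combinations) * 5
```

The grid holds 1 x 5 x 6 x 4 = 120 combinations, so an exhaustive 5-fold search would fit 600 models.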

What should be the target variable in a CTR maximization problem?

p0712

June 4, 2022 16:11

I have a dataset that contains some user-specific details like gender, age range, region, etc., and also behavioural data containing the historical click-through rate (last 3 months) for the different ad types shown to them. A sample of the data is shown below. It has 3 ad types, i.e. ecommerce, auto, healthcare, but the actual data contains more ad types. I need to build a regression model using XGBRegressor that can tell which ad should be shown to a given new user in order to …

Topic: xgboost regression predictive-modeling

Category: Data Science

What is the strange stuff at the top left of plot_convergence?

Mathieu Krisztian

June 3, 2022 14:16

I run gp_minimize with 20 calls: the correct values of the function are the blue markers at the bottom. What is the strange distribution of markers at the top left? optimize_result=gp_minimize(func_objective,space_xgboost,n_calls=n_calls,n_initial_points=n_initial_points,x0=x0_init,random_state=1) ax=plot_convergence(optimize_result,true_minimum=optimize_result.fun) ax.figure.tight_layout() ax.figure.savefig(f"results/convergence_hpo_skopt.png")

Topic: xgboost

Category: Data Science

GridSearch multiplying the number of trees in XGboost?

Cosapocha

June 2, 2022 16:43

I'm having an issue: after running an XGboost in a HalvingGridSearchCV, I receive a certain number of estimators (50 for example), but the number of trees is constantly being multiplied by 3. I don't understand why. Here is the code: model = XGBClassifier(objective='multi:softprob', subsample = 0.9, colsample_bytree=0.5, num_class= 3) md = [3, 6, 10, 15] lr = [0.1, 0.5, 1] g = [0, 0.25, 1] rl = [0, 1, 10] spw = [1, 3, 5] ns = [5, 10, 20] …

Topic: gradient-boosting-decision-trees xgboost decision-trees classification machine-learning

Category: Data Science
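
A likely explanation for the factor of 3 in the question above, offered as an assumption rather than a confirmed diagnosis of that code: with a multi-class objective such as multi:softprob, XGBoost builds one tree per class in every boosting round, so the total tree count is the number of rounds times num_class. The helper `total_trees` below is a hypothetical name used only to show the arithmetic:

```python
def total_trees(n_estimators, num_class):
    """Trees in a multi-class boosted model: one tree per class per boosting round."""
    return n_estimators * num_class


# Values from the question: 50 estimators and num_class=3.
trees = total_trees(n_estimators=50, num_class=3)
```

With 50 estimators and 3 classes this gives 150 trees, which matches the "multiplied by 3" observation.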

How to make XGBOOST capture trend in time series forecasting?

zoete aardappel

June 2, 2022 15:21

I am trying to forecast some sales data with monthly values. I have been trying some classical models as well as ML models like XGBOOST. My data with a feature set looks like this, with a length of 110 months, and I am trying to forecast the next 12 months. When it comes to XGBOOST, I've been spending time mostly on hyperparameter optimization with Gridsearch and also state-of-the-art packages like optuna. My current best set of parameters looks like this, parameters …

Topic: forecasting xgboost optimization time-series machine-learning

Category: Data Science

Improving prediction accuracy with XGBoost

ZJAY

June 2, 2022 05:00

I have a 32x20 matrix for which I am trying to use XGBoost (Regression). I am looping through rows to produce an out of sample forecast. I'm surprised that XGBoost only returns an out of sample error (MAPE) of 3-4%. When I run the data through other algorithms (glmboost, boosted linear model), I get MAPEs around 1.8-2.5%. I'm surprised XGBoost is so deficient. I suspect I am under-optimizing hyperparameters. I include a gridsearch, which I ran below, but the error …

Topic: xgboost scikit-learn

Category: Data Science
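
The error measure compared in the question above, MAPE (mean absolute percentage error), is the average of |actual - forecast| / |actual| expressed as a percentage. A minimal plain-Python sketch; the function name `mape` and the sample numbers are illustrative and are not the asker's data:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no true value is zero."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical actuals and forecasts, for illustration only.
actual = [100.0, 200.0]
forecast = [98.0, 204.0]
error_pct = mape(actual, forecast)
```

Because each term is divided by the actual value, MAPE is undefined when an actual value is zero and weights errors on small actuals more heavily.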

Why is XGBClassifier in Python outputting different feature importance values with the same data across different repetitions?

user15733888

May 31, 2022 09:01

I am fitting an XGBClassifier to a small dataset (32 subjects) and find that if I loop through the code 10 times, the feature importances (gain) assigned to the features in the model vary slightly. I am using the same hyperparameter values in each iteration, and have subsample and colsample set to the default of 1 to prevent any random variation between executions. I am using the scikit-learn feature_importances_ attribute to extract the values from the fitted model. Any …

Topic: feature-importances xgboost feature-selection python machine-learning

Category: Data Science

XGBClassifier's predictions are not probabilities with objective='binary:logistic'

João Bravo

May 30, 2022 18:55

I am using XGBoost's XGBClassifier with a binary 0-1 target, and I am trying to define a custom metric function. According to the XGBoost Tutorials, it receives an array of predictions and a DMatrix with the training set. I have used objective='binary:logistic' in order to get probabilities, but the prediction values passed to the custom metric function are not between 0 and 1. They can range from about -3 to 5, and the range of values seems to grow …

Topic: metric probability xgboost scikit-learn classification

Category: Data Science
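
One possible reading of the question above, stated as an assumption and not verified against the asker's XGBoost version: the values handed to the custom metric are raw margin scores (log-odds) rather than probabilities. If so, applying the logistic sigmoid inside the metric maps them into the interval (0, 1). A minimal plain-Python sketch; the helper name `sigmoid` and the sample scores are illustrative:

```python
import math


def sigmoid(margin):
    """Map a raw log-odds score to a probability in (0, 1), avoiding overflow."""
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


# Scores in the range reported in the question (about -3 to 5).
raw_scores = [-3.0, 0.0, 5.0]
probabilities = [sigmoid(s) for s in raw_scores]
```

A score of 0 corresponds to a probability of 0.5, and larger absolute scores move the probability toward 0 or 1, which would explain a range that widens as training proceeds.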

Using Transaction Amount to Guide Learning in a Fraud Detection Machine Learning Model

Charles

May 29, 2022 21:03

I am currently using transaction amount as a feature in an XGBoost classification model designed to identify fraudulent transactions. Furthermore, transaction amount is bounded for this problem between 0 and 500. Using transaction amount as a feature does improve target class separability. However, I can't help but wonder if there is a better way to use this variable. To explain, I care more about getting the high transaction amount values correct than I do the low ones. However, the model …

Topic: loss-function xgboost hyperparameter

Category: Data Science

Multiple XGBoost models or just 1 for a certain type of category?

user113156

May 28, 2022 21:03

I am building a model to predict, say house prices. Within my data I have sales and rentals. The Y variable is the price of either the sales or rentals. I also have a number of X variables to predict Y, such as number of bedrooms, bathrooms, meters squared etc. I believe that the model will firstly make a split on the variable "sales" vs "rentals" as this would reduce the loss function - RMSE - the most. Do you …

Topic: xgboost predictive-modeling machine-learning

Category: Data Science

Ignoring features in XGBoost by setting them as "missing"

Alexandru Dinu

May 27, 2022 09:22

I have some data n x m and I want to ignore certain features. One idea I had is to mark those features as "missing", since XGBoost can handle missing values by default, e.g. using nan when constructing the DMatrix: n, m = 100, 10 X = np.random.uniform(size=(n, m)) y = (np.sum(X, axis=1) >= 0.5 * m).astype(int) # ignore certain features: mark them as missing X[:, 2:7] = np.nan dtrain = xgb.DMatrix(X, label=y, missing=np.nan) model = xgb.train(params={'objective': 'binary:logistic'}, dtrain=dtrain) My …

Topic: feature-engineering xgboost

Category: Data Science

Interpretation of SHAP summary plot in a multi class context

hideonbush

May 26, 2022 14:54

I'm performing multi-class classification and using SHAP values to interpret the features. I have 3 classes. I have tested XGBoost and Multinomial Logistic Regression. When I'm using XGBoost, I am able to get a summary plot where I can see each feature's effect on all three classes. I'm also able to get a separate plot for each class to see how small/large feature values affect the prediction towards the individual class. It seems like this is only possible to …

Topic: shap xgboost python

Category: Data Science

Is multicollinearity a problem when interpreting SHAP values from an XGBoost model?

hideonbush

May 25, 2022 11:56

I'm using an XGBoost model for multi-class classification and am looking at feature importance by using SHAP values. I'm curious whether multicollinearity is a problem for the interpretation of the SHAP values. As far as I know, XGB is not affected by multicollinearity, so I assume SHAP won't be affected either?

Topic: shap explainable-ai xgboost machine-learning

Category: Data Science

Distribution of predicted probability is heavily skewed

Crist2002

May 24, 2022 21:11

I'm a beginner here. I'm trying to use xgboost for a classification problem. My data is imbalanced 70-30. But I ran into a problem: the distribution of predicted probabilities is heavily skewed, as shown in the picture below. I need advice on how to solve this.

Topic: xgboost

Category: Data Science

Imbalanced classification

yassine sfayhi

May 21, 2022 07:14

I've tried all kinds of oversampling and undersampling techniques, and I've also tried weighted XGBoost (the model I'm trying to improve). I couldn't get past a very bad F1 score: 0.09. What should I do?

Topic: imbalanced-learn smote xgboost random-forest machine-learning

Category: Data Science

Make a fitted xgboost or linear regression model predict values in the future

Djakarta_zero

May 20, 2022 16:25

I have a machine learning model that I fitted with xgboost and linear regression. My dataset has thirteen features and has price as the target. Is there any way to make the model predict values in the future? I have date time as one of the variables. From searching on the internet, I learned about fb prophet, and that this is a time series problem. But if my xgboost is doing well, is there a way to make it predict future …

Topic: forecasting xgboost linear-regression time-series machine-learning

Category: Data Science

Correct theoretical regularized objective function for XGB/LGBM (regression task)

Manu675

May 20, 2022 04:03

I am writing an academic paper on the application of Machine Learning methods to Time Series Forecasting and I am unsure about how to write down the theoretical part about the regularized objective function for XGBoosting. Below you can find the equation given by the developers of the XGB algorithm for the regularized objective function (equation 2). The paper is called "XGBoost: A Scalable Tree Boosting System" by Chen & Guestrin (2016). In the Python API from the xgb library …

Topic: lightgbm regularization xgboost

Category: Data Science
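
For reference, the regularized objective that the question above refers to (equation 2 in Chen & Guestrin, 2016) can be written as follows, where l is a differentiable convex loss, f_k is the k-th tree, T is the number of leaves in a tree, and w is its vector of leaf weights. This restates the paper's form only; how the Python API's parameters map onto it is not covered here:

```latex
\mathcal{L}(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}
```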

XGBOOST when the target column and the features both have categorical data

Utkarsh Goyal

May 19, 2022 08:01

I have a huge dataset with categorical columns in the features, and my target variable is also categorical. None of the values are ordinal, so I think it is best to use one-hot encoding. But I have one issue: my target variable has 90 classes, so if I do one-hot encoding there will be 90 target columns and it will become too complex. But as none of the values are ordinal, can I …

Topic: one-hot-encoding xgboost categorical-data

Category: Data Science
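
On the target side of the question above, XGBoost's multi-class objectives expect a single column of integer class labels from 0 to num_class - 1, so the 90 classes do not need 90 one-hot target columns. A minimal plain-Python sketch of such an encoding; the helper name `encode_labels` and the sample class names are hypothetical:

```python
def encode_labels(labels):
    """Map each distinct class name to an integer code 0..K-1 (sorted order)."""
    classes = sorted(set(labels))
    to_code = {name: code for code, name in enumerate(classes)}
    return [to_code[name] for name in labels], classes


# Hypothetical class names, for illustration only.
raw_target = ["pear", "apple", "kiwi", "apple", "pear"]
codes, classes = encode_labels(raw_target)

# Decoding a predicted code back to its class name.
decoded = [classes[c] for c in codes]
```

The integer codes are only identifiers for the classifier and do not impose an ordering on the classes in the way an ordinal feature encoding would.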
