Why does a LightGBM model produce different results while testing?

Using the LightGBM regressor, I trained my data and used grid search to find the best parameters, but when testing with those best parameters I get different results each time, i.e. the model produces different results for each test iteration. I ran LightGBM twice with the same parameters but got different results in validation. The only random-seed parameter I found was baggingSeed, and even after fixing baggingSeed the problem still occurred. Should I fix any …
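Beyond baggingSeed, LightGBM exposes several seed parameters, and run-to-run differences can also come from multithreading. A minimal sketch of pinning everything down (assuming lgb_train is an existing lgb.Dataset; the deterministic flag requires a reasonably recent LightGBM):

```python
import lightgbm as lgb

params = {
    "objective": "regression",
    "seed": 42,                    # master seed; used to derive the others
    "bagging_seed": 42,
    "feature_fraction_seed": 42,
    "data_random_seed": 42,
    "deterministic": True,         # trade some speed for reproducibility
    "force_row_wise": True,        # avoid the row-wise/col-wise auto-choice
}
booster = lgb.train(params, lgb_train, num_boost_round=100)
```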
Category: Data Science

Does LightGBM handle multicollinearity?

I have a dataset of around 6500 features and 10,000 data rows after feature selection, and I am using a LightGBM model. I want to know if I should check the feature set for multicollinearity. If two or more features are correlated, how does that affect tree building and classification prediction? How does LightGBM deal with multicollinearity? Does it have any adverse effects?
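Tree ensembles are generally robust to multicollinearity because each split tests one feature at a time; the main side effect is that importance gets shared across correlated columns. If you want to prune correlated features beforehand anyway, here is a minimal sketch, assuming X is a pandas DataFrame of the 6500 features and 0.95 is a hypothetical threshold:

```python
import numpy as np

# pairwise absolute correlations; keep only the upper triangle so each
# pair is inspected once
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# drop one feature from every pair correlated above the threshold
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_reduced = X.drop(columns=to_drop)
```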
Category: Data Science

Optimizing MAE degrades the MAE metric

I have run a LightGBM regression model, optimizing on RMSE and measuring the performance on RMSE:

    model = LGBMRegressor(objective="regression", n_estimators=500, n_jobs=8)
    model.fit(X_train, y_train,
              eval_metric="rmse",
              eval_set=[(X_train, y_train), (X_test, y_test)],
              early_stopping_rounds=20)

The model keeps improving during the 500 iterations. Here are the performances I obtain on MAE: MAE on train: 1.080571; MAE on test: 1.258383. But the metric I'm really interested in is MAE, so I decided to optimize it directly (and choose it as the evaluation metric): model …
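For reference, a minimal sketch of the direct-MAE variant (my assumption about the intended setup, not the asker's elided code): objective="regression_l1" trains on absolute error, whose gradients are just signs, so slower apparent convergence is common.

```python
import lightgbm as lgb
from lightgbm import LGBMRegressor

# same data splits as above; "regression_l1" optimizes MAE directly
model = LGBMRegressor(objective="regression_l1", n_estimators=500, n_jobs=8)
model.fit(X_train, y_train,
          eval_metric="mae",                    # report MAE on the eval sets
          eval_set=[(X_train, y_train), (X_test, y_test)],
          callbacks=[lgb.early_stopping(20)])   # newer API for early stopping
```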
Category: Data Science

Splitting point in LightGBM?

I am not able to understand how the first root-node split is selected in LightGBM and how the subsequent splitting at nodes happens. I have read blogs and related documents, and I understand that histogram-based splitting happens here. But it is not clear, after the bins are made, on what basis the decision to split is taken. How is the best split decided? Please elaborate on this.
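For intuition, the standard GBDT split score (the same form as in the XGBoost paper; LightGBM evaluates it over histogram bins) can be sketched in a few lines. Here grad_hist and hess_hist are hypothetical arrays holding the summed gradients and hessians of the samples in each bin of one feature:

```python
import numpy as np

def best_split(grad_hist, hess_hist, lam=1.0):
    """Scan bin boundaries and return the one with the largest gain."""
    G, H = grad_hist.sum(), hess_hist.sum()
    best_gain, best_bin = -np.inf, None
    gl = hl = 0.0
    for b in range(len(grad_hist) - 1):        # candidate threshold after bin b
        gl += grad_hist[b]
        hl += hess_hist[b]
        gr, hr = G - gl, H - hl
        # improvement of splitting vs. keeping the parent node as a leaf
        gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain
```

The root split is simply the feature/bin pair with the highest such gain over all features; the procedure then repeats on the children, in LightGBM's case choosing leaf-wise which leaf to split next.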
Category: Data Science

Correct theoretical regularized objective function for XGB/LGBM (regression task)

I am writing an academic paper on the application of machine learning methods to time series forecasting, and I am unsure about how to write down the theoretical part about the regularized objective function for XGBoost. Below you can find the equation given by the developers of the XGBoost algorithm for the regularized objective function (equation 2). The paper is called "XGBoost: A Scalable Tree Boosting System" by Chen & Guestrin (2016). In the Python API from the xgb library …
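For reference, equation (2) of Chen & Guestrin (2016) is

```latex
\mathcal{L}(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}
```

where l is a differentiable convex loss, T is the number of leaves of tree f, and w are its leaf weights.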
Category: Data Science

Model performance on an external validation set is really low?

I am using the LGBM model for binary classification. My train and test accuracies are 87% and 82% respectively, with a cross-validation accuracy of 89% and a ROC-AUC score of 81%. But when evaluating model performance on an external validation set that the model has never seen before, it gives a ROC-AUC of 41%. Can somebody suggest what should be done?
Category: Data Science

Is my model overfitting? Training accuracy 93%, test accuracy 82%

I am using an LGBM model for binary classification. After hyper-parameter tuning I get a training accuracy of 0.9340 and a test accuracy of 0.8213. Can I say my model is overfitting, or is this acceptable in the industry? To add to this, when I increase num_leaves for the same model I am able to achieve a train accuracy of 0.8675 and a test accuracy of 0.8137. Which of these results is acceptable and can be reported?
Category: Data Science

LGBM model predicts only a single class on unseen data

I have built a LightGBM-based machine learning model on data of molecules of two classes. The distribution is as follows: class 0 has 5933 data points and class 1 has 4696. The train and test accuracies I get on this data are around 87% and 82% respectively, and the roc_auc_score is around 81.5%. But when I try to evaluate model performance on an entirely new dataset, which the model has never seen before, with class labels 0 and 1 both having 94 …
Category: Data Science

Model Dump Parser (like XGBFI) for LightGBM and CatBoost

Currently my employer has multiple GLMs in a live environment. I am interested in identifying new features and interactions to enhance the accuracy of these GLMs; for now I am limited to the GLM structure, so simply deploying a solution which automatically accounts for interactions is not possible. I have in the past used XGBoost to identify powerful feature interactions through the use of XGBFI / XGBFIR. I am now looking into using LightGBM and CatBoost to do the …
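As far as I know there is no bundled XGBFI equivalent, but LightGBM's Booster.dump_model() returns the ensemble as nested dicts, so a rough interaction counter (a sketch of the idea, not a full XGBFIR port) can be built by walking each tree and counting features that co-occur on a root-to-node path:

```python
from collections import Counter

def interaction_counts(booster):
    """Count feature pairs that appear together on a decision path."""
    pairs = Counter()

    def walk(node, path):
        if "split_feature" not in node:        # leaf: nothing to record
            return
        f = node["split_feature"]
        for g in path:                         # pair f with every ancestor
            if g != f:
                pairs[tuple(sorted((g, f)))] += 1
        walk(node["left_child"], path + [f])
        walk(node["right_child"], path + [f])

    for tree in booster.dump_model()["tree_info"]:
        walk(tree["tree_structure"], [])
    return pairs
```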
Category: Data Science

LightGBM predict_proba in thousandths place

Can someone explain why my LightGBM classification model's predict_proba() output is in the thousandths place for the positive class:

    prob_test = model.predict_proba(X_test)
    print(prob_test[:, 1])
    array([0.00219813, 0.00170795, 0.00125507, ..., 0.00248431, 0.00150855, 0.00185903])

Is this common, and how is it calculated? Should there be concern about performance testing (AUC)? FYI: the data is highly imbalanced, with a positive ratio of 0.0017 in the training set.
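With a positive rate of about 0.0017, a reasonably calibrated model's probabilities should hover near that prior, so thousandths-place values are expected rather than alarming; and since AUC depends only on the ranking of the scores, the small absolute scale does not affect it. A quick sanity check (assuming y_train holds the binary training labels):

```python
# the mean predicted probability should sit close to the training prior
print("train prior:          ", y_train.mean())
print("mean predicted p(y=1):", prob_test[:, 1].mean())
```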
Category: Data Science

Negative R2_score: bad predictions for my sales prediction problem using LightGBM

My project involves trying to predict the sales quantity for a specific item across a whole year. I've used the LightGBM package for making the predictions. The params I've set for it are as follows:

    params = {
        'nthread': 10,
        'max_depth': 5,             # DONE
        'task': 'train',
        'boosting_type': 'gbdt',
        'objective': 'regression_l1',
        'metric': 'mape',           # this is abs(a-e)/max(1,a)
        'num_leaves': 2,            # DONE
        'learning_rate': 0.2180,    # DONE
        'feature_fraction': 0.9,    # DONE
        'bagging_fraction': 0.990,  # DONE
        'bagging_freq': 1,          # DONE
        'lambda_l1': 3.097758978478437,   # DONE
        'lambda_l2': 2.9482537987198496,  # DONE
        'verbose': 1,
        'min_child_weight': 6.996211413900573, …
Category: Data Science

Example for Boosting

Can someone tell me exactly how boosting, as implemented by LightGBM or XGBoost, works in a real-case scenario? I know it splits the tree leaf-wise instead of level-wise: each split is chosen for its contribution to the global loss, not just the loss of one branch, which helps it reach a lower error rate faster than level-wise growth. But I cannot understand it completely until I see a real example; I have tried to look at so many articles and videos, but everywhere …
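A real example is easy to run end to end; below is a minimal sketch on synthetic data (all names are illustrative). The key leaf-wise knob in LightGBM is num_leaves: each new split goes to whichever leaf currently offers the largest loss reduction, so trees can grow deep and lopsided:

```python
import lightgbm as lgb
from sklearn.datasets import make_regression

# toy regression problem
X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
train = lgb.Dataset(X, label=y)

# leaf-wise growth: num_leaves caps leaves per tree, not depth
booster = lgb.train(
    {"objective": "regression", "num_leaves": 8, "verbosity": -1},
    train,
    num_boost_round=50,
)
print(booster.predict(X[:5]))   # predictions from the boosted ensemble
```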
Category: Data Science

How to specify scale_pos_weight value at runtime in Hyperopt?

I want to use LGBMClassifier for binary classification, and for hyper-parameter tuning I want to use Hyperopt. The dataset is imbalanced. I am using sklearn's class_weight.compute_class_weight as shown below:

    clas_wts_arr = class_weight.compute_class_weight('balanced', np.unique(y_trn), y_trn)
    self.scale_pos_wt = clas_wts_arr[0] / clas_wts_arr[1]

The following is the space parameter that I am passing to the objective function:

    space = {
        'objective': hp.choice('objective', objective_list),
        'boosting': hp.choice('boosting', boosting_list),
        'metric': hp.choice('metric', metric_list),
        'max_depth': hp.quniform('max_depth', 1, 15, 2),
        'min_data_in_leaf': hp.quniform('min_data_in_leaf', 1, 256, 1),
        'num_leaves': hp.quniform('num_leaves', 7, 150, 1),
        'feature_fraction': …
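One way to set scale_pos_weight at runtime (a sketch, assuming y_trn is available when the space is built) is to compute the class ratio first and define the search band around it rather than hard-coding a value:

```python
import numpy as np
from hyperopt import hp

# a common choice of ratio: n_negative / n_positive
n_neg = np.sum(y_trn == 0)
n_pos = np.sum(y_trn == 1)
ratio = n_neg / n_pos

space = {
    "num_leaves": hp.quniform("num_leaves", 7, 150, 1),
    # let Hyperopt explore a band around the computed ratio
    "scale_pos_weight": hp.uniform("scale_pos_weight",
                                   0.5 * ratio, 2.0 * ratio),
}
```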
Topic: lightgbm
Category: Data Science

Proof of the GOSS algorithm in the LightGBM paper

In the LightGBM paper, the authors make use of a newly developed sampling method, GOSS, to reduce the number of data instances needed for finding the best split of a given feature in a tree node. They give an estimation of the error incurred by sampling instead of using the entire data (Theorem 3.2 in https://www.microsoft.com/en-us/research/wp-content/uploads/2017/11/lightgbm.pdf). I am interested in the proof of this theorem, for which the paper refers to "supplementary materials". Where can I find those?
Category: Data Science

Sliding window approach using SVR & LightGBM

I'm working on a multivariate time-series forecast using a couple of ML algorithms (neural networks, support vector machines and gradient-boosting algorithms), and I need to measure the performance of each model. I've implemented the first model using TensorFlow 2.0; the training and testing data were created using the tf.data.Dataset API. The data format is (window_data, forecast), where window_data represents a set of 24 timesteps and forecast represents the next timestep. Now I need to train the 2nd and 3rd model using SVR …
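Since SVR and LightGBM expect a flat 2-D feature matrix rather than a tf.data pipeline, the same (window_data, forecast) layout can be rebuilt with NumPy. A minimal sketch with a hypothetical make_windows helper, assuming series is an (n_samples, n_features) array and the target is its first column:

```python
import numpy as np

def make_windows(series, window=24):
    """Flatten 24-step windows into rows; the target is the next timestep."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window].ravel())   # 2-D input for SVR/LightGBM
        y.append(series[i + window, 0])          # forecast the next step
    return np.array(X), np.array(y)

# X_win, y_win = make_windows(series)  # then fit SVR / LGBMRegressor on these
```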
Category: Data Science

Incorporating data over time into LightGBM

So I'm in the situation where I know what I'm trying to find but not the terminology for it, and I think that's why a lot of my Google searches are directing me in the wrong direction, so apologies if some of this explanation ends up redundant. Essentially, I want to be able to incorporate historical trends into the LightGBM model I've been using. Basically I have a bunch of categorical health data currently, but by default, currently …
Category: Data Science

How to make LightGBM suppress output?

I have tried for a while to figure out how to "shut up" LightGBM. In particular, I would like to suppress the output of LightGBM during training (i.e. the feedback on the boosting steps). My model:

    params = {
        'objective': 'regression',
        'learning_rate': 0.9,
        'max_depth': 1,
        'metric': 'mean_squared_error',
        'seed': 7,
        'boosting_type': 'gbdt'
    }
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=100000,
                    valid_sets=lgb_eval,
                    early_stopping_rounds=100)

I tried to add verbose=0 as suggested in the docs (https://github.com/microsoft/LightGBM/blob/master/docs/Parameters.rst), but this does not work. Does anyone know how to …
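What usually silences LightGBM (depending on the installed version) is verbosity=-1 in the params plus the log_evaluation callback, rather than verbose=0; on older versions, verbose_eval=False in lgb.train plays the same role. A sketch with the question's setup:

```python
import lightgbm as lgb

params = {
    "objective": "regression",
    "learning_rate": 0.9,
    "max_depth": 1,
    "metric": "mean_squared_error",
    "seed": 7,
    "boosting_type": "gbdt",
    "verbosity": -1,                  # silence LightGBM's own logging
}
gbm = lgb.train(
    params,
    lgb_train,
    num_boost_round=100000,
    valid_sets=[lgb_eval],
    callbacks=[
        lgb.early_stopping(100),      # replaces early_stopping_rounds
        lgb.log_evaluation(period=0), # no per-iteration eval feedback
    ],
)
```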
Category: Data Science

LightGBM eval_set - what to do when I fit the final model (there's no test data left)

I'm using LightGBM's eval_set feature when fitting my model. This enables early stopping on the number of estimators used:

    callbacks = [lgb.early_stopping(80, verbose=0), lgb.log_evaluation(period=0)]
    fit_params = {"callbacks": callbacks,
                  "eval_metric": "auc",
                  "eval_set": [(x_train, y_train), (x_test, y_test)],
                  "eval_names": ['train', 'valid']}
    lg = LGBMClassifier(n_estimators=5000, verbose=-1, objective="binary",
                        **{"scale_pos_weight": train_weight, "metric": "auc"})  # or "binary_logloss"

This works great when doing cross-validation and early stopping is triggered. But when I have finally selected a model and want to train it on the full data set, I have no test data left …
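One common pattern (an assumption about the workflow, not the only answer) is to keep the estimator count that early stopping found and refit on everything; x_all and y_all below are hypothetical names for the combined data:

```python
from lightgbm import LGBMClassifier

# best_iteration_ is populated by early stopping during the eval-set fit
best_n = lg.best_iteration_

final = LGBMClassifier(n_estimators=best_n, objective="binary",
                       scale_pos_weight=train_weight, metric="auc")
final.fit(x_all, y_all)   # full data set, no eval_set needed this time
```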
Topic: lightgbm
Category: Data Science

Understanding feature_parallel distributed learning algorithm in LightGBMClassifier

I want to understand the feature_parallel algorithm in LightGBMClassifier. The documentation describes how it is done traditionally and how LightGBM aims to improve it. The two ways are as follows (verbatim from the linked site). Traditional feature parallel aims to parallelize the "Find Best Split" step in the decision tree; the procedure is:

1. Partition data vertically (different machines have different feature sets).
2. Workers find the local best split point {feature, threshold} on their local feature set.
3. Communicate local best splits with …
Category: Data Science
