Negative r2_score and bad predictions for my sales prediction problem using LightGBM
My project involves trying to predict the sales quantity for a specific item across a whole year. I've used the LightGBM package for making the predictions. The params I've set for it are as follows:
params = {
    'nthread': 10,
    'max_depth': 5, #DONE
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression_l1',
    'metric': 'mape', # this is abs(a-e)/max(1,a)
    'num_leaves': 2, #DONE
    'learning_rate': 0.2180, #DONE
    'feature_fraction': 0.9, #DONE
    'bagging_fraction': 0.990, #DONE
    'bagging_freq': 1, #DONE
    'lambda_l1': 3.097758978478437, #DONE
    'lambda_l2': 2.9482537987198496, #DONE
    'verbose': 1,
    'min_child_weight': 6.996211413900573,
    'min_split_gain': 0.037310344962162616,
    'min_data_in_bin': 1, #DONE
    'min_data_in_leaf': 2, #DONE
    'num_boost_round': 1, #DONE
    'max_bin': 7, #DONE
    'extra_trees': True, #DONE
}
My dataset consists of daily sales data (columns: date, quantity) for the years 2017, 2018, 2019 and 3 months of 2020. I've been using the 2017 and 2018 data for training and cross-validation and the 2019 data for testing. However, my predictions for the year are way off the mark when I compare the quantities on a weekly, monthly, quarterly or yearly basis (error of roughly 40-50%, and that is after tuning the params to bring the error down to this level). Moreover, my r2_score on the predictions is negative, around -2.9148426301633803. Any suggestions on what can be done to make it better?
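For context on what a negative score means: R² is defined as 1 - SS_res / SS_tot, so it falls below zero whenever the predictions have a larger squared error than simply predicting the mean of the actual values. Here is a minimal sketch with made-up numbers. The `r2` helper and the data are illustrative only and are not taken from my dataset; `sklearn.metrics.r2_score` computes the same quantity.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up daily quantities (illustrative, not real sales data).
actual = [10.0, 12.0, 8.0, 11.0, 9.0]

# Predicting the mean of the actuals everywhere gives R^2 = 0.
mean_baseline = [10.0] * len(actual)

# A constant forecast that is biased upward does worse than the mean,
# so its R^2 is negative.
biased = [14.0] * len(actual)

r2_baseline = r2(actual, mean_baseline)
r2_biased = r2(actual, biased)
print(r2_baseline, r2_biased)
```

So a value near -2.9 would mean the squared error of the forecast is roughly 3.9 times that of a constant mean forecast over the same period.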
Script for lightgbm:
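import lightgbm as lgb
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_absolute_percentage_error)

lgb_train = lgb.Dataset(train_x, train_y)
lgb_valid = lgb.Dataset(test_x, test_y)
# Any further arguments to lgb.train were cut off in the post.
model = lgb.train(params, lgb_train,
                  valid_sets=[lgb_train, lgb_valid])

# Select the 2019 rows; .copy() avoids pandas' SettingWithCopyWarning below.
test_df_pred = df[(df['date'] >= '2019-01-01') & (df['date'] < '2020-01-01')].copy()
#test_df_pred = df[(df['date'] >= '2019-01-01') & (df['date'] < '2019-02-01')].copy()
#test_df_pred = df[(df['date'] >= '2019-01-15') & (df['date'] < '2019-01-22')].copy()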
test_df_pred['month'] = test_df_pred['date'].dt.month
test_df_pred['day'] = test_df_pred['date'].dt.dayofweek
test_df_pred['year'] = test_df_pred['date'].dt.year
col = [i for i in test_df_pred.columns if i not in ['date','id', 'qty']]
y_test_pred = model.predict(test_df_pred[col])
test_df_pred['qty_pred'] = y_test_pred
mse = mean_squared_error(y_true=test_df_pred['qty'], y_pred=test_df_pred['qty_pred'])
mae = mean_absolute_error(y_true=test_df_pred['qty'], y_pred=test_df_pred['qty_pred'])
mape = mean_absolute_percentage_error(y_true=test_df_pred['qty'], y_pred=test_df_pred['qty_pred'])
qty = test_df_pred.qty.sum()
qty_pred = test_df_pred.qty_pred.sum()
diff = qty_pred - qty
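The weekly, monthly, quarterly and yearly errors mentioned above can be computed by summing actuals and predictions per period before comparing them. This is a rough sketch under stated assumptions: the `demo` frame is synthetic and stands in for `test_df_pred`, the constant forecast of 15 is hypothetical, and `pct_error_by_period` is a helper name made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for test_df_pred: one row per day of 2019.
dates = pd.date_range("2019-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "date": dates,
    "qty": rng.poisson(20, size=len(dates)).astype(float),
})
demo["qty_pred"] = 15.0  # hypothetical constant forecast

def pct_error_by_period(frame, freq):
    """Sum actuals and predictions per period, then compute the signed % error."""
    periods = frame["date"].dt.to_period(freq)
    grouped = frame.groupby(periods)[["qty", "qty_pred"]].sum()
    grouped["pct_error"] = (
        100.0 * (grouped["qty_pred"] - grouped["qty"]) / grouped["qty"]
    )
    return grouped

weekly = pct_error_by_period(demo, "W")
monthly = pct_error_by_period(demo, "M")
quarterly = pct_error_by_period(demo, "Q")
yearly = pct_error_by_period(demo, "Y")
print(monthly)
```

Because the sign is kept, a forecast that is consistently too low shows up as a negative `pct_error` in every period, which is harder to see from a single yearly total.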
Topic lightgbm xgboost time-series python predictive-modeling
Category Data Science