XGBoost custom objective for regression in R
I implemented a custom objective and metric for an XGBoost regression in R. To check that I'm doing it correctly, I started with a quadratic loss. The implementation seems to work well, but I cannot reproduce the results of the standard reg:squarederror
objective.
Question:
Is my current approach correct (especially the implementation of the first- and second-order gradients)? If so, what could explain the difference?
Gradient and Hessian are defined as:
grad <- 2*(preds-labels)
hess <- rep(2, length(labels))
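For reference, these follow directly from differentiating the squared-error loss with respect to the prediction:

```latex
L(y, \hat y) = (\hat y - y)^2, \qquad
\frac{\partial L}{\partial \hat y} = 2\,(\hat y - y), \qquad
\frac{\partial^2 L}{\partial \hat y^2} = 2
```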
Minimal example (in R):
library(ISLR)
library(xgboost)
library(tidyverse)
library(Metrics)
# Data
df = ISLR::Hitters %>% select(Salary,AtBat,Hits,HmRun,Runs,RBI,Walks,Years,CAtBat,CHits,CHmRun,CRuns,CRBI,CWalks,PutOuts,Assists,Errors)
df = df[complete.cases(df),]
train = df[1:150,]
test = df[151:nrow(df),]
# XGBoost Matrix
dtrain <- xgb.DMatrix(data=as.matrix(train[,-1]), label=as.matrix(train[,1]))
dtest <- xgb.DMatrix(data=as.matrix(test[,-1]), label=as.matrix(test[,1]))
watchlist <- list(eval = dtest)
# Custom objective function (squared error)
myobjective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  grad <- 2*(preds-labels)
  hess <- rep(2, length(labels))
  return(list(grad = grad, hess = hess))
}
# Custom Metric
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  u <- (preds-labels)^2
  err <- sqrt(sum(u) / length(u))
  return(list(metric = "MyError", value = err))
}
# Model Parameter
param1 <- list(booster = 'gbtree'
, learning_rate = 0.1
, objective = myobjective
, eval_metric = evalerror
, set.seed = 2020)
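A side note on the set.seed entry above: set.seed is an R function rather than an xgboost training parameter, so placing it inside the params list does not seed anything. Seeding is normally done by calling set.seed() before training; a minimal illustration of why that works:

```r
# Seed R's RNG *before* training for reproducibility; entries like
# set.seed = 2020 inside the params list are not xgboost parameters.
set.seed(2020)
draw1 <- runif(3)   # stands in for any randomness used during training

set.seed(2020)
draw2 <- runif(3)

identical(draw1, draw2)   # same seed, same draws
```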
# Train Model
xgb1 <- xgb.train(params = param1
, data = dtrain
, nrounds = 500
, watchlist
, maximize = FALSE
, early_stopping_rounds = 5)
# Predict
pred1 = predict(xgb1, dtest)
mae1 = mae(test$Salary, pred1)
## XGB Model with standard loss/metric
# Model Parameter
param2 <- list(booster = 'gbtree'
, learning_rate = 0.1
, objective = 'reg:squarederror'
, set.seed = 2020)
# Train Model
xgb2 <- xgb.train(params = param2
, data = dtrain
, nrounds = 500
, watchlist
, maximize = FALSE
, early_stopping_rounds = 5)
# Predict
pred2 = predict(xgb2, dtest)
mae2 = mae(test$Salary, pred2)
Results:
The custom objective yields a slightly better result (MAE = 199.6) than the standard objective (MAE = 203.3). During boosting, the RMSE also tends to be slightly lower with the custom objective.
For the custom objective the RMSE is:
[1] eval-MyError:599.490030
[2] eval-MyError:560.677996
[3] eval-MyError:527.867686
[4] eval-MyError:498.216760
[5] eval-MyError:472.167415
...
For the standard objective the RMSE is:
[1] eval-rmse:598.144775
[2] eval-rmse:562.479431
[3] eval-rmse:529.981079
[4] eval-rmse:501.730103
[5] eval-rmse:479.081329
Topic objective-function metric loss-function xgboost r
Category Data Science