XGBoost custom objective for regression in R

I implemented a custom objective and evaluation metric for an xgboost regression. To check that I'm doing this correctly, I started with a quadratic loss, which should reproduce the built-in behaviour. The implementation seems to work, but I cannot reproduce the results of the standard reg:squarederror objective.

Question:

Is my current approach correct (in particular, the implementation of the first- and second-order gradients)? If so, what could explain the difference?

Gradient and Hessian are defined as:

grad <- 2*(preds-labels)
hess <- rep(2, length(labels))
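
For reference, these follow directly from differentiating the squared-error loss with respect to the prediction:

L           = (preds - labels)^2
dL/dpreds   = 2 * (preds - labels)   # first-order gradient (grad)
d²L/dpreds² = 2                      # second-order gradient (hess), constant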

Minimal example (in R):

library(ISLR)
library(xgboost)
library(tidyverse)
library(Metrics)

# Data
df = ISLR::Hitters %>% select(Salary,AtBat,Hits,HmRun,Runs,RBI,Walks,Years,CAtBat,CHits,CHmRun,CRuns,CRBI,CWalks,PutOuts,Assists,Errors)
df = df[complete.cases(df),]
train = df[1:150,]
test = df[151:nrow(df),]

# XGBoost Matrix
dtrain <- xgb.DMatrix(data=as.matrix(train[,-1]), label=as.matrix(train[,1]))
dtest <- xgb.DMatrix(data=as.matrix(test[,-1]), label=as.matrix(test[,1]))
watchlist <- list(eval = dtest)
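
As a quick, optional sanity check that the matrices were built as intended (an illustrative snippet):

# Optional: inspect the DMatrix objects
dim(dtrain)                     # should be 150 x 16 (150 rows, 16 features)
head(getinfo(dtrain, "label"))  # first few Salary labels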

# Custom objective function (squared error)
myobjective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")   # note: the field name must be quoted
  grad <- 2*(preds-labels)
  hess <- rep(2, length(labels))
  return(list(grad = grad, hess = hess))
}
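
One way to convince yourself that the analytic gradient is right is a finite-difference check at a few arbitrary points. This is an illustrative sketch (check_grad is a made-up helper, not part of xgboost):

# Sketch: finite-difference check of the analytic gradient
check_grad <- function(preds, labels, eps = 1e-6) {
  loss     <- function(p) (p - labels)^2
  num_grad <- (loss(preds + eps) - loss(preds - eps)) / (2 * eps)  # central difference
  ana_grad <- 2 * (preds - labels)                                 # analytic gradient
  max(abs(num_grad - ana_grad))                                    # should be ~0
}
check_grad(preds = c(1, 2, 3), labels = c(0, 2, 5))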

# Custom Metric
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  u = (preds-labels)^2
  err <- sqrt(sum(u) / length(u))      # RMSE
  return(list(metric = "MyError", value = err))
}
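
Since Metrics is already loaded, the custom metric can also be checked against Metrics::rmse on dummy data; the two should agree up to the float precision with which xgb.DMatrix stores labels. An illustrative sketch (p, y, dcheck are made-up names):

# Sketch: confirm evalerror matches a reference RMSE implementation
p <- rnorm(10); y <- rnorm(10)
dcheck <- xgb.DMatrix(data = matrix(rnorm(10), ncol = 1), label = y)
evalerror(p, dcheck)$value  # custom metric
Metrics::rmse(y, p)         # reference value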

# Model Parameter
param1 <- list(booster = 'gbtree'
              , learning_rate = 0.1
              , objective = myobjective 
              , eval_metric = evalerror)

# Train Model
set.seed(2020)   # seed R's RNG here; `set.seed = 2020` inside params is not a valid xgboost parameter
xgb1 <- xgb.train(params = param1
                 , data = dtrain
                 , nrounds = 500
                 , watchlist = watchlist
                 , maximize = FALSE
                 , early_stopping_rounds = 5)
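
Note that the R API also accepts custom functions through the obj and feval arguments of xgb.train instead of params; depending on the package version, this can be the more reliable route. A sketch of the equivalent call:

# Sketch: passing the custom functions via obj/feval rather than params
xgb1b <- xgb.train(params = list(booster = 'gbtree', learning_rate = 0.1)
                  , data = dtrain
                  , nrounds = 500
                  , watchlist = watchlist
                  , obj = myobjective   # custom objective
                  , feval = evalerror   # custom evaluation metric
                  , maximize = FALSE
                  , early_stopping_rounds = 5)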

# Predict
pred1 = predict(xgb1, dtest)
mae1 = mae(test$Salary, pred1)


## XGB Model with standard loss/metric
# Model Parameter
param2 <- list(booster = 'gbtree'
              , learning_rate = 0.1
              , objective = 'reg:squarederror')

# Train Model
set.seed(2020)
xgb2 <- xgb.train(params = param2
                 , data = dtrain
                 , nrounds = 500
                 , watchlist = watchlist
                 , maximize = FALSE
                 , early_stopping_rounds = 5)

# Predict
pred2 = predict(xgb2, dtest)
mae2 = mae(test$Salary, pred2)
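
Beyond the two MAE values, it can be informative to compare the predictions of both models directly; if the custom objective truly reproduced reg:squarederror, the differences should be negligible. An illustrative sketch:

# Sketch: compare the two models' test predictions head to head
summary(pred1 - pred2)       # distribution of prediction differences
cor(pred1, pred2)            # near 1 if the models essentially agree
c(mae1 = mae1, mae2 = mae2)  # MAE side by side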

Results:

  • The model with the custom objective yields a slightly better result (MAE = 199.6) than the one with the standard objective (MAE = 203.3).

  • During boosting, the evaluation RMSE tends to be slightly lower with the custom objective (see the logs below).

For the custom objective the RMSE is:

[1] eval-MyError:599.490030 
[2] eval-MyError:560.677996 
[3] eval-MyError:527.867686
[4] eval-MyError:498.216760 
[5] eval-MyError:472.167415 
...

For the standard objective the RMSE is:

[1] eval-rmse:598.144775 
[2] eval-rmse:562.479431 
[3] eval-rmse:529.981079 
[4] eval-rmse:501.730103 
[5] eval-rmse:479.081329 
