Customising Objective Function in R

I was wondering if there are R packages that allow the user to customise the loss function.

For example, suppose a random forest package like ranger has a loss function that minimises OOB MSE. Is it somehow possible to customise this into a model that minimises negative log-likelihood instead?

I would appreciate it if someone knew of examples/code for doing this on a toy dataset.

Topic: r, machine-learning

Category: Data Science


Using XGBoost it is relatively easy to plug in a custom loss function, and there are also quite a lot of objectives already implemented.

It would look something like:

library(ISLR)
library(xgboost)
library(tidyverse)
library(Metrics)

# Data: Hitters from ISLR, Salary plus numeric predictors
df <- ISLR::Hitters %>%
  select(Salary, AtBat, Hits, HmRun, Runs, RBI, Walks, Years, CAtBat,
         CHits, CHmRun, CRuns, CRBI, CWalks, PutOuts, Assists, Errors)
df <- df[complete.cases(df), ]
train <- df[1:150, ]
test <- df[151:nrow(df), ]

# XGBoost matrices (first column, Salary, is the label)
dtrain <- xgb.DMatrix(data = as.matrix(train[, -1]), label = as.matrix(train[, 1]))
dtest <- xgb.DMatrix(data = as.matrix(test[, -1]), label = as.matrix(test[, 1]))
watchlist <- list(eval = dtest)

# Custom objective function (pseudo-Huber loss, a smooth approximation to Huber)
# Reference for gradients: https://stackoverflow.com/questions/45006341/xgboost-how-to-use-mae-as-objective-function

myobjective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  d <- preds - labels             # residuals
  h <- 5                          # delta parameter of the pseudo-Huber loss
  scale <- 1 + (d / h)^2
  scale_sqrt <- sqrt(scale)
  grad <- d / scale_sqrt          # first derivative of the loss w.r.t. preds
  hess <- 1 / scale / scale_sqrt  # second derivative
  return(list(grad = grad, hess = hess))
}
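
# A minimal sketch for the negative log-likelihood case asked about in the
# question: assuming a Gaussian likelihood with fixed unit variance, the NLL
# per observation is 0.5 * (preds - labels)^2 up to constants, so a custom
# objective (hypothetical name nllobjective) reduces to:
nllobjective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  grad <- preds - labels          # derivative of 0.5 * (preds - labels)^2
  hess <- rep(1, length(preds))   # constant second derivative
  return(list(grad = grad, hess = hess))
}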

# Custom evaluation metric (here: RMSE)
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  u <- (preds - labels)^2
  err <- sqrt(sum(u) / length(u))
  return(list(metric = "MyError", value = err))
}
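
# A matching sketch: mean Gaussian NLL (up to constants) as an evaluation
# metric, to pair with nllobjective above (hypothetical name nllerror):
nllerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(0.5 * (preds - labels)^2)
  return(list(metric = "NLL", value = err))
}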

# Model Parameter
param <- list(booster = 'gbtree'
               , learning_rate = 0.1
               , objective = myobjective 
               , eval_metric = evalerror
               , set.seed = 2020)

# Train Model
xgb <- xgb.train(params = param
                 , data = dtrain
                 , nrounds = 500
                 , watchlist = watchlist
                 , maximize = FALSE
                 , early_stopping_rounds = 5
                 , verbose = 1)

# Predict on the test set and evaluate
pred <- predict(xgb, dtest)
test_mae <- mae(test$Salary, pred)  # Metrics::mae
print(test_mae)

I have done this before, though not with ranger. For packages that are open source and carry an appropriate license, I have changed the package code itself to alter the objective for my model. Depending on the package, that code might be in R, C, Python, or Java.

I have also used xgboost for this. xgboost allows you to customise the objective function: you supply the gradient and the Hessian derived from your loss. You can call xgboost from R. The example I worked from is in Python; I am not sure whether they support an R customised loss function, but writing one function in Python might not be hard for your team.

I have not done this myself, but you can also use TensorFlow in R and write a customised loss function.
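
For instance, with the keras R package a custom loss is just an R function of y_true and y_pred built from backend operations. A minimal sketch, again treating the Gaussian negative log-likelihood with unit variance (which reduces to MSE up to constants); the network architecture and names here are purely illustrative:

library(keras)

# Custom loss: Gaussian NLL with fixed unit variance
# (equals 0.5 * squared error up to additive constants)
gaussian_nll <- function(y_true, y_pred) {
  k_mean(0.5 * k_square(y_true - y_pred))
}

# Illustrative regression network for 16 numeric predictors
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = 16) %>%
  layer_dense(units = 1)

# Pass the custom loss function directly to compile()
model %>% compile(optimizer = "adam", loss = gaussian_nll)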
