XGBoost non-linear regression

Is it possible to use XGBoost regressor to do non-linear regressions?

I know of the linear and logistic objectives.

The linear objective works very well with the gblinear booster.

This made me wonder if it is possible to use XGBoost for non-linear regressions like logarithmic or polynomial regression.

a) Is it generally possible to do polynomial regression, as with a CNN, where XGBoost approximates the data by generating a polynomial function of degree n?

b) If a) is generally not possible, would it be possible to declare a curve with its parameters and let XGBoost figure out the values of the parameters? For example, assume we guess that the curve can be approximated by:

$$ 10^{a\log_{k}({x})-b} $$

XGBoost would have to figure out $a$, $k$, and $b$. $x$ would be a given feature.

Topic logarithmic xgboost regression

Category Data Science


Boosting is just a special way to fit some model by trying to successively/repeatedly "explain" the residual. See a minimal example for a linear booster here. So essentially the xgboost model with gblinear will be a "normal" linear model.
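To make the residual-fitting idea concrete, here is a small pure-Python sketch (not XGBoost itself, and not the linked example). The helper names `fit_line`, `fit_stump` and `boost` are made up for this illustration, and the data is synthetic. It runs the same boosting loop with two base learners: a least-squares line, which is roughly the situation with `gblinear`, and a one-split step function, which is a crude stand-in for what `gbtree` does.

```python
import math

def fit_line(x, r):
    """Least-squares line through the points (x, r); returns a predictor."""
    n = len(x)
    mean_x, mean_r = sum(x) / n, sum(r) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (ri - mean_r) for xi, ri in zip(x, r)) / sxx
    intercept = mean_r - slope * mean_x
    return lambda v: intercept + slope * v

def fit_stump(x, r):
    """Best single-split step function (a depth-1 tree) for the values r."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between two neighbours
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((ri - mean_left) ** 2 for ri in left)
               + sum((ri - mean_right) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, mean_left, mean_right)
    _, t, mean_left, mean_right = best
    return lambda v: mean_left if v <= t else mean_right

def boost(x, y, fit_base, rounds=200, lr=0.1):
    """Repeatedly fit the base learner to the current residual."""
    learners = []
    pred = [0.0] * len(x)
    for _ in range(rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_base(x, residual)
        learners.append(h)
        pred = [pi + lr * h(xi) for pi, xi in zip(pred, x)]
    # The final model is the (shrunken) sum of all base learners.
    return lambda v: sum(lr * h(v) for h in learners)

def mse(model, x, y):
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

# Synthetic, noise-free data with a logarithmic shape (an assumption
# chosen only to mirror the question).
x = [0.5 + 0.1 * i for i in range(50)]
y = [math.log(xi) for xi in x]

linear_model = boost(x, y, fit_line)
stump_model = boost(x, y, fit_stump)

linear_mse = mse(linear_model, x, y)
stump_mse = mse(stump_model, x, y)
print("boosted lines :", linear_mse)
print("boosted stumps:", stump_mse)
```

A sum of straight lines is still a straight line, so the boosted linear model ends up at the ordinary least-squares fit and cannot bend to follow the curve. The boosted stumps add up to a piecewise-constant function that can.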

From your question, I would not expect a linear booster to deliver good results for your problem. If you want to use models other than neural networks, you have several options.

  • Use boosting with a tree-based booster (gbtree). This will fit a model which is essentially "non-parametric". However, the success of this strategy will depend on the explanatory power of your "x" variables (which you did not mention in the question).
  • Use linear-style models with a more general structure (i.e. generalised additive models, GAM). This model family is extremely well suited to fitting highly non-linear functions. Find a minimal example here. GAM implementations exist for both Python and R. My minimal example would yield the following result (see figure). The blue line is a "normal" linear model, the black line is a fitted GAM model (red is the ground truth).
  • If you know the parameterization of your model (more or less), you could also define a linear model (with a proper parameterization) and fit that instead. However, this seems to be a less attractive solution. It can be daunting to find a proper representation for the data.
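As a sketch of the last option for the curve in the question: taking $\log_{10}$ of $y = 10^{a\log_{k}(x)-b}$ gives $\log_{10} y = c\,\log_{10} x - b$ with $c = a/\log_{10} k$, which is an ordinary straight line in $\log_{10} x$. Note that only $c$ and $b$ can be recovered this way; $a$ and $k$ cannot be separated, because they only enter through the ratio $a/\log_{10} k$. The code below is plain Python rather than XGBoost, the function name `fit_log_linear` is made up for this illustration, and the "true" parameter values are assumptions used to generate synthetic data.

```python
import math

def fit_log_linear(x, y):
    """Fit log10(y) = c * log10(x) - b by least squares; returns (c, b).

    Requires x > 0 and y > 0 so that the logarithms exist.
    """
    u = [math.log10(xi) for xi in x]
    v = [math.log10(yi) for yi in y]
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    c = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / suu
    b = c * mean_u - mean_v  # the intercept of the line is -b
    return c, b

# Synthetic, noise-free data from assumed parameters a=2, k=e, b=1.
a_true, k_true, b_true = 2.0, math.e, 1.0
x = [1.0 + 0.5 * i for i in range(20)]
y = [10 ** (a_true * math.log(xi, k_true) - b_true) for xi in x]

c, b = fit_log_linear(x, y)
print("c =", c, "(a / log10(k) =", a_true / math.log10(k_true), ")")
print("b =", b)
```

Fitting in log space implicitly assumes multiplicative noise on $y$. If that does not hold for your data, a non-linear least-squares fit on the original scale may be the better choice.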

Introduction to Statistical Learning (ISL) provides a good overview of GAM models if you want to have a further look. There are also Python examples for ISL.

[Figure: linear model fit (blue), GAM fit (black), ground truth (red)]
