Multi-target regression tree with additional constraint
I have a regression problem where I need to predict three dependent variables ($y$) based on a set of independent variables ($x$): $$ (y_1,y_2,y_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n +u. $$
To solve this problem, I would prefer to use tree-based models (e.g. gradient boosting or random forest), since the independent variables ($x$) are correlated and the problem is non-linear with an ex-ante unknown parameterization.
I know that I could use sklearn's MultiOutputRegressor() or RegressorChain() as a meta-estimator.
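For concreteness, a minimal sketch of the unconstrained setup with MultiOutputRegressor (the data here is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))           # x_1, ..., x_5 (synthetic)
Y = np.column_stack([
    X[:, 0] + X[:, 1],                  # y_1 (illustrative targets)
    X[:, 1] * X[:, 2],                  # y_2
    X[:, 0] - X[:, 3],                  # y_3
]) + 0.1 * rng.normal(size=(500, 3))

# One independent forest per target; RegressorChain would instead
# feed earlier target predictions into the later ones.
model = MultiOutputRegressor(
    RandomForestRegressor(n_estimators=200, random_state=0)
)
model.fit(X, Y)
Y_hat = model.predict(X)
print(Y_hat.shape)  # (500, 3)
```

(Note that RandomForestRegressor actually supports multiple outputs natively, so the wrapper is mainly needed for estimators such as gradient boosting that are single-output only.)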
However, there is an additional twist to my problem, namely that I do know that $y_1 + y_2 - y_3 = x_1$.
In other words, there is a fixed relation between the three $y$ and one of the independent variables. So essentially, the value of $x_1$ needs to be distributed in a first-best manner to the (unknown) targets $(y_1,y_2,y_3)$ for each observation, contingent on the remaining independent variables $x_2,\dots,x_n$.
Of course, a naive approach would be to squeeze the predicted values together somehow so that they satisfy $\hat{y_1} + \hat{y_2} - \hat{y_3} = x_1$. However, I wonder if there are any other options to introduce a hard constraint such as $\hat{y_1} + \hat{y_2} - \hat{y_3} = x_1$ into some (tree-based) estimator.
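To make the "squeeze" explicit: one way to do it is a Euclidean projection of each prediction onto the constraint hyperplane $a^\top y = x_1$ with $a = (1, 1, -1)$, i.e. $\hat{y} \leftarrow \hat{y} - (a^\top \hat{y} - x_1)\, a / \lVert a \rVert^2$. The function name below is my own; this is just a sketch of the post-hoc correction:

```python
import numpy as np

def project_to_constraint(Y_hat, x1):
    """Minimal L2 adjustment so that y1 + y2 - y3 = x1 holds exactly.

    Projects each predicted row onto the hyperplane a^T y = x1
    with a = (1, 1, -1):  y <- y - (a^T y - x1) * a / ||a||^2.
    """
    a = np.array([1.0, 1.0, -1.0])
    residual = Y_hat @ a - x1                     # constraint violation per row
    return Y_hat - np.outer(residual / a.dot(a), a)

Y_hat = np.array([[1.0, 2.0, 1.5],
                  [0.5, 0.5, 0.0]])
x1 = np.array([2.0, 0.7])
Y_adj = project_to_constraint(Y_hat, x1)
print(Y_adj @ np.array([1.0, 1.0, -1.0]))  # matches x1 up to float error
```

This is the smallest correction in the L2 sense, but it spreads the adjustment equally across the three targets regardless of $x_2,\dots,x_n$, which is why I suspect it is not first-best.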
I noticed this post. However, I would prefer a tree-based method for the reasons mentioned above.