Is there a random forest library (scikit-learn, TensorFlow Decision Forests, R, etc.) that implements multi-output regression?
It is easy to adapt the idea of tree-based regression to perform logistic regression: the decision boundaries of the tree partition the space of independent variables into hyper-rectangles, and each hyper-rectangle is assigned a value that serves as the model's output. Instead of choosing the decision boundaries and values to minimize the sum of squared residuals, choose them to minimize the total binary cross-entropy loss (equivalent to maximizing the likelihood).
Taking this a step further, assign to each hyper-rectangle a probability distribution over a fixed set of labels y_i (instead of a binary pass/fail value). The dependent variables are now also probability distributions over the y_i. Then minimize the total cross-entropy loss, as before.
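As a partial workaround, scikit-learn's `RandomForestRegressor` accepts multi-output targets natively, so the probability vectors can be used directly as `y`. The caveat is that it minimizes per-output squared error, not cross-entropy, so this is an approximation of the idea above, not an exact implementation. A sketch with synthetic softmax targets (the data-generating process here is made up for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic targets: probability distributions over 3 labels
# (each row of Y lies on the simplex and sums to 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
logits = X @ rng.normal(size=(4, 3))
Y = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Multi-output regression: each tree leaf stores the mean target
# vector, and the forest averages over trees. Averaging preserves
# the row sums, so predictions remain valid distributions, but the
# split criterion is squared error rather than cross-entropy.
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X, Y)
pred = reg.predict(X)
print(pred.shape)  # (300, 3)
```

Because leaf values are averages of training targets, the predicted rows still sum to 1, but nothing constrains the tree-growing criterion to match the likelihood objective described above.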
Is there an existing implementation of this?
Topic cross-entropy decision-trees logistic-regression random-forest
Category Data Science