Random forest that keeps only leaves with impurity below a threshold
Is there an algorithm that builds a random forest and then prunes every leaf whose impurity is above a threshold that I choose?
In other words, if I set the minimum samples per leaf to 500 and require leaves to be at least 90% pure, for example, the algorithm would keep only the leaves that satisfy both constraints.
My dataset is extremely noisy, so most leaves have a Gini impurity around 0.5, but a few leaves are close to 0. For my use case I only care about the latter. Is there an algorithm that does something like what I described?
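To make the idea concrete, here is a minimal sketch of what I have in mind, assuming scikit-learn and a binary target. It does not prune anything during training. It fits an ordinary `RandomForestClassifier`, then at prediction time ignores every tree whose leaf for a given sample has a Gini impurity above the threshold, and abstains when no tree qualifies. The function name `pure_leaf_predictions`, the synthetic data, and the small `min_samples_leaf` are my own placeholders. For two classes, 90% purity corresponds to a Gini impurity of 1 - (0.9^2 + 0.1^2) = 0.18, which is where the default threshold comes from.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def pure_leaf_predictions(forest, X, max_impurity=0.18):
    """Average class probabilities over only the trees whose leaf for a
    sample has training impurity <= max_impurity.

    Returns (proba, n_votes); rows of proba are NaN where no tree qualified.
    """
    leaves = forest.apply(X)  # leaf index per sample and tree: (n_samples, n_trees)
    n_samples = X.shape[0]
    proba_sum = np.zeros((n_samples, forest.n_classes_))
    n_votes = np.zeros(n_samples, dtype=int)

    for t, est in enumerate(forest.estimators_):
        tree = est.tree_
        leaf_ids = leaves[:, t]
        # Impurity here was measured on this tree's (bootstrap) training data.
        keep = tree.impurity[leaf_ids] <= max_impurity
        # tree.value holds per-class counts or fractions depending on the
        # scikit-learn version, so normalize explicitly.
        counts = tree.value[leaf_ids[keep], 0, :]
        proba_sum[keep] += counts / counts.sum(axis=1, keepdims=True)
        n_votes[keep] += 1

    proba = np.full((n_samples, forest.n_classes_), np.nan)
    voted = n_votes > 0
    proba[voted] = proba_sum[voted] / n_votes[voted, None]
    return proba, n_votes


# Hypothetical noisy data; min_samples_leaf is scaled down from 500
# because this toy set has only 2000 rows.
X, y = make_classification(n_samples=2000, n_features=10, flip_y=0.3,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=50, min_samples_leaf=20,
                                random_state=0).fit(X, y)
proba, n_votes = pure_leaf_predictions(forest, X, max_impurity=0.18)
```

Samples with `n_votes == 0` are the ones that never land in a sufficiently pure leaf, so they would simply get no prediction. Since the impurity is measured on the training data, I assume the purity of the retained leaves would still need to be checked on held-out data.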
Topic lightgbm xgboost gbm random-forest machine-learning
Category Data Science