How to find the feature regions where each label is the most expected when using decision trees?

Question

lalaland

2020-10-18 13:26

Given a classification decision tree, for example this one trained on the Iris dataset:

How can I find, for each class, the region of the feature space (petal and sepal width and length) where a sample of that class is most likely to occur?

Here it is clear that for Setosa this region is petal length less than or equal to 2.45.

However, I am confused about how to reason in more complex cases. Take Versicolor, for example:

I am hesitating between two choices: either take every path that leads to Versicolor, or only take the region (defined by its path) that leads to the Versicolor leaf with the most samples.

I don't care about this particular example; I want to know the general case and how to approach the problem.
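A minimal sketch of both options, assuming the tree is stored as parallel arrays in the layout scikit-learn uses for the `tree_` attribute of a fitted `DecisionTreeClassifier` (`children_left`, `children_right`, `feature`, `threshold`, `value`, `n_node_samples`; a leaf has `children_left == -1`, and samples with `x[feature] <= threshold` go left). Every root-to-leaf path is a conjunction of threshold tests, so each leaf covers an axis-aligned box, and the tree predicts a class on exactly the union of the boxes of the leaves that vote for it. The function names `leaf_boxes` and `regions_by_class` and the toy tree (made-up thresholds and counts) are hypothetical, not taken from the question's figure.

```python
import math
from collections import defaultdict

TREE_LEAF = -1  # "no child" marker, as in scikit-learn's tree_ arrays


def leaf_boxes(children_left, children_right, feature, threshold):
    """Yield (leaf_id, box) for every leaf of the tree.

    box maps feature index -> (low, high) and means low < x <= high.
    """
    stack = [(0, {})]  # start at the root with an unconstrained box
    while stack:
        node, box = stack.pop()
        if children_left[node] == TREE_LEAF:
            yield node, box
            continue
        f, t = feature[node], threshold[node]
        low, high = box.get(f, (-math.inf, math.inf))
        # Left branch: x[f] <= t tightens the upper bound.
        stack.append((children_left[node], {**box, f: (low, min(high, t))}))
        # Right branch: x[f] > t tightens the lower bound.
        stack.append((children_right[node], {**box, f: (max(low, t), high)}))


def regions_by_class(children_left, children_right, feature, threshold,
                     value, n_node_samples):
    """Map each predicted class to its leaves as (n_samples, leaf_id, box),
    largest leaf first."""
    grouped = defaultdict(list)
    for leaf, box in leaf_boxes(children_left, children_right, feature, threshold):
        scores = value[leaf]
        label = max(range(len(scores)), key=lambda k: scores[k])  # majority class
        grouped[label].append((n_node_samples[leaf], leaf, box))
    for leaves in grouped.values():
        leaves.sort(key=lambda item: item[0], reverse=True)
    return dict(grouped)


# Hypothetical toy tree with made-up thresholds and counts.
# Features: 0 = petal length, 1 = petal width.
# Classes: 0 = setosa, 1 = versicolor, 2 = virginica.
children_left = [1, -1, 3, 4, -1, -1, -1]
children_right = [2, -1, 6, 5, -1, -1, -1]
feature = [0, -2, 1, 0, -2, -2, -2]
threshold = [2.45, -2.0, 1.75, 4.95, -2.0, -2.0, -2.0]
value = [[50, 52, 48], [50, 0, 0], [0, 52, 48], [0, 51, 3],
         [0, 47, 1], [0, 4, 2], [0, 1, 45]]
n_node_samples = [150, 50, 100, 54, 48, 6, 46]

regions = regions_by_class(children_left, children_right, feature, threshold,
                           value, n_node_samples)

# Option 1: versicolor is predicted on the union of all of these boxes.
for n, leaf, box in regions[1]:
    print(leaf, n, box)
# 4 48 {0: (2.45, 4.95), 1: (-inf, 1.75)}
# 5 6 {0: (4.95, inf), 1: (-inf, 1.75)}

# Option 2: keep only the box of the largest versicolor leaf.
print(regions[1][0][2])
# {0: (2.45, 4.95), 1: (-inf, 1.75)}
```

With a fitted classifier `clf`, the same call would take `clf.tree_.children_left`, `clf.tree_.children_right`, `clf.tree_.feature`, `clf.tree_.threshold`, `clf.tree_.value[:, 0]` and `clf.tree_.n_node_samples`. Depending on the scikit-learn version, `value` may hold class counts or class proportions; the sketch only uses its argmax and takes leaf size from `n_node_samples`.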

Thanks

Topic multilabel-classification expectation-maximization decision-trees classification feature-selection

Category Data Science

lcrmorin · Accepted Answer · 2020-10-18 13:26

It seems that you want to achieve something like this:

In it you can see the instances, their classes, the predicted class regions, and the cutoffs of the rules. The example is taken from https://jakevdp.github.io/PythonDataScienceHandbook/05.08-random-forests.html. You might want to find an interactive version (Plotly?) so you can read off the rules that interest you by hovering your mouse over the graph.
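A plot like this is typically built by evaluating the fitted classifier on a dense grid over two features and coloring each grid cell by its predicted class (with NumPy and Matplotlib, usually `np.meshgrid`, `clf.predict` and `plt.contourf`). The sketch below shows the same idea without plotting libraries, as a coarse text map; the hand-written `predict` function and its thresholds are hypothetical stand-ins for a fitted two-feature tree, and `grid` and `decision_map` are illustrative helper names.

```python
def predict(petal_length, petal_width):
    """Hypothetical two-feature tree with made-up thresholds.

    Returns "0" for setosa, "1" for versicolor, "2" for virginica.
    """
    if petal_length <= 2.45:
        return "0"
    if petal_width <= 1.75:
        return "1" if petal_length <= 4.95 else "2"
    return "2"


def grid(start, stop, steps):
    """`steps` evenly spaced values from start to stop, both included."""
    return [start + i * (stop - start) / (steps - 1) for i in range(steps)]


def decision_map(model, x_range, y_range, steps):
    """Predict the class at every grid point; one text row per y value.

    Rows go from the largest y to the smallest, so the output reads
    like a plot with y increasing upwards.
    """
    xs = grid(*x_range, steps[0])
    ys = grid(*y_range, steps[1])
    return ["".join(model(x, y) for x in xs) for y in reversed(ys)]


# x axis: petal length from 1 to 7; y axis: petal width from 0 to 2.5.
for row in decision_map(predict, (1.0, 7.0), (0.0, 2.5), steps=(7, 6)):
    print(row)
# 0022222
# 0022222
# 0011222
# 0011222
# 0011222
# 0011222
```

In this toy map, class 1 occupies only the lower-middle block, which is exactly the box petal length in (2.45, 4.95] and petal width at most 1.75; a finer grid gives a smoother picture at the cost of more `predict` calls.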

Note that this approach has some limitations:

It only works with two variables at a time. You might need to plot similar graphs for every pair of features.
It only works for simple classification trees. The plots and the rules become harder to interpret if your output is continuous.
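To make the first limitation concrete, a short sketch that lists the feature pairs one would need to plot: with d features there are d(d-1)/2 unordered pairs, so the four Iris features give 6 plots, while 20 features would already give 190. The feature names are the four mentioned in the question; nothing else is assumed.

```python
from itertools import combinations

# The four Iris features mentioned in the question.
features = ["sepal length", "sepal width", "petal length", "petal width"]

# One two-feature plot per unordered pair of features.
pairs = list(combinations(features, 2))
print(len(pairs))  # 6
for a, b in pairs:
    print(f"{a} vs {b}")
```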
