Given M binary variables and R samples, what is the maximum number of leaves in a decision tree?

My first assumption was that the worst case would be one leaf per sample, hence at most R leaves. Am I wrong, and should there instead be some connection with the number of variables M? I know that the maximum depth of a decision tree is M, since a binary variable can appear at most once on any branch, but I don't see how that relates to the number of leaves.

Thanks in advance!


The maximum number of distinct input combinations with M binary variables is

$$2^M$$

so if every one of these combinations can carry its own class, the maximum number of leaves is

$$\text{max leaves} = \begin{cases} R & \text{if } R < 2^M \\ 2^M & \text{otherwise} \end{cases} = \min(R,\ 2^M)$$
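In code form, a minimal sketch of that bound (the helper `max_leaves` is just illustrative, not something from the question):

    def max_leaves(M, R):
        # A tree over M binary features can distinguish at most 2**M input patterns,
        # and it never needs more leaves than there are samples.
        return min(R, 2 ** M)

    print(max_leaves(M=4, R=10))   # 10, since R < 2**4 = 16
    print(max_leaves(M=3, R=100))  # 8,  since 2**3 = 8 caps the count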

A tree over M binary variables can contain at most 2^M leaves: think of each leaf as labelled with one possible input combination, e.g. 0000, 0001, 0010, ..., 1110, 1111. Each combination can appear only once, because only one label can be associated with each leaf.

If the same input combination appears in several rows, possibly with different labels, the maximum number of leaves equals the number of unique input combinations among the R samples:

        A   B   label
    0   1   0   0
    1   0   0   1
    2   1   1   2
    3   1   1   1

The number of leaves in this case is 3, not 4 (the number of rows), because rows 2 and 3 share the same input combination (A=1, B=1).
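As a rough check, here is a small sketch (assuming pandas and scikit-learn; not part of the original answer) that counts the unique input combinations in this table and compares that with the leaf count of a fully grown tree:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    # The example table from above
    df = pd.DataFrame({"A": [1, 0, 1, 1],
                       "B": [0, 0, 1, 1],
                       "label": [0, 1, 2, 1]})

    # Rows 2 and 3 share the input (A=1, B=1), so only 3 unique combinations exist
    n_unique_inputs = df[["A", "B"]].drop_duplicates().shape[0]
    print(n_unique_inputs)  # 3

    # A fully grown tree cannot have more leaves than unique input combinations
    tree = DecisionTreeClassifier().fit(df[["A", "B"]], df["label"])
    print(tree.get_n_leaves())  # 3 on this data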
