Upper bound on size of sample set for decision trees
Say I have an instance space with 4 features, and I know that a decision tree with 8 nodes can represent the target function I want to learn. I want to give an upper bound on the size of the sample set needed to achieve a true error of at most x%.
I found the following theorem in a textbook.
Let $\mathcal{H}$ be a hypothesis class and let $\epsilon, \delta > 0$. If a training set $S$ of size
$$n\geq\frac{1}{\epsilon}\operatorname{ln}\left(\frac{|\mathcal{H}|}{\delta}\right)$$
is drawn from distribution $\mathcal{D}$, then with probability greater than or equal to $1-\delta$ every $h \in \mathcal{H}$ with true error $err_{\mathcal{D}}(h) \geq \epsilon$ has training error $err_S(h) > 0$. Equivalently, with probability greater than or equal to $1-\delta$, every $h \in \mathcal{H}$ with training error zero has true error less than $\epsilon$.
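For concreteness, here is a minimal sketch (my own illustration, not from the textbook) of how I would evaluate the bound once $|\mathcal{H}|$, $\epsilon$, and $\delta$ are fixed; the values below are placeholders, not the ones for my decision-tree setting.

```python
from math import ceil, log

def sample_size_bound(hypothesis_space_size: int, epsilon: float, delta: float) -> int:
    """Training-set size from the theorem above: n >= (1/epsilon) * ln(|H| / delta)."""
    return ceil((1.0 / epsilon) * log(hypothesis_space_size / delta))

# Placeholder numbers purely for illustration: |H| = 10**6, epsilon = 0.05 (5% true error),
# delta = 0.05 (95% confidence). These are NOT the values for my decision-tree question.
print(sample_size_bound(10**6, epsilon=0.05, delta=0.05))  # -> 337
```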
My question is: can I use this theorem to give such an upper bound? If so, what would the size of my hypothesis space $|\mathcal{H}|$ be in this case? If not, how can I derive such a bound?