Prediction in CART Decision Trees
I was studying the CART algorithm (classification and regression trees), but the prediction formula is confusing me.
First we have the following definition:
Let $X:=\{x_1,\dots,x_N\} \subset \mathbb{R}^d$ be a set of datapoints and let $B(X)$ be the smallest box containing $X$:
$$ B(X):=\{z\in \mathbb{R}^d : \min_{x\in X} x_j \leq z_j \leq \max_{x\in X} x_j \ \ \forall j\in [d]\}$$
and let $I$ be the indicator function:
$$I[p]=\begin{cases}1 & \text{if } p \text{ holds},\\ 0 & \text{otherwise}.\end{cases}$$
So let's imagine that the CART algorithm has split the datapoints, resulting in a set of boxes $B_1, B_2, \dots, B_k$. Now our script says:
For prediction, one simply follows the decisions of the tree until one reaches a leaf box $B$. Then the predicted class is the majority of the classes in $B$. If $B_1,\dots,B_k$ are the boxes output at the end of the algorithm, then one computes $$ f(x) = \sum_{j=1}^k w_j I[x\in B_j],$$ where $$ w_j=\frac{|\{x_i \in X \cap B_j : y_i=1\}|}{|X\cap B_j|}$$ is the proportion of class-1 points in box $B_j$ (the majority vote). Thus, $f(x)\in[0,1]$.
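To make the script's formula concrete, here is a minimal 1-D sketch (my own illustration, not from the script, with made-up data), assuming the leaf boxes are disjoint, as is usual for the leaves of a tree. Then for any $x$ exactly one indicator $I[x\in B_j]$ equals 1, and $f(x)$ is just the class-1 fraction $w_j$ of the single box containing $x$:

```python
# Hypothetical training data: (x_i, y_i) with binary labels y_i in {0, 1}.
X = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
y = [0,   0,   1,   1,   1,   0]

# Hypothetical leaf boxes [lo, hi) produced by the tree (disjoint intervals).
boxes = [(0.0, 2.0), (2.0, 5.0)]

def w(j):
    """w_j: fraction of class-1 points among training points in box B_j."""
    lo, hi = boxes[j]
    labels_in_box = [yi for xi, yi in zip(X, y) if lo <= xi < hi]
    return sum(labels_in_box) / len(labels_in_box)

def f(x):
    """f(x) = sum_j w_j * I[x in B_j]; at most one term is nonzero."""
    return sum(w(j) for j, (lo, hi) in enumerate(boxes) if lo <= x < hi)

print(f(1.2))  # x falls only in B_0, so f(x) = w_0 = 1/3
print(f(3.2))  # x falls only in B_1, so f(x) = w_1 = 2/3
```

Since each $w_j$ is a fraction between 0 and 1 and only one term of the sum survives, $f(x)\in[0,1]$ under this reading.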
I cannot understand how with this formula we get $f(x)\in[0,1]$.
Here is an example: imagine we have a binary problem ($y\in\{0,1\}$) and an input $x$, and as stated we follow the decision tree until we reach a leaf box $B$. In $B$ we have two points of class 0 and one point of class 1; writing $B_0$ and $B_1$ for the class-0 and class-1 points in $B$, that is $|B_0|=2$ and $|B_1|=1$. Now we want to compute the majority vote for the prediction:
$$ f(x) = \sum_{j=0}^{1} w_j I[x\in B_j]=w_0 I[x\in B_0]+w_1 I[x\in B_1].$$
If I understand right, we have:
$$w_0=\frac{2}{3}, w_1=\frac{1}{3}$$
and
$$I[x\in B_0]=1, \text{ } I[x\in B_1]=1$$
If this is correct, then
$$f(x)=1,$$ and in fact we get $f(x)=1$ in every case.
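The calculation above can be reproduced directly. If $B_0$ and $B_1$ are read as the class-0 and class-1 points of the same leaf, then any $x$ that reaches the leaf lies in both sets, both indicators are 1, and the weights sum to 1 by construction:

```python
# Reproducing the question's calculation: weights are the class
# fractions of one leaf with |B_0| = 2 and |B_1| = 1 (3 points total).
w0, w1 = 2 / 3, 1 / 3
# x reached the leaf, so under this reading x is "in" both B_0 and B_1.
I0, I1 = 1, 1
f_x = w0 * I0 + w1 * I1
print(f_x)  # the class fractions always sum to 1 under this reading
```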
Am I misunderstanding something or is something wrong in the notation of the script?
Tags: cart, decision-trees, regression, classification
Category: Data Science