Prediction in CART Decision Trees

I was studying the CART algorithm (classification and regression trees), but the prediction formula is confusing me.

First we have the following definition:

Let $X:=\{x_1,\dots,x_N\} \subset \mathbb{R}^d$ be a set of datapoints and let $B(X)$ be the smallest box containing them: $$ B(X):=\{z\in \mathbb{R}^d : \min_{x\in X} x_j \leq z_j \leq \max_{x\in X} x_j \ \ \forall j\in [d]\},$$ and let $I$ be the indicator function: $$I[p]=\begin{cases}1 & \text{if } p \text{ holds},\\ 0 & \text{otherwise}.\end{cases}$$
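A minimal NumPy sketch of these two definitions (the function names are my own, not from the script):

```python
import numpy as np

def smallest_box(X):
    """Smallest axis-aligned box B(X): the coordinate-wise min and max
    of the datapoints, returned as a (lo, hi) corner pair."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)

def in_box(z, lo, hi):
    """Indicator I[z in B]: 1 if lo_j <= z_j <= hi_j for all j, else 0."""
    z = np.asarray(z, dtype=float)
    return int(np.all((lo <= z) & (z <= hi)))
```

For example, for $X=\{(0,0),(2,1),(1,3)\}$ the box has corners $(0,0)$ and $(2,3)$, so $(1,1)$ lies inside it and $(3,0)$ does not.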

So let's imagine that the CART algorithm has split the datapoints, resulting in a set of boxes $B_1, B_2, \dots, B_k$. Now our script says:

For prediction, one simply follows the decisions of the tree until one reaches a leaf box $B$. The predicted class is then the majority class in $B$. If $B_1,\dots,B_k$ are the boxes output at the end of the algorithm, one computes $$ f(x) = \sum_{j=1}^k w_j \, I[x\in B_j],$$ where $$ w_j=\frac{|\{x_i \in X \cap B_j : y_i=1\}|}{|X\cap B_j|}$$ is the proportion of class $1$ in box $B_j$. Thus $f(x)\in[0,1]$.
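Here is how I would implement this formula, assuming each box is given as a `(lo, hi)` corner pair (a sketch of my reading, not necessarily the script's intended one):

```python
import numpy as np

def leaf_weights(X, y, boxes):
    """w_j = fraction of class-1 training points among the points in box j.
    X: (N, d) datapoints; y: length-N 0/1 labels; boxes: list of (lo, hi)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    ws = []
    for lo, hi in boxes:
        mask = np.all((X >= lo) & (X <= hi), axis=1)  # which x_i lie in B_j
        ws.append(y[mask].mean() if mask.any() else 0.0)
    return ws

def f(x, boxes, ws):
    """f(x) = sum_j w_j * I[x in B_j]."""
    x = np.asarray(x, dtype=float)
    return sum(w for (lo, hi), w in zip(boxes, ws)
               if np.all((x >= lo) & (x <= hi)))
```

For instance, with two disjoint 1-D boxes $[0,1]$ and $[1.5,3]$ and points $0.2, 0.5, 0.9$ (labels $1,0,1$) in the first and $2.0, 2.5$ (labels $1,1$) in the second, one gets $w_1=2/3$, $w_2=1$, and $f(0.5)=2/3$.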

I cannot see how this formula gives $f(x)\in[0,1]$.

Here is an example: Let's imagine we have a binary problem ($y\in\{0,1\}$), and as stated we follow the decision tree for a given $x$ until we reach a leaf box $B$. In $B$ we have $|B_0|=2$ points of class $0$ and $|B_1|=1$ point of class $1$. Now we want to compute the majority vote for the prediction: $$ f(x) = \sum_{j=0}^k w_j I[x\in B_j]=w_0 I[x\in B_0]+w_1 I[x\in B_1].$$ If I understand right, we have $$w_0=\frac{2}{3}, \quad w_1=\frac{1}{3}$$ and $$I[x\in B_0]=1, \quad I[x\in B_1]=1.$$ If this is correct, then $f(x)=1$, and in fact $f(x)=1$ in every case. Am I misunderstanding something, or is something wrong in the notation of the script?
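For what it's worth, here is a small numeric check of the scenario above under one possible reading, namely that $B_1,\dots,B_k$ in the formula are the disjoint leaf boxes of the tree (not per-class subsets of a single leaf). Under that assumption only one indicator fires, so $f(x)$ collapses to the weight of the leaf containing $x$:

```python
import numpy as np

# Hypothetical 1-D setup: two disjoint leaf boxes; the first leaf holds
# two class-0 points and one class-1 point (the counts from the question).
boxes = [((0.0,), (1.0,)), ((2.0,), (3.0,))]
X = np.array([[0.2], [0.5], [0.9], [2.5]])
y = np.array([0, 0, 1, 1])

# w_j = fraction of class-1 points among the training points in box j
ws = []
for lo, hi in boxes:
    mask = np.all((X >= lo) & (X <= hi), axis=1)
    ws.append(y[mask].mean())

# f(x) = sum_j w_j * I[x in B_j]
x = np.array([0.5])
fx = sum(w for (lo, hi), w in zip(boxes, ws)
         if np.all((x >= lo) & (x <= hi)))
# only the first indicator is 1, so fx equals ws[0] = 1/3, which is in [0, 1]
```

Under this reading $f(x)\in[0,1]$ holds trivially, since each $w_j\in[0,1]$ and exactly one term survives; but whether that is what the script means by $B_1,\dots,B_k$ is exactly the question.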

Tags: cart, decision-trees, regression, classification

