Make a random forest estimator exactly the same as a decision tree

The idea is to have one of the trees of a Random Forest built so that it is exactly equal to a standalone Decision Tree.

First, we load all libraries, fit a decision tree and plot it.

import random

import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Jupyter-only magic: remove this line when running as a plain script
%matplotlib inline
plt.style.use('ggplot')

# Fix the global seeds for reproducibility
random.seed(0)
np.random.seed(0)

data = load_iris()

# Fit a single decision tree on the full iris dataset
dtc = DecisionTreeClassifier(random_state=0)
dtc.fit(data.data, data.target)

tree.plot_tree(dtc)

We then do the same thing with the random forest:

# One tree, all features considered at every split
rf = RandomForestClassifier(n_estimators=1, max_features=None, random_state=0)
rf.fit(data.data, data.target)
tree.plot_tree(rf.estimators_[0])

My question:

Is it possible to make the first tree of the random forest exactly the same as the decision tree?



You need to set bootstrap=False in the random forest to disable the bootstrap resampling of the training rows, so that the tree is fit on the full dataset. (I originally posted this as a comment because I expected more impediments beyond the random_state and max_features=None you already set, but I guess there aren't any!)
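A minimal sketch of the comparison, assuming scikit-learn is installed and reusing the iris data from the question. The variable names (rf_tree, same_predictions, same_structure) are mine, not from the original post. One caveat worth checking rather than assuming: as far as I know, the forest gives its tree a seed derived from random_state rather than the value 0 itself, so if two candidate splits are equally good, the tie may be broken differently. The sketch therefore prints the structure comparison instead of taking it for granted.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Standalone decision tree, as in the question
dtc = DecisionTreeClassifier(random_state=0).fit(X, y)

# Single-tree forest: all features at every split, no bootstrap resampling
rf = RandomForestClassifier(
    n_estimators=1, max_features=None, bootstrap=False, random_state=0
).fit(X, y)
rf_tree = rf.estimators_[0]

# Compare predictions on the training data
same_predictions = np.array_equal(dtc.predict(X), rf.predict(X))

# Compare the split structure node by node (feature index and threshold)
same_structure = (
    dtc.tree_.node_count == rf_tree.tree_.node_count
    and np.array_equal(dtc.tree_.feature, rf_tree.tree_.feature)
    and np.allclose(dtc.tree_.threshold, rf_tree.tree_.threshold)
)

print("same predictions on training data:", same_predictions)
print("same split structure:", same_structure)
```

If the structures differ only through tie-breaking, one thing to try is fitting the standalone tree with random_state=rf_tree.random_state, which should give both trees the same seed.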

You probably don't want to do this in general; by stripping out all the randomness so that the first tree is the same as the DecisionTreeClassifier, you'll end up with all the trees being the same, and the random forest loses its usefulness.
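To see the loss of diversity, here is a rough sketch that counts how many distinct trees each forest contains. The helper distinct_trees is hypothetical (written for this illustration, not part of scikit-learn), and it treats two trees as identical when their per-node split features and rounded thresholds match. The counts are printed rather than stated here, since they can depend on tie-breaking and on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Forest with the randomness stripped out: no bootstrap, all features per split
plain = RandomForestClassifier(
    n_estimators=10, max_features=None, bootstrap=False, random_state=0
).fit(X, y)

# Default forest: bootstrap samples and a random feature subset at every split
default = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)


def distinct_trees(forest):
    """Hypothetical helper: count distinct (features, thresholds) signatures."""
    signatures = {
        (
            tuple(est.tree_.feature.tolist()),
            tuple(np.round(est.tree_.threshold, 6).tolist()),
        )
        for est in forest.estimators_
    }
    return len(signatures)


print("distinct trees without randomness:", distinct_trees(plain))
print("distinct trees with default settings:", distinct_trees(default))
```

When the trees are (near-)duplicates, averaging them adds nothing over a single tree, which is the point made above.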
