Why do decision trees have low accuracy?

It seems to be generally acknowledged that decision trees have low prediction accuracy. Is there a concise explanation for why they have low accuracy?

I've read this so much, I've accepted it to be true, but I realize I don't have any intuition as to why it's true.

As an example, here's an excerpt from Elements of Statistical Learning (page 352):

Trees have one aspect that prevents them from being the ideal tool for predictive learning, namely inaccuracy. They seldom provide predictive accuracy comparable to the best that can be achieved with the data at hand.

Or on Wikipedia, under the heading Disadvantages of Decision Trees: "They are often relatively inaccurate. Many other predictors perform better with similar data."

Topic esl machine-learning-model prediction decision-trees accuracy

Category Data Science


Your question is right. First, a common misconception: decision trees are deterministic and extremely greedy. A random forest is not a decision tree; it is an ensemble of decision trees built in a way that avoids the potential pitfalls of a single decision tree.

If you continue reading both of your references:

On Wikipedia:

They are often relatively inaccurate. Many other predictors perform better with similar data. This can be remedied by replacing a single decision tree with a random forest of decision trees...

Because the algorithm is greedy and deterministic, adding or removing a single row can produce a completely different tree, and single trees also tend to overfit. That is my understanding of "low accuracy" in that sentence.
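You can see this instability directly. A minimal sketch, assuming scikit-learn is installed: fit two trees on almost identical data (the second simply drops one row) and count how many predictions change.

```python
# Demonstrate decision-tree instability: dropping a single training row
# can change the greedy splits and therefore the predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Tree fit on the full dataset.
tree_a = DecisionTreeClassifier(random_state=0).fit(X, y)

# Tree fit on the same data minus one row.
tree_b = DecisionTreeClassifier(random_state=0).fit(X[1:], y[1:])

# Count how many predictions the two trees disagree on.
disagreements = int((tree_a.predict(X) != tree_b.predict(X)).sum())
print(f"predictions that differ after dropping one row: {disagreements}")
```

With a fully grown tree, even this one-row change can shift early splits and cascade down the tree, which is exactly the variance that ensembles average away.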

In Elements of Statistical Learning:

Trees have one aspect that prevents them from being the ideal tool for predictive learning, namely inaccuracy. They seldom provide predictive accuracy comparable to the best that can be achieved with the data at hand. As seen in Section 10.1, boosting decision trees improves their accuracy, often dramatically.

Because they are greedy and deterministic, single trees do not normally achieve the best possible result. That is why random forests and gradient boosting appeared, and they are extremely good: they fix this pitfall of decision trees.
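A quick sketch of the accuracy gap, assuming scikit-learn: cross-validated accuracy of a single tree versus a random forest and gradient boosting on a built-in toy dataset.

```python
# Compare cross-validated accuracy of a single decision tree against
# two ensemble methods that were built to fix its weaknesses.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validation accuracy for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

On most tabular datasets the ensembles come out ahead of the single tree, which is the "low accuracy" gap the quoted passages describe.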

Also have a look at the No Free Lunch theorem.

In short, your question is right, and that problem has been addressed historically with random forests and gradient boosting.


It's not true in general. Decision trees tend to overfit in comparison to other algorithms, which results in low accuracy on unseen data. But if you use a decision tree the right way, i.e. you prepare the data in the proper format, use feature selection, and perform k-fold cross-validation, everything should be ok.
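A minimal sketch of that workflow, assuming scikit-learn: a depth-limited tree behind a feature-selection step, evaluated with 5-fold cross-validation.

```python
# A decision tree used "the right way": feature selection in a pipeline,
# a depth limit to curb overfitting, and k-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),  # keep the 10 most informative features
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),  # cap depth
])

# 5-fold cross-validated accuracy of the whole pipeline.
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The `k=10` and `max_depth=5` values here are illustrative, not recommendations; in practice you would tune them, e.g. with a grid search inside the cross-validation.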

I am sure that you misread it. There is no reason why a decision tree should be a much worse algorithm than others.
