Evaluating optimal values for depth of tree

I'm studying the performance of an AdaBoost model and I wonder how it performs in regard to the depth of the trees.

Here's the accuracy for the model with a max depth of 1 (plot not shown),

and here with a max depth of 3 (plot not shown).

From my point of view, I would say the lower one looks better, but somehow I suspect the upper one is actually preferable, since its training accuracy doesn't saturate (overfitting?). The question and answer in "Hyperparameter tuning for Random Forest - choose the best max depth" seem to support this assumption, though.
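For context, a minimal, hypothetical reproduction of the setup being asked about (assuming scikit-learn's `AdaBoostClassifier` on synthetic data; the original dataset and plots are not available here):

```python
# Hypothetical sketch: compare AdaBoost with depth-1 vs depth-3 base trees,
# reporting train and test accuracy (assumes scikit-learn, synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for depth in (1, 3):
    # Base tree passed positionally: the keyword was renamed from
    # 'base_estimator' to 'estimator' in scikit-learn 1.2.
    clf = AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=depth),
        n_estimators=100,
        random_state=0,
    )
    clf.fit(X_tr, y_tr)
    results[depth] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"depth={depth}: train={results[depth][0]:.3f}, "
          f"test={results[depth][1]:.3f}")
```

The gap between the train and test numbers for each depth is the quantity the discussion below is about.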

Topic adaboost accuracy

Category Data Science


The training error shouldn't be too far from the test error; otherwise it is a high-variance scenario and you could end up overfitting in production.

However, some increase in variance is normal as depth grows, but it shouldn't be severe if you have enough data.

Consequently, if you don't have a lot of data, a depth of 1 seems better, and you should increase the number of boosting iterations to lower the error.
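The last point can be sketched as follows (assuming scikit-learn on synthetic data): with depth-1 stumps, the training accuracy after each boosting round typically keeps improving as iterations are added.

```python
# Sketch (assumed setup): track training accuracy per boosting round
# for AdaBoost built on decision stumps (max_depth=1).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=200,
    random_state=0,
).fit(X, y)

# staged_predict yields the ensemble's predictions after each round,
# so this traces how the fit tightens as iterations accumulate.
train_acc = [accuracy_score(y, y_hat) for y_hat in clf.staged_predict(X)]
print(f"round 1: {train_acc[0]:.3f}, final: {train_acc[-1]:.3f}")
```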

In addition to that, there is only a small difference in test results between a depth of 1 and a depth of 3. So the small benefit of depth 3 isn't worth the risk of a high-variance scenario. But maybe a max depth of 2 is better than 1...
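One hedged way to check the depth-2 idea is a small cross-validated sweep over the candidate depths (again assuming scikit-learn and synthetic data; on the real dataset you'd substitute your own X and y):

```python
# Sketch: pick the base-tree depth for AdaBoost by 5-fold cross-validation
# (assumed scikit-learn setup; data is synthetic for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)

cv_scores = {}
for depth in (1, 2, 3):
    clf = AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=depth),
        n_estimators=50,
        random_state=1,
    )
    cv_scores[depth] = cross_val_score(clf, X, y, cv=5).mean()

best_depth = max(cv_scores, key=cv_scores.get)
print(cv_scores, "-> best depth:", best_depth)
```

Cross-validated scores sidestep the single train/test split, which is exactly where the overfitting ambiguity in the question came from.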
