Increasing minNumObj increases accuracy in a decision tree

I have been using a J48 classifier in Weka and have noticed that increasing minNumObj (the minimum number of instances per leaf) leads to a small increase in accuracy.

-M      Accuracy    Size    Num Leaves
2       73.8281 %   39      20
3       74.2188 %   39      20
4       74.4792 %   37      19
5       74.6094 %   25      13
6       74.2188 %   23      12
7       74.2188 %   23      12
8       74.349  %   23      12
9       75.2604 %   29      15
10      75.5208 %   29      15
11      75      %   23      12
12      76.3021 %   23      12

However, several examples, such as:

http://ww.samdrazin.com/classes/een548/project2report.pdf

show the opposite, with increasing minNumObj lowering the accuracy. There, the confidence factor was held constant at 1.0 to minimize post-pruning, and the number of cross-validation folds for the testing set (crossValidationFolds) was held at 10.

My results were obtained with a confidence factor of 2.5, but the difference between 2.5 and 1 is minimal.

https://pdfs.semanticscholar.org/9984/a7d06e04a347718cb8c7f645b72195bb11ce.pdf

See section 5.3. Measuring Performance: Precision and Recall

Why does the accuracy on my data go up rather than down as minNumObj increases?

Topic weka data-mining machine-learning

Category Data Science


Your results are more in line with my experience. Increasing the minimum number of observations per leaf decreases the capacity of the model, which increases bias and decreases variance. Tree models are often overfitted unless some kind of complexity reduction is applied, so this is usually a beneficial tradeoff for net performance.
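To see the capacity effect in isolation, here is a minimal sketch in plain Python. It is not Weka and not J48's C4.5 algorithm: it is a small Gini-based tree grower with no pruning, where a `min_leaf` parameter plays the role of minNumObj. The synthetic data, the 20% label noise, the function names, and the grid of `min_leaf` values are all assumptions chosen for illustration.

```python
import random

def gini(ys):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(ys) / len(ys)
    return 2.0 * p * (1.0 - p)

def build(rows, min_leaf):
    """Grow an unpruned tree; every leaf keeps at least min_leaf rows.

    rows is a list of (features, label) pairs. A leaf is an int (the
    majority label); an internal node is (feature, threshold, left, right).
    """
    labels = [y for _, y in rows]
    n = len(rows)
    best = None
    if 0 < sum(labels) < n:  # only try to split nodes with mixed labels
        for f in range(len(rows[0][0])):
            ordered = sorted(rows, key=lambda r: r[0][f])
            ys = [y for _, y in ordered]
            for i in range(min_leaf, n - min_leaf + 1):
                lo, hi = ordered[i - 1][0][f], ordered[i][0][f]
                if lo == hi:
                    continue  # cannot separate identical feature values
                score = i * gini(ys[:i]) + (n - i) * gini(ys[i:])
                if best is None or score < best[0]:
                    best = (score, f, lo, ordered[:i], ordered[i:])
    if best is None:
        return int(2 * sum(labels) >= n)  # majority label
    _, f, threshold, left, right = best
    return (f, threshold, build(left, min_leaf), build(right, min_leaf))

def predict(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if x[f] <= threshold else right
    return tree

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[2]) + count_leaves(tree[3])

def accuracy(tree, rows):
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

def make_data(n, rng, noise=0.2):
    """Assumed toy problem: label is x0 + x1 > 1, flipped with prob. noise."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1.0)
        if rng.random() < noise:
            y = 1 - y
        rows.append((x, y))
    return rows

def sweep(min_leaf_values=(1, 2, 5, 10, 20, 40), n_train=300, n_test=1000, seed=0):
    """Return {min_leaf: (train accuracy, test accuracy, number of leaves)}."""
    rng = random.Random(seed)
    train, test = make_data(n_train, rng), make_data(n_test, rng)
    results = {}
    for m in min_leaf_values:
        tree = build(train, m)
        results[m] = (accuracy(tree, train), accuracy(tree, test), count_leaves(tree))
    return results

results = sweep()
for m, (train_acc, test_acc, leaves) in results.items():
    print(f"min_leaf={m:3d}  train={train_acc:.3f}  test={test_acc:.3f}  leaves={leaves}")
```

With `min_leaf=1` the tree memorizes the noisy training labels, so the gap between the train and test columns gives a rough view of how much variance is being traded away as `min_leaf` grows. The exact numbers depend on the seed and on this synthetic setup, and they say nothing about any particular Weka dataset.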

I'd speculate that the papers' results are different because of their datasets' small sizes: 683 and 4601 instances... that second one doesn't strike me as too small, but...
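As a rough way to gauge how much dataset size could matter, the sketch below computes the binomial standard error of an accuracy estimate for the two sizes mentioned above. The 0.75 accuracy is an assumed placeholder, not a figure from either paper, and treating cross-validated predictions as independent trials is only an approximation.

```python
import math

def accuracy_standard_error(accuracy, n):
    """Binomial standard error of an accuracy measured on n instances."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)

assumed_accuracy = 0.75  # placeholder value, not taken from the papers
for n in (683, 4601):
    se = accuracy_standard_error(assumed_accuracy, n)
    print(f"n={n}: about +/- {100 * se:.2f} percentage points (one standard error)")
```

Under these assumptions the uncertainty is roughly 1.7 percentage points at 683 instances and roughly 0.6 at 4601, so accuracy differences of a point or so between settings are much harder to trust on the smaller dataset.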
