Model Tree M5 - Robustness to Data Quality Issues

Question

chrisper

May 20, 2021, 10:20

I am currently investigating the M5 model tree algorithm by Quinlan (1992), available here: https://sci2s.ugr.es/keel/pdf/algorithm/congreso/1992-Quinlan-AI.pdf

[Figure: an example of a linear regression model produced by the algorithm]

An implementation of the model with a scikit-learn-style interface can be found here: https://github.com/ankonzoid/LearningX/tree/master/advanced_ML/model_tree

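The M5 model extends standard decision trees such as ID3 or C4.5 to continuous targets. Each split is chosen to maximise the standard deviation reduction (SDR) of the target values, and linear functions are fitted at the leaf nodes. Following Quinlan (1992), the SDR is calculated as:

$$\mathrm{SDR} = \mathrm{sd}(T) - \sum_i \frac{|T_i|}{|T|}\,\mathrm{sd}(T_i)$$

where $T$ is the set of training cases reaching the node, $T_i$ is the subset of cases with the $i$-th outcome of the candidate test, and $\mathrm{sd}$ is the standard deviation of the target values.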
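To make the split criterion concrete, here is a minimal sketch of the standard deviation reduction for a single numeric feature. It is my own illustration, not code from the linked repository: the helper names `sdr` and `best_threshold` are hypothetical, the toy data is made up, and I assume the population standard deviation and midpoint thresholds.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction: sd(T) - sum(|T_i| / |T| * sd(T_i)).

    Assumption: population standard deviation (pstdev) is used for sd.
    """
    n = len(parent)
    weighted = sum(len(s) / n * pstdev(s) for s in subsets if s)
    return pstdev(parent) - weighted


def best_threshold(xs, ys):
    """Return (threshold, sdr) for the best binary split `x <= threshold`."""
    pairs = sorted(zip(xs, ys))
    targets = [y for _, y in pairs]
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        # Assumption: candidate thresholds are midpoints between neighbours.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        gain = sdr(targets, [targets[:i], targets[i:]])
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy data with an obvious jump in the target between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
threshold, gain = best_threshold(xs, ys)
print(threshold, round(gain, 3))
```

Because the criterion is built on standard deviations, a single extreme target value can noticeably change which split scores best, which is part of what my question is about.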

The resulting model is piecewise linear: each leaf node holds a linear regression fitted to the portion of the training data routed to that leaf.
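As a rough illustration of the piecewise structure, the sketch below fits a one-split model tree on a single feature, with an ordinary least-squares line in each leaf. The names `fit_line`, `fit_stump` and `predict` are hypothetical, the threshold is assumed to be given, and the pruning and smoothing steps of the full M5 algorithm are left out.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # constant model if x does not vary
    return mean_y - slope * mean_x, slope


def fit_stump(xs, ys, threshold):
    """Fit one linear model per side of `threshold`.

    Assumption: both sides of the split contain at least one point.
    """
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return threshold, fit_line(*zip(*left)), fit_line(*zip(*right))


def predict(model, x):
    """Route x to its leaf and evaluate that leaf's linear model."""
    threshold, left, right = model
    intercept, slope = left if x <= threshold else right
    return intercept + slope * x


# Toy data: y = 2x for x <= 3 and y = 24 - x for x > 3.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 2, 4, 6, 20, 19, 18, 17]
model = fit_stump(xs, ys, threshold=3.5)
print(predict(model, 2.5), predict(model, 5.5))
```

Each leaf here is a plain least-squares fit, so I would expect an outlier that lands in a small leaf to pull that leaf's line more strongly than it would shift the leaf mean in an ordinary regression tree.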

Do the robustness properties of regression trees with respect to outliers, missing values, and noise also hold for M5 model trees?

Topics: noise, missing-data, decision-trees, outlier

Category: Data Science
