How does the construction of a decision tree differ for different optimization metrics?

I understand how a decision tree is constructed (in the ID3 algorithm) using criteria such as entropy, Gini index, and variance reduction. But the formulae for these criteria do not depend on optimization metrics such as accuracy, recall, AUC, kappa, F1-score, and others.
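For reference, here is a minimal sketch of what I mean (plain NumPy; the helper names are my own) showing that these impurity criteria are functions of the label distribution alone:

```python
import numpy as np

def gini(y):
    """Gini impurity: depends only on the class proportions in y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    """Shannon entropy: again a function of class proportions alone."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Neither function takes accuracy, recall, AUC, etc. as an input --
# the split criterion only ever sees the label distribution.
y = np.array([0, 0, 1, 1, 1])
print(gini(y), entropy(y))
```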

R and Python packages allow me to optimize for such metrics when I construct a decision tree. What do they do differently for each of these metrics? Where does the change happen?

Is there a pattern to how these changes are done for different classification/regression algorithms?

Topic decision-trees optimization algorithms machine-learning

Category Data Science


The splitting logic is the same; what you are optimising with these metrics are the hyperparameters.

In other words, the Gini/entropy splitting logic stays the same, but hyperparameters such as tree depth, minimum leaf size, and the number of features considered per split will come out differently when you tune them against different evaluation metrics.
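As a minimal sketch of this (using scikit-learn's GridSearchCV, assuming that is the kind of tuning the question has in mind; the parameter grid is illustrative), note that the split criterion is fixed to Gini throughout, and the metric only steers which hyperparameter combination gets selected:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Candidate hyperparameters -- these are what the metric chooses between.
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 20]}

for metric in ["accuracy", "recall", "f1", "roc_auc"]:
    search = GridSearchCV(
        DecisionTreeClassifier(criterion="gini", random_state=0),
        param_grid, scoring=metric, cv=5,
    )
    search.fit(X, y)
    # Different metrics can prefer different tree shapes.
    print(metric, search.best_params_)
```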

Accuracy, precision, F-scores, etc. are evaluation metrics computed from binary outcomes and binary predictions. They are not loss functions (although modified versions of them exist that can serve as one). For model training, you need a function that compares a continuous score (your model output) with a binary outcome, such as cross-entropy. Ideally, this is calibrated such that it is minimised when the predicted mean matches the population mean (given covariates).
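A small sketch of the distinction (using scikit-learn's metric functions; the scores and threshold are made up for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])  # continuous model outputs

# Accuracy only sees the thresholded (binary) predictions...
y_pred = (scores >= 0.5).astype(int)
print("accuracy:", accuracy_score(y_true, y_pred))

# ...while cross-entropy (log loss) compares the continuous scores
# with the binary outcomes directly, which is what makes it usable
# as a training loss.
print("log loss:", log_loss(y_true, scores))
```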
