Multiple models show extreme differences during evaluation

My dataset has about 100k entries and 6 features, and the target is a simple binary label (about 65% zeros, 35% ones).

When I train different models on this dataset (random forest, decision tree, extra trees, k-nearest neighbors, logistic regression, SGD, dense neural networks, etc.), the evaluation metrics differ greatly from model to model:

  • tree classifiers: about 80% for both accuracy and precision
  • k-nearest neighbors: 56% accuracy, 36% precision
  • linear SVM: 65% accuracy, zero positives predicted
  • SGD: 63% accuracy, 2 true positives and 4 false positives
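
Roughly, the comparison I'm describing looks like this (simplified sketch with scikit-learn defaults; `X` and `y` are placeholders for my features and labels):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# X: the 6 feature columns, y: the binary label (placeholders)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "random forest": RandomForestClassifier(),
    "decision tree": DecisionTreeClassifier(),
    "k-nearest neighbors": KNeighborsClassifier(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "linear SVM": LinearSVC(),
    "SGD": SGDClassifier(),
}

# Fit each model and report the two metrics quoted above
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name}: accuracy={accuracy_score(y_test, pred):.2f}, "
          f"precision={precision_score(y_test, pred, zero_division=0):.2f}")
```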

I don't understand why there is such disparity between the models. Can someone explain why this happens? Am I doing something wrong?

I also couldn't find an answer to this question elsewhere, so please link to it if someone has already asked it.

Would really appreciate the help!

Topic: sgd, decision-trees, evaluation, accuracy, machine-learning


One way to compare models is to look at the decision boundaries each model has learned: differently shaped boundaries translate directly into different evaluation metrics. A sketch of how to visualize this is below.
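
For illustration, here is a minimal sketch that plots the boundaries of two such models on a toy 2-D dataset (a real 6-feature dataset would first need to be projected down to two dimensions to visualize it this way):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Toy 2-D data standing in for the real 6-feature dataset
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# Grid covering the 2-D feature space
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, clf in zip(axes, [RandomForestClassifier(random_state=42),
                          LogisticRegression()]):
    clf.fit(X, y)
    # Predict the class over the whole grid to reveal the boundary shape
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)        # learned decision regions
    ax.scatter(X[:, 0], X[:, 1], c=y, s=10)  # training points
    ax.set_title(type(clf).__name__)
plt.show()
```

The forest's boundary will follow the curved class shapes, while logistic regression draws a single straight line; on data that isn't linearly separable, that difference alone produces very different metrics.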


A few thoughts:

  • The first thing I would check is whether the well-performing models overfit. You can check this by comparing performance on the training set against the test set (see the first sketch after this list).
  • Also, there is something a bit strange about the k-NN results: with 56% accuracy and 36% precision it performs worse than a trivial classifier that always predicts the majority class (which would score about 65% accuracy). That suggests something is wrong with either the features or the distance measure — most commonly, unscaled features letting one large-magnitude feature dominate the distance (see the scaling sketch after this list).
  • 100k instances looks like a large dataset, but with only 6 features it's possible that the data contains many duplicates and/or near-duplicates that bring no extra information to the model (a quick duplicate check is sketched after this list). It's also possible that the features are simply not good indicators, although in that case the decision tree models would fail as well.
  • The better performance of the tree models points to something discontinuous in the features (by the way, you didn't mention whether they are numerical or categorical). Decision trees, and especially random forests, can handle discontinuity, but models like logistic regression may have trouble with it.
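
For the overfitting check, a minimal sketch (where `X` and `y` are placeholders for your 6 features and binary label):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X, y are placeholders for the 6 features and the binary label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# A large gap between these two numbers indicates overfitting
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))
```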
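For the k-NN distance issue, the usual first fix is to standardize the features so that no single feature dominates the distance. A sketch, reusing the train/test split from above:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale features to zero mean / unit variance before computing distances
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("scaled k-NN accuracy:", knn.score(X_test, y_test))
```

If accuracy jumps well above the 65% majority baseline after scaling, the distance measure was the problem.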
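And a quick way to quantify the duplicate problem, assuming the data lives in a pandas DataFrame with a "label" column (the file name and column name are placeholders):

```python
import pandas as pd

# Placeholder load: df holds the 6 feature columns plus a "label" column
df = pd.read_csv("data.csv")
feature_cols = [c for c in df.columns if c != "label"]

# Exact duplicates on the feature columns
n_dup = df.duplicated(subset=feature_cols).sum()
print(f"duplicate feature rows: {n_dup} / {len(df)}")

# Identical feature rows with conflicting labels cap the achievable accuracy
conflicts = df.groupby(feature_cols)["label"].nunique()
print("feature combinations with conflicting labels:", (conflicts > 1).sum())
```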
