Random forest model scoring
We are using the random forest algorithm, but we are having some trouble understanding the scoring method it uses.
Take, for example, the following confusion matrices of the test set at different thresholds:
Threshold 45 cm is:
[[67969 48031]
[ 3321 11120]] and the precision is: 0.18799344051632602
Threshold 50 cm is:
[[77642 38358]
[ 4785 9656]] and the precision is: 0.2011080101632834
Threshold 55 cm is:
[[88825 27175]
[ 6796 7645]] and the precision is: 0.2195577254445159
Threshold 60 cm is:
[[100411 15589]
[ 9629 4812]] and the precision is: 0.2358707906463611
Threshold 65 cm is:
[[112421 3579]
[ 13098 1343]] and the precision is: 0.2728565623674755
Threshold 70 cm is:
[[115895 105]
[ 14371 70]] and the precision is: 0.3999999997714286
Threshold 75 cm is:
[[115998 2]
[ 14440 1]] and the precision is: 0.3333333222222226
Threshold 80 cm is:
[[116000 0]
[ 14441 0]] and the precision is: 0.0
Threshold 85 cm is:
[[116000 0]
[ 14441 0]] and the precision is: 0.0
Threshold 90 cm is:
[[116000 0]
[ 14441 0]] and the precision is: 0.0
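To make sure we read these matrices correctly, here is a small sketch (not part of our actual pipeline) that recomputes the precision above from the 45 cm confusion matrix, assuming sklearn's convention of rows = true class and columns = predicted class:

import numpy as np

# 45 cm confusion matrix, rows = true class (0, 1), columns = predicted class (0, 1)
cm = np.array([[67969, 48031],
               [ 3321, 11120]])
tn, fp, fn, tp = cm.ravel()
precision = tp / (tp + fp)   # 11120 / (11120 + 48031) ≈ 0.188
print(precision)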
This is how we used the RF and printed its score:
from sklearn.model_selection import RandomizedSearchCV

# clf, param_grid and tscv (our cross-validation splitter) are defined earlier in our code
grid_clf = RandomizedSearchCV(clf, param_grid, cv=tscv, verbose=10, n_iter=20, n_jobs=-1, scoring='roc_auc')
grid_clf.fit(X_train, y_train)
print(grid_clf.score(X_test, y_test))
The score we got for this model is 0.7350173458471928
As far as I understand, the score when using roc_auc is between 0.5 and 1.
How can such a bad model receive such a good score?
How is this scoring calculated?
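For context, this is how we believe the score is being computed (a sketch of our understanding, not code we have verified against sklearn's internals): with scoring='roc_auc', score() evaluates the ranking of the predicted probabilities for class 1, so no single threshold is involved.

from sklearn.metrics import roc_auc_score

# Probability of class 1 from the refitted best estimator
proba = grid_clf.predict_proba(X_test)[:, 1]
# If our understanding is right, this should reproduce the 0.735... score above
print(roc_auc_score(y_test, proba))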
Provided we predict enough true positives, we don't mind missing some 1's and predicting false positives; we do, of course, mind predicting true negatives.
Can I change the scoring to fit what I believe are better results?
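For example, would something like the following make sense (a hypothetical sketch we have not run; beta=2 is an arbitrary choice that weights recall more heavily than precision)?

from sklearn.metrics import make_scorer, fbeta_score
from sklearn.model_selection import RandomizedSearchCV

# F-beta with beta > 1 favours recall (catching the 1's) over precision
f2_scorer = make_scorer(fbeta_score, beta=2)

grid_clf = RandomizedSearchCV(clf, param_grid, cv=tscv, verbose=10,
                              n_iter=20, n_jobs=-1, scoring=f2_scorer)
grid_clf.fit(X_train, y_train)

Or would one of the built-in string scorers, such as scoring='recall' or scoring='f1', already be closer to what we want than roc_auc?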
Thanks
Topic: scoring, decision-trees, random-forest
Category: Data Science