Difference Between Performance Scores

I need some help understanding the difference between these scores. I am currently working on a classification problem using machine learning, and I have obtained the classification results shown in the image below.

To obtain the results shown in the image, I use this code:

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
print(classification_report(y_test, y_pred))

Then I also try the code below to get the precision, recall and F1-score:

from sklearn.metrics import precision_score, recall_score, f1_score
print(precision_score(y_test,y_pred))
print(recall_score(y_test,y_pred))
print(f1_score(y_test,y_pred))

Results : 
Precision : 0.19601
Recall : 0.44360
F1-score : 0.27188

Then I also try this code to get the weighted scores:

print(precision_score(y_test,y_pred, average='weighted'))
print(recall_score(y_test,y_pred, average='weighted'))
print(f1_score(y_test,y_pred, average='weighted'))

Results : 
Weighted Precision : 0.8588
Weighted Recall : 0.7684
Weighted F1-score : 0.8048

The problem is that I am now confused by all these values. What is the meaning of the avg/total value in the image, the values from my second piece of code, and the weighted values from my third? Which value should I use to judge whether the classifier's performance is good or not? I hope someone can help me.



What you are obtaining are different metrics on the predictions you made with a given model (so far I haven't given you any new information).

What you printed are the results of several different metrics, each of which measures a different aspect of the model.

As you should know, sometimes the data scientist (you, in this case) needs to know how well the model performs on positive cases, how it performs on negative cases, and so on.

The first thing you printed is called the confusion matrix. It crosses the real-world positives/negatives against your model's predictions.

One diagonal cell counts the cases where the model predicted positive and the true label is positive (true positives), the other diagonal cell counts the true negatives, and the two off-diagonal cells are the false positives and false negatives. With scikit-learn's default layout the rows are the true labels and the columns are the predictions, so for binary labels the matrix reads [[TN, FP], [FN, TP]].
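For a binary problem you can pull the four counts out of scikit-learn's matrix directly, for example (reusing the y_test and y_pred from your code):

from sklearn.metrics import confusion_matrix

# Rows are the true labels, columns are the predictions,
# so ravel() gives the four counts in this order for binary labels.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)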

The precision measures how often your model is right when it says a case is positive, i.e. how trustworthy its positive predictions are: $\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$

The recall measures how many of the actual positives your model manages to find, i.e. how much of the positive class it recovers: $\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$
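From those counts you can recompute both numbers by hand and check them against scikit-learn. A minimal sketch, reusing the tp, fp and fn unpacked above:

from sklearn.metrics import precision_score, recall_score

# Precision: of everything predicted positive, how much really is positive.
precision = tp / (tp + fp)

# Recall: of everything that really is positive, how much did we find.
recall = tp / (tp + fn)

print(precision, precision_score(y_test, y_pred))  # should match
print(recall, recall_score(y_test, y_pred))        # should match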

The F1-score is the harmonic mean of precision and recall: $F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$. The per-class scores in the report tell you how well the classifier separates the data points of that particular class from all the other classes.
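As a quick check, the F1-score really is just the harmonic mean of the two values computed above:

from sklearn.metrics import f1_score

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f1, f1_score(y_test, y_pred))  # should match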

The support is the number of samples of the true response that lie in that class.
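This also explains the gap between your second and third set of numbers: with no average argument, precision_score, recall_score and f1_score default to average='binary' and report the score for the positive class only, while average='weighted' averages the per-class scores weighted by their support, which is what the avg/total row of classification_report shows. A minimal sketch, again assuming your y_test and y_pred and integer class labels 0..n_classes-1:

import numpy as np
from sklearn.metrics import precision_score

# Default (average='binary'): score for the positive class only.
print(precision_score(y_test, y_pred))

# Per-class scores and the support of each class.
precisions = precision_score(y_test, y_pred, average=None)
supports = np.bincount(y_test)

# Weighted average = per-class scores weighted by support.
# This should reproduce precision_score(..., average='weighted').
print(np.average(precisions, weights=supports))
print(precision_score(y_test, y_pred, average='weighted'))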

You can find documentation on all of these metrics in the sklearn documentation.
