Inspecting misclassified data points

Recently, I trained a simple classification algorithm (my first ML project) and even got a pretty satisfying accuracy score.

Now I am looking for a way to inspect which data points in my test data have been misclassified. My basic idea was something like:

if y_test != y_pred:
     (get the indices of y_test)
     (look up the data in my CSV and try to find a pattern)

My main problem is that the train_test_split function gives me a y_test subset like this:

print(y_test):

    28886    0
    23319    0
    8913     1
    25770    0

and y_pred is a plain array like this:

print(y_pred):

    [0 0 1 ... 0 1 0]

Since y_test keeps the index of the original dataset, I can't just compare y_test[2] with y_pred[2]. It seems that y_test[2] does not return the third element of y_test. Instead, it looks up the element whose index label is 2, i.e. a row of my original dataset.
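
To make the difference concrete, here is a minimal sketch. The Series below is an assumption built from the four values in the printout above, not the real data. It shows label-based access (.loc), positional access (.iloc), and dropping the index entirely with to_numpy():

```python
import pandas as pd

# Hypothetical labels that mimic the printed y_test (values are made up)
y_test = pd.Series([0, 0, 1, 0], index=[28886, 23319, 8913, 25770])

# Positional access: the third element of the subset
third_by_position = y_test.iloc[2]

# Label-based access: the row that carried label 8913 in the original data
by_label = y_test.loc[8913]

# Plain NumPy array without the index, so position n lines up with y_pred[n]
positional_values = y_test.to_numpy()

# Label 2 is not part of this subset, so y_test[2] would raise a KeyError here
label_two_present = 2 in y_test.index
```

With positional access, element n of y_test can be compared directly with element n of y_pred.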

I am looking for a way to compare position n of my y_test subset with position n of y_pred, so that I can get the original index of every misclassified data point.

The Python code I used to get this result:

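import sklearn.model_selection
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

# X and Y are the features and labels loaded from the CSV (not shown here)
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, Y, test_size=0.2)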

clf = KNeighborsClassifier(n_neighbors=13)
clf.fit(x_train,y_train)

y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test,y_pred)
print(acc)


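You can convert the Series to a list and do the comparison position by position. The order is preserved anyway, because predict returns one prediction per row of x_test, in the same order.

Or, if you want to keep the reference to the original rows, you can convert the predictions to a Series that reuses the index of y_test:

pd.Series(y_pred, index=y_test.index)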
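
A minimal sketch of that second option, assuming small made-up stand-ins for y_test and y_pred (the values and the names y_pred_series, wrong and misclassified_labels are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's y_test and y_pred
y_test = pd.Series([0, 0, 1, 0], index=[28886, 23319, 8913, 25770])
y_pred = np.array([0, 1, 1, 1])

# Re-attach the original row labels to the predictions
y_pred_series = pd.Series(y_pred, index=y_test.index)

# Element-wise comparison; both Series share the same index
wrong = y_test != y_pred_series

# Original row labels of the misclassified data points
misclassified_labels = y_test.index[wrong.to_numpy()].tolist()
print(misclassified_labels)
```

The resulting labels can then be used to look up the matching rows in the original CSV.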

Thank you for your answer. I actually just found a way to do it:

# Walk through the test labels by position; y_pred is in the same order
for i, value in enumerate(y_test.index):
    # .loc looks up the row by its original index label
    if y_test.loc[value] != y_pred[i]:
        print(value, y_test.loc[value], y_pred[i])
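
The same check can probably be done without an explicit loop. The sketch below is one possible variant, not the only way to do it. It assumes the original data lives in a DataFrame called df (a hypothetical name, with made-up columns and values) and pulls the misclassified rows out of it so they can be inspected for patterns:

```python
import numpy as np
import pandas as pd

# Hypothetical original data; in the question this would be read from the CSV
df = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "label": [0, 1, 0, 1, 0],
})

# Mimic a shuffled test subset that keeps the original row labels
y_test = df["label"].loc[[3, 0, 4]]
y_pred = np.array([1, 1, 0])

# Compare by position, ignoring the index
mismatch = y_test.to_numpy() != y_pred

# Fetch the misclassified rows from the original data and attach the prediction
wrong_rows = df.loc[y_test.index[mismatch]].assign(predicted=y_pred[mismatch])
print(wrong_rows)
```

Keeping the prediction next to the original columns makes it easier to spot which feature values tend to go with wrong predictions.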
