Inspecting misclassified data points
I recently trained a simple classification algorithm (my first ML project) and got a pretty satisfying accuracy score.
Now I am looking for a way to inspect which data points in my test data have been misclassified. My basic idea was something like:
if y_test != y_pred then:
(get the indices from y_test)
(look up those rows in my CSV and try to find a pattern)
My main problem is that the train_test_split function gives me a y_test subset like this:
print(y_test):
28886 0
23319 0
8913 1
25770 0
and y_pred is a NumPy array like this:
print(y_pred):
[0 0 1 ... 0 1 0]
Since y_test keeps the index of my original dataset, I can't just compare y_test[2] with y_pred[2]. It seems to me that y_test[2] does not return the third element of y_test. Instead, it looks up the row whose index label in my original dataset is 2.
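To make the indexing behaviour described above concrete, here is a minimal sketch. It assumes y_test is a pandas Series (the printed output suggests so); the four values are hypothetical and only mimic the output shown earlier.

```python
import pandas as pd

# Hypothetical labels mimicking the shuffled index shown above
y_test = pd.Series([0, 0, 1, 0], index=[28886, 23319, 8913, 25770])

# Positional access: the third element of the subset
third_by_position = y_test.iloc[2]

# Label access: the row whose original index label is 8913
third_by_label = y_test.loc[8913]

# y_test[2] is a label lookup; label 2 does not exist here,
# so it raises a KeyError instead of returning the third element
try:
    y_test[2]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True
```

So .iloc is what addresses "position n" of the subset, while plain square brackets go by the original row labels.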
I am looking for a way to compare position n of my y_test subset with position n of y_pred, so that I can get the original indices of all misclassified samples.
The Python code I used to get this result:
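import sklearn.model_selection
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

# X holds the features and Y the labels, as loaded from my CSV
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, Y, test_size=0.2)
clf = KNeighborsClassifier(n_neighbors=13)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test, y_pred)
print(acc)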
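One possible way to do the position-wise comparison is sketched below. It is not the only approach, and it rests on assumptions: y_test is taken to be a pandas Series that kept the original row labels, y_pred a NumPy array in the same row order, and the concrete values are hypothetical stand-ins rather than my real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real objects: y_test keeps the original
# row labels, y_pred is a plain NumPy array in the same row order
y_test = pd.Series([0, 0, 1, 0, 1], index=[28886, 23319, 8913, 25770, 104])
y_pred = np.array([0, 1, 1, 0, 0])

# Compare by position: .to_numpy() drops the index, so element n of
# y_test is matched with element n of y_pred
wrong_mask = y_test.to_numpy() != y_pred

# Original row labels of the misclassified samples
wrong_labels = y_test.index[wrong_mask]

# Side-by-side view of the errors, indexed by the original labels
report = pd.DataFrame(
    {
        "actual": y_test.to_numpy()[wrong_mask],
        "predicted": y_pred[wrong_mask],
    },
    index=wrong_labels,
)
print(report)
```

If X is the pandas DataFrame the split was made from, X.loc[wrong_labels] should then return the feature rows of the misclassified samples, which can be checked against the CSV.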