Inspecting misclassified data points
Recently, I was able to train a simple classification algorithm (my first ML project), and I even got a pretty satisfying accuracy score.
Now I am looking for a way to inspect which data points in my test data have been misclassified. My basic idea was something like:
if y_test != y_pred:
    (get the indices of those rows in y_test)
    (look up the data in my CSV and try to find a pattern)
My main problem is that the train_test_split function provides me with a y_test subset like this:
print(y_test):
28886 0
23319 0
8913 1
25770 0
and y_pred is a NumPy array like this:
print(y_pred):
[0 0 1 ... 0 1 0]
Since y_test already carries an index from the original dataset, I can't just compare y_test[2] with y_pred[2]. It seems to me that y_test[2] does not return the third element of y_test. Rather, it looks up the row labelled 2 in my original dataset.
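The indexing behaviour described above can be reproduced with a minimal sketch. The index labels mirror the printout of y_test; the Series itself is made up for illustration and assumes y_test is a pandas Series with an integer index:

```python
import pandas as pd

# Hypothetical labels mimicking the y_test printout: the index holds
# row labels from the original dataset, not positions 0..n-1.
y_test = pd.Series([0, 0, 1, 0], index=[28886, 23319, 8913, 25770])

# Label-based lookup: the value stored under index label 8913.
by_label = y_test.loc[8913]

# Position-based lookup: the third element of the subset.
by_position = y_test.iloc[2]

# Plain y_test[2] on an integer index is treated as a label lookup,
# so it raises KeyError here because no row is labelled 2.
try:
    y_test[2]
    label_2_exists = True
except KeyError:
    label_2_exists = False

print(by_label, by_position, label_2_exists)
```

So .iloc addresses elements by position within the subset, while plain brackets and .loc address them by their original row label.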
I am looking for a way to compare position n of my y_test subset with position n of y_pred, so I can get the indices of all misclassified samples.
The Python code I used to get this result:
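import sklearn.model_selection
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

# X (features) and Y (labels) are assumed to be loaded from the CSV beforehand
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, Y, test_size=0.2)
clf = KNeighborsClassifier(n_neighbors=13)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test, y_pred)
print(acc)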
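One possible way to do the position-by-position comparison is sketched below. It is not a definitive solution: the data is made up, and it assumes y_test is a pandas Series that kept its original row labels and y_pred is a NumPy array in the same row order as x_test.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for y_test (original row labels as index)
# and y_pred (predictions in the same row order).
y_test = pd.Series([0, 0, 1, 0], index=[28886, 23319, 8913, 25770])
y_pred = np.array([0, 1, 1, 1])

# Drop the pandas index so the comparison is purely positional.
wrong_mask = y_test.to_numpy() != y_pred

# Original row labels of the misclassified samples.
wrong_labels = y_test.index[wrong_mask]

# Side-by-side view of true vs. predicted labels for those rows.
errors = pd.DataFrame(
    {"y_true": y_test.to_numpy()[wrong_mask], "y_pred": y_pred[wrong_mask]},
    index=wrong_labels,
)
print(errors)
```

If the original data was read from the CSV into a DataFrame with the default index, the labels in wrong_labels could then be passed to .loc on that DataFrame to pull up the misclassified rows.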
Topic beginner classification
Category Data Science