Comparison of classifier confusion matrices
I tried implementing logistic regression, linear discriminant analysis (LDA), and KNN on the Smarket dataset from An Introduction to Statistical Learning, in Python.
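Roughly, the setup for the first two was along these lines (a minimal sketch assuming scikit-learn; train_X, train_Y, test_X, test_Y are the train/test splits used throughout):

from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn import metrics

# Logistic regression
logit = LogisticRegression()
logit.fit(train_X, train_Y)
logit_cnf_matrix = metrics.confusion_matrix(test_Y, logit.predict(test_X))

# Linear discriminant analysis
lda = LinearDiscriminantAnalysis()
lda.fit(train_X, train_Y)
lda_cnf_matrix = metrics.confusion_matrix(test_Y, lda.predict(test_X))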
Logistic regression and LDA were pretty straightforward to implement. Here are the confusion matrices on a test dataset:
Both of them are pretty similar, with almost the same accuracy. To pick a K for KNN, I plotted loss vs. K:
and chose a K of around 125, which gives this confusion matrix (same test dataset):
Although KNN gives a higher accuracy of about 0.61, its confusion matrix looks very different from the logistic regression and LDA matrices, with many more true negatives and far fewer true positives. I can't really understand why this is happening. Any help would be appreciated.
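For concreteness, the raw counts can be read off any of these matrices like this (a small sketch; with Smarket's binary Down/Up labels, scikit-learn orders the entries as [[TN, FP], [FN, TP]], and knn_cnf_matrix is computed further down):

# scikit-learn lays out a binary confusion matrix as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = knn_cnf_matrix.ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
print(f"accuracy = {(tn + tp) / (tn + fp + fn + tp):.3f}")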
Here is how I computed the loss for the KNN classifier (using scikit-learn); I could not use MSE since the Y values are qualitative.
import numpy as np
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

k_set = np.arange(1, 201)  # every integer K from 1 to 200
knn_dict = {}
for k in k_set:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(train_X, train_Y)
    y_pred = model.predict(test_X)
    loss = 1 - metrics.accuracy_score(test_Y, y_pred)  # misclassification rate
    knn_dict[k] = loss
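The loss-vs-K plot itself came from something like this (sketch, assuming matplotlib):

import matplotlib.pyplot as plt

plt.plot(list(knn_dict.keys()), list(knn_dict.values()))
plt.xlabel("K")
plt.ylabel("loss (1 - accuracy)")
plt.title("KNN loss vs. K")
plt.show()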
K = 125  # chosen from the loss-vs-K plot
model = KNeighborsClassifier(n_neighbors=K)
model.fit(train_X, train_Y)
knn_y_pred = model.predict(test_X)
knn_cnf_matrix = metrics.confusion_matrix(test_Y, knn_y_pred)
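As an aside, the K with the lowest loss can also be read straight from the dictionary rather than off the plot:

best_k = min(knn_dict, key=knn_dict.get)  # K whose test loss is smallest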
I'm very new to data science. I hope I have provided enough background and context; let me know if more info is needed.
Topic k-nn lda-classifier logistic-regression confusion-matrix python
Category Data Science