How can I improve the result? Should I remove some columns?
I am using this dataset (heart_failure_clinical_records_dataset.csv). The target is the last column, 'DEATH_EVENT', which I have split off from the features. I am using KMeans with K = 2 and counting how many test predictions match the target (hits) and how many do not (misses). The result is quite poor, so I think I should delete some columns, or write a loop that deletes them. What would you do?
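A minimal sketch of the loop idea follows. It assumes that "a loop that deletes" means leaving out one feature column at a time and re-scoring KMeans on the test split. The helper name `drop_one_column_scores` is hypothetical. The sketch also standardizes the features with scikit-learn's `StandardScaler`, which the original code does not do: KMeans clusters by Euclidean distance, so columns with large numeric ranges would otherwise dominate. Because KMeans numbers its clusters arbitrarily, each score is the better of the raw hit rate and its label-swapped complement. That shortcut is only valid for 2 clusters and a 0/1 target.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def drop_one_column_scores(X_train, X_test, y_test, seed=100):
    """Test-set hit rate of 2-cluster KMeans with each feature column left out in turn.

    Returns {index_of_dropped_column: hit_rate}. Hypothetical helper, not part
    of scikit-learn. Features are standardized (scaler fitted on the training
    split only) because KMeans uses Euclidean distance.
    """
    scores = {}
    for col in range(X_train.shape[1]):
        # Remove the candidate column from both splits
        tr = np.delete(X_train, col, axis=1)
        te = np.delete(X_test, col, axis=1)
        scaler = StandardScaler().fit(tr)
        km = KMeans(n_clusters=2, n_init=10, random_state=seed)
        km.fit(scaler.transform(tr))
        pred = km.predict(scaler.transform(te))
        # Cluster ids are arbitrary: with 2 clusters and 0/1 labels, swapping
        # the ids turns a hit rate of a into 1 - a, so keep the better one.
        acc = float(np.mean(pred == y_test))
        scores[col] = max(acc, 1.0 - acc)
    return scores
```

With the `X_train`, `X_test` and `y_test` arrays built in the script that follows, `drop_one_column_scores(X_train, X_test, y_test)` returns one hit rate per dropped column. Columns whose removal raises the score are candidates for deletion. The sketch only tests single-column removals. Checking combinations of columns would need a wider search.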
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# Load the CSV; genfromtxt turns the header row into a row of NaNs, so drop it
X = np.genfromtxt('heart_failure_clinical_records_dataset.csv', delimiter=',')
X = np.delete(X, 0, 0)
train, test = train_test_split(X, test_size=0.33, shuffle=True, random_state=100)
# The last column is the target (DEATH_EVENT); the rest are features
X_train = train[:, :-1]
y_train = train[:, -1]
X_test = test[:, :-1]
y_test = test[:, -1]
K = 2
kmeans = KMeans(n_clusters=K)
kmeans.fit(X_train)  # the model must be fitted before predict() can be called
pred = kmeans.predict(X_test)
# Count hits (aciertos) and misses on the test set
n_items = len(pred)
aciertos = 0
for i in range(0, n_items):
    aciertos += 1 if (pred[i] == y_test[i]) else 0
print('Hits: %6.5f, misses %6.5f' % (aciertos / n_items, (n_items - aciertos) / n_items))
cm = confusion_matrix(y_test, pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()
Output:
Hits: 0.59596, misses 0.40404
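One thing worth checking before deleting columns: KMeans is unsupervised. It never sees `y_train`, and it numbers its clusters arbitrarily, so cluster 0 may correspond to DEATH_EVENT = 1. Comparing `pred[i] == y_test[i]` directly can then count a reasonable clustering as mostly misses. The sketch below maps cluster ids onto class labels in the best possible way before counting hits. The name `aligned_hit_rate` is hypothetical and not part of scikit-learn. It assumes there are no more clusters than classes (K = 2 with a binary target here).

```python
import itertools

import numpy as np


def aligned_hit_rate(y_true, clusters):
    """Fraction of hits after mapping cluster ids onto class labels.

    Tries every one-to-one mapping of cluster ids onto class labels and
    returns the best hit rate. Assumes no more clusters than classes.
    """
    y_true = np.asarray(y_true)
    clusters = np.asarray(clusters)
    classes = np.unique(y_true)
    cluster_ids = np.unique(clusters)
    best = 0.0
    for labels in itertools.permutations(classes, len(cluster_ids)):
        # e.g. {0: 1.0, 1: 0.0} means "cluster 0 predicts class 1"
        mapping = dict(zip(cluster_ids, labels))
        mapped = np.array([mapping[c] for c in clusters])
        best = max(best, float(np.mean(mapped == y_true)))
    return best


# A perfect clustering whose ids are numbered the other way round:
print(aligned_hit_rate([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

In the script above, `aligned_hit_rate(y_test, pred)` could replace the manual hit count. With K = 2, a raw hit rate below 0.5 is a sign that the cluster ids are simply swapped relative to DEATH_EVENT.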