DBSCAN clustering: labels from the fitted model and from the pickled model do not match (pandas)
I've built a DBSCAN clustering model. The labels I get when I fit it do not match the labels I get after loading the model back from its pickle file.
I am clustering the WT column based on the HD and MC columns.
data = HD, MC
target = WT
In the dataframe below, the first record is assigned to cluster 0.
But when I run the same record through the model loaded from the 'pkl' file, the predicted result is [-1].
Dataframe:
HD   MC     WT   Cluster
200  Other  4.5   0
150  Pep    5.6   0
100  Pla    35   -1
50   Same   15    0
Code:
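from collections import Counter
import pickle
import pandas as pd
from sklearn import preprocessing
from sklearn.cluster import DBSCAN

# Encode the categorical MC column as integers
le = preprocessing.LabelEncoder()
df['MC encoded'] = le.fit_transform(df['MC'])
col_1 = ['HD', 'MC encoded']
data = df[col_1]
col_2 = ['WT']
target = df[col_2]
data = data.fillna(value=0)
# Fit on the HD and MC encoded columns and count the cluster labels
model = DBSCAN(eps=1, min_samples=20).fit(data)
outliers_df = pd.DataFrame(data)
print(Counter(model.labels_))
# Fit the same estimator again, this time on the WT column, and count those labels
x = model.fit_predict(target)
print(Counter(x))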
Result:
Counter({-1: 604, 0: 142, 1: 83, 9: 36, 2: 27, 7: 26, 10: 26, 8: 24, 4: 23, 5: 23, 3: 22, 11: 21, 6: 20, 12: 20, 13: 20})
Counter({0: 1093, -1: 24})
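For readers unsure what the -1 label in these counts means, here is a simplified pure-Python sketch of the DBSCAN core-point rule. It is not scikit-learn's implementation; the helper names and the rows are made up for illustration. It assumes the usual convention that a point is a core point when at least min_samples points, itself included, lie within eps of it, and that a point which is neither a core point nor within eps of one is labelled -1 (noise).

```python
from math import dist


def neighbours_within_eps(points, index, eps):
    """Indices of all points within eps of points[index], itself included."""
    centre = points[index]
    return [i for i, p in enumerate(points) if dist(centre, p) <= eps]


def is_core_point(points, index, eps, min_samples):
    """True when the eps-neighbourhood holds at least min_samples points."""
    return len(neighbours_within_eps(points, index, eps)) >= min_samples


# Made-up (HD, MC encoded) rows: 25 identical rows plus one isolated row.
dense = [(200.0, 0.0)] * 25
isolated = [(100.0, 2.0)]
points = dense + isolated

dense_is_core = is_core_point(points, 0, eps=1, min_samples=20)
isolated_is_core = is_core_point(points, 25, eps=1, min_samples=20)
# A dataset with a single row can never reach min_samples=20.
single_is_core = is_core_point([(200.0, 0.0)], 0, eps=1, min_samples=20)

print(dense_is_core, isolated_is_core, single_is_core)
```

With eps=1 and min_samples=20, only rows that have at least 19 other rows within distance 1 can start a cluster, which may be why so many records end up in the -1 group.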
Code:
df["Cluster"] = x

# Save the fitted DBSCAN model and the label encoder
filename1 = '/model.pkl'
with open(filename1, 'wb') as model_df:
    pickle.dump(model, model_df)
with open('/MC.pkl', 'wb') as output:
    pickle.dump(le, output)

# Load both objects back from the pickle files
with open('model.pkl', 'rb') as file:
    pickle_model = pickle.load(file)
with open('MC.pkl', 'rb') as pkl_file:
    le_mc = pickle.load(pkl_file)

def testing(HD, MC, WT):
    # Build a one-row dataframe for the record to test
    test = {'HD': [HD], 'MC': [MC], 'WT': [WT]}
    test = pd.DataFrame(test)
    test['MC_encoded'] = le_mc.transform(test['MC'])
    # fit_predict fits the loaded model on this single row and returns its label
    pred_val = pickle_model.fit_predict(test[['HD', 'MC_encoded']])
    print(pred_val)
    return pred_val

pred_val = testing(200, 'Other', 4.5)
Result:
[-1]
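A minimal sketch on synthetic data (not the dataframe above) may help separate the pickle step from the fit_predict step. It assumes scikit-learn's DBSCAN, which exposes fit and fit_predict but no separate predict method, and it pickles in memory with pickle.dumps / pickle.loads instead of writing files. The 30 identical rows are invented purely so that a cluster forms.

```python
import pickle

import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for the (HD, MC encoded) columns: 30 identical rows.
train = np.tile([200.0, 0.0], (30, 1))

model = DBSCAN(eps=1, min_samples=20).fit(train)
labels_before = model.labels_.copy()

# Round-trip the fitted model through pickle, in memory rather than on disk.
restored = pickle.loads(pickle.dumps(model))
same_after_pickle = np.array_equal(restored.labels_, labels_before)

# fit_predict refits the estimator on whatever it is given; here, one row.
single_row = np.array([[200.0, 0.0]])
pred = restored.fit_predict(single_row)

print(same_after_pickle)       # labels survive the pickle round trip
print(pred)                    # label of the single row after refitting
print(restored.labels_.shape)  # labels_ now describes only that one row
```

In this sketch the labels are unchanged by pickling, while the call to fit_predict on a single row replaces the earlier fit. With min_samples=20, one row on its own cannot form a cluster, so it comes back as -1, which looks like the same symptom as in the result above.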
Topic pickle dbscan pandas python clustering
Category Data Science