The actual results and results from pickle files are not matching in pandas for DBSCAN clustering

Question

anagha s

March 26, 2022, 22:06

I've built a DBSCAN clustering model. The cluster labels from the original run do not match the labels I get after loading the model from the pickle files.

I am clustering the WT column based on the HD and MC columns.

data = HD,MC
Target = WT

In the dataframe below, the first record is assigned to cluster 0.

But when I run the same record through the model loaded from the 'pkl' file, the predicted result is [-1].

Dataframe:

      HD         MC             WT         Cluster
      200        Other          4.5        0
      150        Pep            5.6        0
      100        Pla            35         -1
      50         Same           15         0

Code:

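    from collections import Counter

    import pandas as pd
    from sklearn import preprocessing
    from sklearn.cluster import DBSCAN

    # df is the dataframe shown above; encode the categorical MC column as integers
    le = preprocessing.LabelEncoder()
    df['MC encoded'] = le.fit_transform(df['MC'])

    col_1 = ['HD', 'MC encoded']
    data = df[col_1]
    col_2 = ['WT']
    target = df[col_2]
    data = data.fillna(value=0)

    # First fit: cluster on HD and the encoded MC column
    model = DBSCAN(eps=1, min_samples=20).fit(data)
    outliers_df = pd.DataFrame(data)
    print(Counter(model.labels_))

    # Second fit: fit_predict refits the same model, this time on WT only
    x = model.fit_predict(target)
    print(Counter(x))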

Result:

  Counter({-1: 604, 0: 142, 1: 83, 9: 36, 2: 27, 7: 26, 10: 26, 8: 24, 4: 23, 5: 23, 3: 22, 11: 21, 6: 20, 12: 20, 13: 20})
  Counter({0: 1093, -1: 24})

Code:

    import pickle

    df["Cluster"] = x

    # Save the fitted model and the label encoder
    filename1 = '/model.pkl'
    with open(filename1, 'wb') as model_df:
        pickle.dump(model, model_df)

    with open('/MC.pkl', 'wb') as output:
        pickle.dump(le, output)

    # Load both objects back
    with open('model.pkl', 'rb') as file:
        pickle_model = pickle.load(file)

    with open('MC.pkl', 'rb') as pkl_file:
        le_mc = pickle.load(pkl_file)


    def testing(HD, MC, WT):
        test = {'HD': [HD], 'MC': [MC], 'WT': [WT]}
        test = pd.DataFrame(test)
        test['MC_encoded'] = le_mc.transform(test['MC'])
        pred_val = pickle_model.fit_predict(test[['HD','MC_encoded']])
        print(pred_val)
        return pred_val


    pred_val = testing(200, 'Other', 4.5)

Result:

    [-1]

Topic pickle dbscan pandas python clustering

Category Data Science

Blenz · Accepted Answer · October 10, 2019, 15:12

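Without looking at anything else:

    pred_val = pickle_model.fit_predict(test[['HD','MC_encoded']])

By calling fit_predict() here you are retraining pickle_model on your test data. The model is refit on a single sample, and with min_samples=20 a lone sample can never form a cluster, so it is always labelled -1. Use the loaded model as it is instead of training it again on one row. Note that scikit-learn's DBSCAN does not provide a predict() method, so simply swapping in .predict() will not work: a new point has to be labelled from the fitted model's core samples, or by a separate classifier trained on the cluster labels.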
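One way to label a new row without refitting can be sketched as follows. This is a minimal illustration, not part of scikit-learn: the helper name dbscan_assign and the toy data are made up, and it assumes the default Euclidean metric. The helper gives a new point the label of its nearest core sample if that sample lies within eps, and -1 otherwise. The features passed in must be the same ones the model was last fitted on; in the question, the last fit call was on the WT column.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_assign(model, X_new):
    """Label new points from a fitted DBSCAN model without refitting.

    Hypothetical helper (not a scikit-learn API): each point takes the label
    of its nearest core sample if that sample is within eps, otherwise -1.
    Assumes the model was fitted with the default Euclidean metric.
    """
    X_new = np.asarray(X_new, dtype=float)
    labels = np.full(len(X_new), -1, dtype=int)
    if len(model.components_) == 0:
        return labels  # no core samples, so every point is noise
    core_labels = model.labels_[model.core_sample_indices_]
    for i, point in enumerate(X_new):
        dists = np.linalg.norm(model.components_ - point, axis=1)
        nearest = np.argmin(dists)
        if dists[nearest] <= model.eps:
            labels[i] = core_labels[nearest]
    return labels


# Toy data (made up for illustration): two tight groups and one distant point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])
model = DBSCAN(eps=0.5, min_samples=3).fit(X)

# Refitting on one row: a lone point cannot reach min_samples, so it is noise.
refit_label = DBSCAN(eps=0.5, min_samples=3).fit_predict([[0.0, 0.0]])

# Reusing the fitted model: the same row lands in its original cluster.
reused_label = dbscan_assign(model, [[0.0, 0.0]])
far_label = dbscan_assign(model, [[50.0, 50.0]])

print(refit_label, reused_label, far_label)
```

The same helper could be applied to a model loaded with pickle.load, since the fitted attributes it reads (components_, core_sample_indices_, labels_, eps) are stored on the model object.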

Yaakov Bressler · Accepted Answer · September 10, 2019, 13:37

It appears your pickle file isn't being loaded as a pandas DataFrame. Why not just use df_pickle = pd.read_pickle('/MC.pkl')? The rest should fall into place after that.
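For reference, MC.pkl in the question stores the fitted LabelEncoder rather than a DataFrame. The sketch below is a hedged check, using made-up toy data and a temporary file path, of whether a pickle round trip changes either object. Under these assumptions the loaded model carries the same labels_ and the loaded encoder maps categories to the same integers, which suggests the pickling step itself is unlikely to be the source of the mismatch.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for the question's objects (values are made up).
encoder = LabelEncoder().fit(["Other", "Pep", "Pla", "Same"])
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [20.0, 20.0]])
model = DBSCAN(eps=0.5, min_samples=3).fit(X)

# Write and read through one path so the file loaded is the file just saved.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "artifacts.pkl")
    with open(path, "wb") as fh:
        pickle.dump({"model": model, "encoder": encoder}, fh)
    with open(path, "rb") as fh:
        loaded = pickle.load(fh)

# The fitted state survives the round trip unchanged.
labels_match = np.array_equal(loaded["model"].labels_, model.labels_)
encoded = loaded["encoder"].transform(["Other"])
print(labels_match, encoded)
```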
