Is there a way to impute missing values by clustering, regression, and stochastic regression?

Question

uharsha33

April 24, 2022, 18:03

I'd like to know if there are any libraries that allow imputation by clustering, regression, and stochastic regression. So far, I've done imputation by mean, median, and KNN. I'm trying to evaluate the best imputation method for a small dataset (Iris in this case). I had to deliberately create NaN values since the Iris dataset has none.

My code for KNN imputation:

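 import random

 import numpy as np
 import pandas as pd
 from fancyimpute import KNN

 data = pd.read_csv("D:/Iris_classification/train.csv")
 # as_matrix() was removed in pandas 1.0; to_numpy() replaces it
 mat = data.iloc[:, :4].to_numpy(dtype=float)

 prop = int(mat.size * 0.5)  # number of (row, column) draws: 50% of all cells
 # Indices are drawn with replacement, so repeated draws hit the same cell
 # and the realized share of NaN values ends up below 50%.
 i = [random.choice(range(mat.shape[0])) for _ in range(prop)]  # random row indices
 j = [random.choice(range(mat.shape[1])) for _ in range(prop)]  # random column indices

 mat[i, j] = np.nan  # replace the chosen cells with NaN (np.NaN was removed in NumPy 2.0)

 # Recent fancyimpute releases use fit_transform(); older ones used complete()
 mat_filled = pd.DataFrame(KNN(k=3).fit_transform(mat))  # convert the array back to a DataFrame

 data_col = data.drop('species', axis=1)
 mat_filled.columns = data_col.columns  # restore the column names lost in mat_filled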

Is there a similar way to impute with the other 3 methods?

Topic data-imputation data python data-cleaning machine-learning

Category Data Science

Brian Spiering · Accepted Answer · July 18, 2021, 21:26

1

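Scikit-learn's sklearn.impute module supports several of those imputation methods. SimpleImputer covers mean and median imputation, KNNImputer covers nearest-neighbor imputation, and the experimental IterativeImputer performs regression-based imputation. Setting sample_posterior=True on IterativeImputer makes it draw each imputed value from the estimator's predictive distribution, which is close in spirit to stochastic regression imputation. The module has no dedicated clustering imputer.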
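As a rough illustration of how the three methods from the question might look with scikit-learn, here is a minimal sketch. It assumes the built-in Iris data from sklearn.datasets instead of the CSV file in the question, and a 20% missing rate chosen only for the demo. The kmeans_impute helper is hypothetical (it is not part of scikit-learn): it is one simple way to do cluster-based imputation by repeatedly filling missing cells with the centroid of the cluster each row falls into.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
# IterativeImputer is experimental; this import must come before it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)

# Assumption: use the built-in Iris features and knock out ~20% of the cells.
X_true = load_iris().data
X = X_true.copy()
mask = rng.random(X.shape) < 0.2
X[mask] = np.nan

# Regression imputation: each feature is modeled from the others
# (default estimator: BayesianRidge) and missing cells get the prediction.
X_reg = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)

# Stochastic variant: sample_posterior=True draws each imputed value from
# the estimator's predictive distribution instead of using its mean.
X_stoch = IterativeImputer(
    max_iter=10, sample_posterior=True, random_state=0
).fit_transform(X)


def kmeans_impute(X, n_clusters=3, n_iter=10, seed=0):
    """Hypothetical helper: fill NaNs with the centroid of the row's cluster."""
    missing = np.isnan(X)
    # Start from a column-mean fill so that k-means can run at all.
    filled = SimpleImputer(strategy="mean").fit_transform(X)
    for _ in range(n_iter):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(filled)
        row_centroids = km.cluster_centers_[km.labels_]
        # Only the originally missing cells are overwritten.
        filled[missing] = row_centroids[missing]
    return filled


X_clu = kmeans_impute(X)


def rmse(X_hat):
    """Root-mean-square error on the cells that were artificially removed."""
    return float(np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2)))


for name, X_hat in [("regression", X_reg),
                    ("stochastic regression", X_stoch),
                    ("k-means", X_clu)]:
    print(f"{name}: RMSE = {rmse(X_hat):.3f}")
```

Because the true values are known here, the RMSE on the masked cells gives a direct way to compare the methods, in the same way the question compares mean, median, and KNN imputation. The stochastic variant deliberately adds noise, so a single run will usually score worse on RMSE; it is meant to preserve the variability of the data, typically across several imputed copies.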
