Is there a way to impute missing values by clustering, regression and stochastic regression?

I'd like to know if there are any libraries that allow imputation by clustering, regression and stochastic regression. So far, I've done imputation by mean, median and KNN. I'm trying to evaluate the best imputation method for a small dataset (Iris in this case). I had to deliberately create NaN values since the Iris set has none.

My code for KNN imputation:

 import random

 import numpy as np
 import pandas as pd
 from fancyimpute import KNN

 data = pd.read_csv("D:/Iris_classification/train.csv")
 # as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
 mat = data.iloc[:, :4].to_numpy(dtype=float)

 prop = int(mat.size * 0.5)  # number of (row, column) draws: 50% of all cells
 # Draws are made with replacement, so duplicates make the actual NaN share lower than 50%
 i = [random.choice(range(mat.shape[0])) for _ in range(prop)]  # random row indices
 j = [random.choice(range(mat.shape[1])) for _ in range(prop)]  # random column indices

 mat[i, j] = np.nan  # replace the chosen values with NaN

 # Recent fancyimpute releases use fit_transform(); older ones used complete()
 mat_filled = pd.DataFrame(KNN(k=3).fit_transform(mat))  # convert the array back to a DataFrame

 data_col = data.drop('species', axis=1)
 mat_filled.columns = data_col.columns  # restore the column names lost in mat_filled

Is there a similar way to impute with the other 3 methods?

Topic data-imputation data python data-cleaning machine-learning

Category Data Science


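Scikit-learn's sklearn.impute module supports several of those imputation methods. SimpleImputer covers mean and median imputation, KNNImputer covers KNN, and IterativeImputer (still marked experimental) performs regression-based imputation by modelling each feature from the others. Setting sample_posterior=True on IterativeImputer makes it draw each imputed value from the predictive distribution instead of using the point prediction, which corresponds to stochastic regression. Clustering-based imputation is not built into the module, so it has to be assembled from a clusterer plus per-cluster statistics.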
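
As a rough sketch of how the three requested methods could look with scikit-learn, the example below uses the copy of Iris bundled with scikit-learn rather than the CSV from the question, and hides about 20% of the cells (both choices are assumptions made for illustration). Regression and stochastic regression both go through IterativeImputer. The clustering variant is hand-rolled, not a scikit-learn feature: it mean-fills first, clusters with KMeans, then replaces each missing cell with its cluster's column mean. The helper name rmse and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
# IterativeImputer is experimental; this import must come before it
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)
X_true = load_iris().data.astype(float)

# Hide roughly 20% of the cells at random (assumed proportion)
mask = rng.random(X_true.shape) < 0.2
X_missing = X_true.copy()
X_missing[mask] = np.nan

# Regression imputation: each feature is predicted from the others
reg_imputer = IterativeImputer(max_iter=10, random_state=0)
X_reg = reg_imputer.fit_transform(X_missing)

# Stochastic regression: imputed values are sampled from the
# predictive distribution instead of taking the point prediction
stoch_imputer = IterativeImputer(max_iter=10, sample_posterior=True,
                                 random_state=0)
X_stoch = stoch_imputer.fit_transform(X_missing)

# Cluster-based imputation (hand-rolled sketch):
# 1) mean-fill so KMeans can run, 2) cluster, 3) fill each missing
# cell with the mean of the observed values in its cluster
X_mean = SimpleImputer(strategy="mean").fit_transform(X_missing)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_mean)
global_means = np.nanmean(X_missing, axis=0)

X_clust = X_missing.copy()
for c in np.unique(labels):
    rows = labels == c
    block = X_clust[rows]
    observed = ~np.isnan(block)
    counts = observed.sum(axis=0)
    sums = np.where(observed, block, 0.0).sum(axis=0)
    # Fall back to the global mean if a cluster has no observed value in a column
    col_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_means)
    holes = np.isnan(block)
    block[holes] = col_means[np.where(holes)[1]]
    X_clust[rows] = block


def rmse(X_hat):
    """Root-mean-square error on the cells that were hidden."""
    return float(np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2)))


for name, X_hat in [("regression", X_reg),
                    ("stochastic regression", X_stoch),
                    ("cluster mean", X_clust)]:
    print(f"{name}: RMSE = {rmse(X_hat):.3f}")
```

Because the true values are known here, the RMSE on the hidden cells gives one way to compare methods. Stochastic regression will usually score worse on RMSE than plain regression, since the added noise is meant to preserve variance rather than minimize pointwise error, so the comparison metric should match the goal of the imputation.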
