Is there a way to impute missing values by clustering, regression, and stochastic regression?
I'd like to know if there are any libraries that allow imputation by clustering, regression, and stochastic regression. So far, I've done imputation by mean, median, and KNN. I'm trying to evaluate the best imputation method for a small dataset (Iris in this case). I had to deliberately create NaN values since the Iris set has none.
My code for KNN imputation:
import pandas as pd
import numpy as np
import random
from fancyimpute import KNN
data = pd.read_csv("D:/Iris_classification/train.csv")
mat = data.iloc[:, :4].to_numpy(dtype=float) #as_matrix() was removed in pandas 1.0; to_numpy() replaces it
prop = int(mat.size * 0.5) #Set the % of values to be replaced
i = [random.choice(range(mat.shape[0])) for _ in range(prop)] #Randomly choose indices of
j = [random.choice(range(mat.shape[1])) for _ in range(prop)] #the numpy array
mat[i, j] = np.nan #replace values with NaN (the np.NaN alias was removed in NumPy 2.0)
mat_filled = pd.DataFrame(KNN(k=3).fit_transform(mat)) #impute, then convert the array back to a df (older fancyimpute releases used .complete(mat))
data_col = data.drop('species', axis = 1)
mat_filled.columns = data_col.columns #added column names that went missing in mat_filled
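A side note on the masking step: because `i` and `j` are drawn independently with `random.choice`, the same cell can be picked more than once, so the share of NaN values actually created ends up below the intended 50%. A minimal sketch of one way to hit the proportion exactly, sampling flat indices without replacement, is shown here. The helper name `mask_exact_fraction` and the small demo array are made up for illustration and are not part of the code above.

```python
import numpy as np

def mask_exact_fraction(arr, fraction, seed=None):
    """Return a float copy of `arr` with exactly `fraction` of its cells set to NaN."""
    rng = np.random.default_rng(seed)
    out = np.array(arr, dtype=float)  # np.array copies, so the input is left untouched
    n_missing = int(round(out.size * fraction))
    # Sample distinct flat positions, then map them back to (row, column) pairs.
    flat_idx = rng.choice(out.size, size=n_missing, replace=False)
    rows, cols = np.unravel_index(flat_idx, out.shape)
    out[rows, cols] = np.nan
    return out

# Hypothetical stand-in for the 4 feature columns; any 2-D numeric array works.
demo = np.arange(40, dtype=float).reshape(10, 4)
masked = mask_exact_fraction(demo, 0.5, seed=0)
print(np.isnan(masked).sum())  # 20, i.e. exactly half of the 40 cells
```

Fixing the seed also makes the comparison between imputation methods repeatable, since every method then sees the same missing cells.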
Is there a similar way to impute with the other 3 methods?
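For reference, here is a rough sketch of how the three methods could look with scikit-learn, which is an assumption on my side (it is not used in the code above). `IterativeImputer` regresses each feature on the others; with `sample_posterior=True` it draws imputed values from the predictive distribution rather than using the point prediction, which is one way to get stochastic regression imputation. I'm not aware of a built-in cluster imputer, so `cluster_impute` below is a hypothetical helper that alternates between k-means and filling missing cells with the centroid of the assigned cluster. The 20% missing rate, 3 clusters, and the seeds are arbitrary choices for the sketch.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (needed to expose IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

rng = np.random.default_rng(0)
X_true = load_iris().data.astype(float)  # bundled copy of Iris, 150 x 4

# Hide 20% of the cells (arbitrary proportion), sampled without replacement.
missing_mask = np.zeros(X_true.size, dtype=bool)
missing_mask[rng.choice(X_true.size, size=int(X_true.size * 0.2), replace=False)] = True
missing_mask = missing_mask.reshape(X_true.shape)
X_missing = X_true.copy()
X_missing[missing_mask] = np.nan

# Regression imputation: each feature is predicted from the others.
X_regression = IterativeImputer(max_iter=10, random_state=0).fit_transform(X_missing)

# Stochastic regression imputation: sample from the predictive distribution
# instead of using the point prediction.
X_stochastic = IterativeImputer(
    max_iter=10, sample_posterior=True, random_state=0
).fit_transform(X_missing)

def cluster_impute(X, n_clusters=3, n_iter=10, seed=0):
    """Hypothetical helper: fill missing cells with the centroid of the assigned cluster."""
    missing = np.isnan(X)
    filled = SimpleImputer(strategy="mean").fit_transform(X)  # starting guess
    for _ in range(n_iter):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(filled)
        centroids = km.cluster_centers_[km.labels_]  # one centroid row per sample
        filled[missing] = centroids[missing]  # observed cells are never overwritten
    return filled

X_cluster = cluster_impute(X_missing)

def rmse(filled):
    """Error on the hidden cells only, measured against the true values."""
    diff = filled[missing_mask] - X_true[missing_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

for name, filled in [("regression", X_regression),
                     ("stochastic regression", X_stochastic),
                     ("clustering", X_cluster)]:
    print(f"{name}: RMSE = {rmse(filled):.3f}")
```

Since the true values are known, the RMSE on the hidden cells gives a direct way to rank the methods, and the same function could be applied to the mean, median, and KNN results.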
Topic data-imputation data python data-cleaning machine-learning
Category Data Science