IterativeImputer Evaluation

Question

candy bird

May 31, 2022 09:53

I am having a hard time evaluating my imputation model.

I used an iterative imputer model to fill in the missing values in all four columns.

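As the estimator inside the iterative imputer I am using a random forest model. Here is my code for imputing:

import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must come before importing IterativeImputer)
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer
imp_mean = IterativeImputer(estimator=RandomForestRegressor(), random_state=0)
imp_mean.fit(my_data)
my_data_filled = pd.DataFrame(imp_mean.transform(my_data))
my_data_filled.head()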

My problem is how to evaluate my model. How can I know whether the filled values are right?

I called describe() before and after filling in the missing values, and it gives nearly the same mean and std. The correlations between variables also stayed nearly the same, with only slight changes.

Topic wikipedia evaluation scikit-learn pandas python

Category Data Science

lanenok · Accepted Answer · May 31, 2022 09:53

I agree that it is important "not to modify" the actual distribution. However, the KS test in @Multivac's answer is intended for one-dimensional (1D) distributions. What is even more important is to keep the multidimensional distributions intact.

For example, suppose your data include a person's age and level of education. If you check only the 1D distributions, you can still end up with a 16-year-old holding a Master's degree or a PhD.

So, IMHO, it is much more important to check mutual dependencies and multidimensional distributions.
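
As a rough sketch of one such check (not a complete test of the joint distribution), you could compare the pairwise correlations before and after imputation. The function name correlation_shift and the synthetic data are made up for illustration, and mean imputation is used only as a deliberately naive stand-in for the imputer output. It assumes both DataFrames share the same column names.

```python
import numpy as np
import pandas as pd


def correlation_shift(original: pd.DataFrame, filled: pd.DataFrame) -> pd.DataFrame:
    """Absolute change in pairwise Pearson correlation after imputation."""
    before = original.corr()  # pandas ignores NaNs pair by pair
    after = filled.corr()
    return (after - before).abs()


# Synthetic demo data (hypothetical): y depends strongly on x.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
demo = pd.DataFrame({"x": x, "y": 2 * x + rng.normal(scale=0.5, size=300)})

# Hide 30% of y, then fill it with the column mean.
demo_missing = demo.copy()
demo_missing.loc[rng.choice(300, size=90, replace=False), "y"] = np.nan
demo_filled = demo_missing.fillna(demo_missing.mean())

shift = correlation_shift(demo_missing, demo_filled)
print(shift)
```

Mean imputation leaves the mean of y untouched but visibly weakens the x-y correlation, which is exactly the kind of damage a 1D check would miss. Correlation only captures linear pairwise dependence, so scatter plots of imputed versus observed points are a useful complement.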

Multivac · Accepted Answer · May 30, 2022 13:06

When imputing data, you want to avoid modifying the actual distribution of your data. So one way to test how good your imputation was is to compare the distribution of every imputed feature with the distribution of the same feature before imputation (via a KS test, for example). If you can state with a given level of confidence that your imputation preserved the distribution, that is one way to validate it.
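
A minimal sketch of that per-feature comparison, using scipy.stats.ks_2samp. The helper name ks_report and the synthetic data are made up for illustration, and mean imputation stands in for the real imputer output. It assumes the original and filled DataFrames share the same column names.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def ks_report(original: pd.DataFrame, filled: pd.DataFrame) -> pd.DataFrame:
    """KS test of each column's observed values against the same column after imputation."""
    rows = []
    for col in original.columns:
        observed = original[col].dropna().to_numpy()
        if len(observed) == len(original):
            continue  # nothing was imputed in this column
        result = ks_2samp(observed, filled[col].to_numpy())
        rows.append({"column": col, "ks_stat": result.statistic, "p_value": result.pvalue})
    return pd.DataFrame(rows)


# Synthetic demo data (hypothetical, only to make the sketch runnable).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 2)), columns=["a", "b"])
demo_missing = demo.copy()
demo_missing.loc[rng.choice(200, size=40, replace=False), "a"] = np.nan
demo_filled = demo_missing.fillna(demo_missing.mean())  # stand-in for the imputer output

report = ks_report(demo_missing, demo_filled)
print(report)
```

A small p-value suggests the imputation changed the shape of that feature. The filled column still contains the observed values, so the two samples are not independent and the p-value should be read as a rough indicator rather than an exact significance level.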

Another way, if you have a supervised task, is to compare the performance of your downstream model under each imputation technique, as in this figure from the scikit-learn documentation:

[Figure: IterativeImputer Evaluation]
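
The supervised comparison could be sketched roughly like this. The dataset, the 20% missingness rate, the Ridge downstream model and the two imputers being compared are all assumptions chosen for illustration, not taken from the question. Putting the imputer inside the pipeline means it is refit on each training fold, so no information leaks from the validation fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic regression data with roughly 20% of the entries removed at random.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "iterative_rf": IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=0),
        max_iter=5,
        random_state=0,
    ),
}

# Cross-validated MSE of the same downstream model under each imputation technique.
scores = {}
for name, imputer in imputers.items():
    pipeline = make_pipeline(imputer, Ridge())
    cv = cross_val_score(pipeline, X_missing, y, cv=5, scoring="neg_mean_squared_error")
    scores[name] = -cv.mean()

for name, mse in scores.items():
    print(f"{name}: MSE = {mse:.1f}")
```

A lower error for one imputer only says it helps this particular downstream model more. It does not prove the individual filled values are correct.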
