How to compare two multivariate methods for filling NA values

On the Titanic dataset, I tried two methods to fill the missing (NA) values in Age. The first is regression using Lasso:

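from sklearn.linear_model import Lasso
# Assumption: DF is all numeric and every column except Age is complete
ageNaIn = DF['Age'].isna()  # boolean mask of the rows whose Age is missing
# Take X and y from the same rows so they stay aligned
AgefillnaModel_X = DF.loc[~ageNaIn].drop(columns='Age')
y = DF.loc[~ageNaIn, 'Age']
AgefillnaModel = Lasso(copy_X=False)
AgefillnaModel.fit(AgefillnaModel_X, y)
# Predict Age only for the rows where it was missing
DF.loc[ageNaIn, 'Age'] = AgefillnaModel.predict(DF.loc[ageNaIn, AgefillnaModel_X.columns])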

The second method uses IterativeImputer from sklearn.impute:

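import pandas as pd
# IterativeImputer is experimental, so this enabling import must come first
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
# Setting the random_state argument for reproducibility
imputer = IterativeImputer(random_state=42)
imputed = imputer.fit_transform(DF)
# fit_transform returns a NumPy array, so restore the column names and index
df_imputed = pd.DataFrame(imputed, columns=DF.columns, index=DF.index)
df_imputed.round(2)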

Now, how can I decide which one is better?

Here is a scatter plot of the resulting Age vs. Sex (figure not included):

Topic lasso data-imputation missing-data scikit-learn

Category Data Science


You can't tell at this stage. Train a few models on the data filled by each method and compare how those models perform.
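A rough sketch of what that comparison could look like is below. It is only an illustration under several assumptions: the data is a small synthetic stand-in (not the real Titanic frame), the helper names fill_with_lasso and fill_with_iterative are made up for this example, and logistic regression with 5-fold cross-validation is just one possible choice of downstream model and score.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Titanic data (an assumption, not the real dataset)
rng = np.random.default_rng(42)
n = 400
fare = rng.gamma(2.0, 15.0, n)
sex = rng.integers(0, 2, n)
age = 30 + 0.2 * fare - 5 * sex + rng.normal(0, 8, n)
p_survive = 1 / (1 + np.exp(-(1.5 * sex - 0.03 * age + 0.5)))
survived = (rng.random(n) < p_survive).astype(int)

X = pd.DataFrame({"Age": age, "Fare": fare, "Sex": sex})
X.loc[rng.random(n) < 0.2, "Age"] = np.nan  # hide roughly 20% of Age


def fill_with_lasso(df):
    """Fill missing Age with a Lasso fitted on the rows where Age is known."""
    out = df.copy()
    missing = out["Age"].isna()
    predictors = out.columns.drop("Age")
    model = Lasso().fit(out.loc[~missing, predictors], out.loc[~missing, "Age"])
    out.loc[missing, "Age"] = model.predict(out.loc[missing, predictors])
    return out


def fill_with_iterative(df):
    """Fill missing values with IterativeImputer, keeping column names and index."""
    filled = IterativeImputer(random_state=42).fit_transform(df)
    return pd.DataFrame(filled, columns=df.columns, index=df.index)


# Train the same downstream model on each filled dataset and compare scores
scores = {}
for name, fill in [("lasso", fill_with_lasso), ("iterative", fill_with_iterative)]:
    X_filled = fill(X)
    clf = LogisticRegression(max_iter=1000)
    scores[name] = cross_val_score(clf, X_filled, survived, cv=5).mean()

for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Whichever method gives the better score for the models you actually care about is the one to keep. Note that this sketch imputes before splitting into folds, which is a simplification; fitting the imputation inside each training fold would be the stricter way to do it.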
