Linear models: Imputing missing not at random

Question

thereandhere1

June 4, 2021 05:08

This question is a continuation of a similar question, but for linear models instead of tree-based models.

Given that linear models (e.g. lasso, ridge, linear regression, elastic net) can't handle missing (NaN) values and are sensitive to feature scale, what are appropriate approaches to encode or impute values that are missing not at random in independent features?

For example, if I have the following two independent features in my model:

CAR_OWNER: Binary feature (TRUE/FALSE or 0/1) without missing values
CAR_COLOR: BLUE, GREEN, NaN (here, a missing NaN value indicates that CAR_OWNER is FALSE)

What is the appropriate symbolic value or imputation for the missing values in CAR_COLOR that will not distort the model?
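
To make the setup concrete, here is a minimal sketch of the example in pandas, together with one possible encoding (not necessarily the best one). The toy values and the variable names `df`, `color_dummies` and `encoded` are made up for illustration. It relies on the default behavior of `pd.get_dummies`, which creates no column for NaN, so rows without a car get 0 in every color column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data matching the example above (values are made up).
df = pd.DataFrame({
    "CAR_OWNER": [1, 1, 0, 1, 0],
    "CAR_COLOR": ["BLUE", "GREEN", np.nan, "BLUE", np.nan],
})

# One-hot encode CAR_COLOR. With the default dummy_na=False, pandas emits
# no column for NaN, so rows without a car are all zeros across the colors.
color_dummies = pd.get_dummies(df["CAR_COLOR"], prefix="CAR_COLOR", dtype=int)

# Combine the binary owner flag with the color indicators.
encoded = pd.concat([df[["CAR_OWNER"]], color_dummies], axis=1)
print(encoded)
```

In this toy data CAR_OWNER equals the sum of the two color columns, so the three features are perfectly collinear. For an unregularized linear regression that would probably mean dropping CAR_OWNER or one of the color columns.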

Topic linear-models lasso data-imputation linear-regression logistic-regression

Category Data Science
