Handling missing values in medical data

I have a medical dataset that contains maternal and foetal data during pregnancy. There are some missing values in the dataset that I am unsure how to handle.

Here is a short example of my dataset:

id    insulin    ultrasound_AC
0     33         2651 
1                2743
2     29  

Patient 0 was prescribed insulin at 33 weeks gestation and patient 2 at 29 weeks, whereas patient 1 was not prescribed insulin, hence the missing value. Similarly, patient 0's foetus had an ultrasound abdominal circumference measurement of 2651 and patient 1's foetus a measurement of 2743, whereas patient 2 has a missing value for this feature, probably because the patient did not attend this ultrasound appointment.

I am wondering how to handle these missing values.

In the case of the insulin feature, imputing missing values would be incorrect, as the patients with missing values were never prescribed insulin. I could use SimpleImputer to fill all missing values with zeros, but would an ML model interpret this as though the patient was prescribed insulin from the start of pregnancy?

As for the ultrasound abdominal circumference measurement, I could impute missing values using some imputation method, such as KNN Imputation, but with the data being medical, I am unsure if this is the best method as I do not want to modify the data too much.

Please advise!

Topic data-imputation missing-data

Category Data Science


In your case, a missing value simply means that insulin was not given. To separate and handle these cases, I think you can impute the missing values with 0. I don't see any problem with that for the insulin feature.
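As a rough sketch of the zero-fill idea, and of one way to address the worry that 0 might read as "insulin from week 0", the snippet below fills with 0 and also asks SimpleImputer for a missing-value indicator column. The three rows are the example from the question; the output column names insulin_week and insulin_missing are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# The insulin column from the question's example; NaN = never prescribed.
df = pd.DataFrame({"insulin": [33.0, np.nan, 29.0]})

# Fill with a constant 0 and append a binary "was missing" indicator column,
# so a model can tell "never prescribed" apart from a real week number.
imputer = SimpleImputer(strategy="constant", fill_value=0, add_indicator=True)
filled = imputer.fit_transform(df[["insulin"]])

# Column names here are hypothetical, chosen only for readability.
result = pd.DataFrame(filled, columns=["insulin_week", "insulin_missing"])
print(result)
```

Whether the indicator actually helps depends on the model: tree-based models can often split on the 0 by themselves, while linear models usually benefit from the explicit flag.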

For ultrasound AC, I think one option is to use KNN to impute the missing values.
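A minimal sketch of KNN imputation with scikit-learn's KNNImputer is shown below. The question's example has too few columns for neighbours to be meaningful, so the gestational_age_weeks column and all of its values are invented purely for illustration, as are the fourth row and the choice of n_neighbors=2.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data: gestational_age_weeks and the last row are invented;
# the first two ultrasound_AC values come from the question's example.
df = pd.DataFrame({
    "gestational_age_weeks": [32.0, 33.0, 32.5, 36.0],
    "ultrasound_AC": [2651.0, 2743.0, np.nan, 3100.0],
})

# Each missing value is replaced by the mean of that feature over the
# n_neighbors rows that are closest on the features both rows have observed.
knn = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(knn.fit_transform(df), columns=df.columns)
print(imputed)
```

KNN distances are sensitive to feature scale, so with real data it is usually worth standardising the columns first. Keeping a separate flag for which values were imputed also makes it possible to check later how much the imputation influences the model.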
