Performing EDA on a dataset with missing features

Question

user135735

May 15, 2022 05:32

I'm new to DS.

I want to perform EDA on a dataset with missing values. These are the missing-value counts per feature for my train and test sets:

train:

Test_0        0
Test_1       31
Test_2        0
Test_3      141
Test_4        0
Test_5        0
Test_6        0
Test_7        0
Test_8     1045
Test_9        0
Test_10       0
Test_11       0
Test_12       0
Test_13       0
Test_14       0
Test_15    2967
Class         0
dtype: int64

test:

Test_0       0
Test_1       7
Test_2       0
Test_3      46
Test_4       0
Test_5       0
Test_6       0
Test_7       0
Test_8     279
Test_9       0
Test_10      0
Test_11      0
Test_12      0
Test_13      0
Test_14      0
Test_15    738
dtype: int64

My train set has 3616 rows in total and my test set has 905. How can I decide which features to drop and which to impute (and how to impute them; I have read a bit about mean filling, etc.)?
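
To put the counts above in perspective, they can be expressed as a fraction of the rows in each set. The following is a minimal sketch in plain Python, using only the numbers listed in this post. The 50% cut-off (DROP_THRESHOLD) is a hypothetical value chosen for illustration, not an established rule, and the "drop" and "fill" labels are only candidates for further inspection.

```python
# Missing-value counts copied from the post; features with zero missing are omitted.
train_missing = {"Test_1": 31, "Test_3": 141, "Test_8": 1045, "Test_15": 2967}
test_missing = {"Test_1": 7, "Test_3": 46, "Test_8": 279, "Test_15": 738}
N_TRAIN, N_TEST = 3616, 905  # row totals stated in the post

# Hypothetical cut-off: an assumption for illustration, not a rule.
DROP_THRESHOLD = 0.5


def missing_fraction(counts, n_rows):
    """Return the share of rows that are missing for each feature."""
    return {col: n / n_rows for col, n in counts.items()}


train_frac = missing_fraction(train_missing, N_TRAIN)
test_frac = missing_fraction(test_missing, N_TEST)

# Decide from the train set only, so the test set does not influence the choice.
drop_candidates = [c for c, f in train_frac.items() if f > DROP_THRESHOLD]
fill_candidates = [c for c, f in train_frac.items() if 0 < f <= DROP_THRESHOLD]

for col in train_missing:
    print(f"{col}: train {train_frac[col]:.1%}, test {test_frac[col]:.1%}")
print("drop candidates:", drop_candidates)
print("fill candidates:", fill_candidates)
```

With these numbers, Test_15 is missing in roughly 82% of the train rows, Test_8 in roughly 29%, and Test_1 and Test_3 in under 4% each, so only Test_15 exceeds the assumed cut-off.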

If anyone can also point me to a guide that explains these issues, I would appreciate it.

Thanks!

Topic exploratory-factor-analysis visualization data-cleaning

Category Data Science