How to deal with NaN values after merging / joining two dataframes?
Often, after merging two pandas dataframes, I end up with NaNs in the new dataframe. That is simply how the merge works, because one CSV does not have all the IDs that the other has (for example, two dataframes of different sizes). Those NaNs were not present before the merge; a left join in pandas marks the data missing from the right-hand dataframe as NaN, so some rows have NaN values in some columns. My question is how to deal with those values from a data science point of view. Should I remove them? Should I replace them? What should I do if I cannot replace them with the mean or median? What are the best practices for this? Am I even doing the right thing by merging the two dataframes if I end up with missing values? Should the missing values that result from merging two dataframes be treated as normal missing data?
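To make the situation concrete, here is a minimal sketch of the kind of left join described above. The dataframes, column names (`id`, `amount`, `city`) and values are made up for illustration. It uses the `indicator=True` option of `merge`, which adds a `_merge` column, so that rows which got NaNs purely because no matching ID existed can be told apart from rows that did match.

```python
import pandas as pd

# Hypothetical data: `left` contains IDs 3 and 4 that `right` does not have.
left = pd.DataFrame({"id": [1, 2, 3, 4], "amount": [10.0, 20.0, 30.0, 40.0]})
right = pd.DataFrame({"id": [1, 2], "city": ["Oslo", "Lima"]})

# Left join; indicator=True adds a `_merge` column with the values
# "both", "left_only" or "right_only" for each row.
merged = left.merge(right, on="id", how="left", indicator=True)

# Rows where the right-hand columns are NaN only because no match was found.
unmatched = merged[merged["_merge"] == "left_only"]

n_unmatched = len(unmatched)
missing_city = int(merged["city"].isna().sum())

print(merged)
print(f"{n_unmatched} unmatched rows, {missing_city} missing 'city' values")
```

Here the number of unmatched rows equals the number of missing `city` values, which shows that every NaN in that column comes from the join rather than from the original data.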
Topic data-engineering pandas dataset data-cleaning
Category Data Science