How to deal with a feature that has a different sample size?
I have a dataset with 50 features covering 2009 to 2018. However, one of the features has only been available since 2015, and its earlier values cannot be recovered. I am concerned that if I train a model on the whole dataset, the estimated coefficient of that feature will be biased (the feature is not really sparse; it is just that all of its values from 2009-2014 are missing).
Therefore, I would like to ask how you would deal with a feature that is not available for about half of the dataset.
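To make the setup concrete, here is a minimal sketch on synthetic data. The column names (`x_full`, `x_late`, `x_late_missing`) and the 100 rows per year are assumptions for illustration only, not my real data. It shows the missingness pattern and the two options I can think of so far: keeping all rows with a missingness indicator, or training only on 2015-2018. I am not sure either one is the right approach.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 100 rows per year, 2009-2018.
years = np.repeat(np.arange(2009, 2019), 100)
df = pd.DataFrame({
    "year": years,
    "x_full": rng.normal(size=years.size),  # a feature available in all years
    "x_late": rng.normal(size=years.size),  # the feature recorded only from 2015
})
df.loc[df["year"] < 2015, "x_late"] = np.nan

# Share of missing values per year: 1.0 for 2009-2014, 0.0 for 2015-2018.
missing_by_year = df["x_late"].isna().groupby(df["year"]).mean()

# Option A: keep all rows, add a missingness indicator, fill with a constant.
df_a = df.assign(
    x_late_missing=df["x_late"].isna().astype(int),
    x_late=df["x_late"].fillna(0.0),
)

# Option B: keep only the rows where the feature was actually recorded.
df_b = df.dropna(subset=["x_late"])
```

Option A keeps every row but makes the indicator perfectly collinear with "year < 2015", while Option B drops 2009-2014 entirely, so the other 49 features lose more than half of their history.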
Thank you!
Topic feature-engineering missing-data feature-selection
Category Data Science