Find the best interpolation method for missing observations

I have a database with hourly measurements of objects, recorded every day. However, some data is missing, so I don't have measurements for every hour. To work around this, I used several different interpolation methods (with pandas) to fill in the missing data. I now have several versions of the database, one per interpolation method, and I need to keep only one.

My question is: how can I determine which interpolation method is the best?

I have searched the internet, but I mainly found explanations of how to interpolate data, not how to choose the best method or how to visualize the comparison.

Topic interpolation pandas python

Category Data Science


The most basic method that springs to mind is to split off a test set:

Take the rows where all the variables you might need to interpolate were actually recorded, split off a percentage of them, and "mask" (hide) the variable you wish to interpolate in that split. If you're using some sort of trained interpolation, you can fit it on the other part of the split.

Compare the results of the different interpolation methods you are using with the actual values (that you've taken out) on a metric that suits your purpose for the data best (e.g. mean squared error, mean absolute error, logistic loss, or maybe even the outcome of some machine learning method trained on the dataset).

That way, you'll find the interpolation method that best suits your data + problem.
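The masking procedure above can be sketched in pandas roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `score_methods`, the `mask_frac` and `seed` parameters, the synthetic hourly series, and the choice of mean absolute error are all assumptions made for illustration. The candidate methods are passed as plain callables, so SciPy-backed options such as `s.interpolate(method="cubic")` can be added in the same way if SciPy is installed.

```python
import numpy as np
import pandas as pd

def score_methods(series, methods, mask_frac=0.1, seed=0):
    """Hide a fraction of the known values, refill them with each method,
    and return the mean absolute error per method (lowest first)."""
    rng = np.random.default_rng(seed)
    series = series.astype(float)

    # Positions of values that were really observed
    known_pos = np.flatnonzero(series.notna().to_numpy())
    # Skip the first and last known point so every hidden value is interior
    candidates = known_pos[1:-1]
    n_hide = max(1, int(len(candidates) * mask_frac))
    hidden = rng.choice(candidates, size=n_hide, replace=False)

    masked = series.copy()
    masked.iloc[hidden] = np.nan      # pretend these were never measured
    truth = series.iloc[hidden]       # the values we took out

    scores = {}
    for name, fill in methods.items():
        filled = fill(masked)
        scores[name] = (filled.iloc[hidden] - truth).abs().mean()
    return pd.Series(scores).sort_values()

# Hypothetical candidate methods; swap in the ones you actually used
methods = {
    "linear": lambda s: s.interpolate(method="linear"),
    "ffill": lambda s: s.ffill(),
}

# Synthetic hourly data with a daily cycle, standing in for the real measurements
idx = pd.date_range("2023-01-01", periods=240, freq=pd.Timedelta(hours=1))
demo = pd.Series(np.sin(2 * np.pi * np.arange(240) / 24), index=idx)

print(score_methods(demo, methods))
```

The method at the top of the returned Series has the lowest error on the hidden values. Repeating the run with several seeds gives a feel for how stable the ranking is.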

One thing to keep in mind is that your masking should follow the same patterns (if any) as your actual missing data: e.g. if values only go missing during certain time periods, or in runs of several consecutive hours, your masking should reproduce that pattern where possible.
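As one way of making the mask resemble the real gaps, the sketch below measures how long the existing runs of missing values are and then hides contiguous blocks of the same lengths inside observed stretches. The helper names `gap_lengths` and `mask_blocks`, the retry limit, and the example data are hypothetical, and the sketch assumes the gaps are short compared with the whole series.

```python
import numpy as np
import pandas as pd

def gap_lengths(series):
    """Lengths of the consecutive runs of NaN in the series."""
    is_na = series.isna().to_numpy()
    # Pad with False so runs touching either end are still closed off
    edges = np.diff(np.r_[False, is_na, False].astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return ends - starts

def mask_blocks(series, lengths, seed=0, max_tries=100):
    """Hide one contiguous block per entry in `lengths`, placed only over
    values that are currently observed. Returns the masked series and a
    boolean array marking the artificially hidden positions."""
    rng = np.random.default_rng(seed)
    masked = series.astype(float).copy()
    hidden = np.zeros(len(series), dtype=bool)
    for length in lengths:
        for _ in range(max_tries):
            # Keep the first and last position out of every block
            start = rng.integers(1, len(series) - length)
            block = slice(start, start + length)
            # Accept only blocks with no real or previously hidden gap
            if masked.iloc[block].notna().all():
                masked.iloc[block] = np.nan
                hidden[block] = True
                break
    return masked, hidden

# Example data: 48 hourly values with two real gaps (3 hours and 1 hour)
idx = pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1))
observed = pd.Series(np.arange(48, dtype=float), index=idx)
observed.iloc[10:13] = np.nan
observed.iloc[30] = np.nan

lengths = gap_lengths(observed)
masked, hidden = mask_blocks(observed, lengths)

# Score one method on the artificially hidden positions only
filled = masked.interpolate(method="linear")
mae = (filled[hidden] - observed[hidden]).abs().mean()
print(lengths, mae)
```

If the real gaps cluster at particular hours or days, the same idea applies: restrict the candidate start positions to those periods instead of drawing them uniformly.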
