Handling feature distribution mismatch between training and prediction for ranking models

I am building a linear regression model to improve document ranking, and I am trying to identify why the model's offline performance estimates don't match its actual online impact.

One major problem is the mismatch between the feature distributions at training time and at prediction time.

For example:

For training data, the feature distribution is computed over documents that users actually saw.

But during online prediction, the model is applied to all matching documents. This creates a gap between the feature distribution the model learned from and the one it encounters at serving time, and that gap can hurt results.
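One way to make this gap concrete is to compare logged feature values from the training population against feature values computed over all matching documents at serving time. Below is a minimal sketch of that idea, not something from my pipeline: it assumes you have two hypothetical feature matrices, `train_features` (previously seen documents) and `serving_features` (all matching documents), and runs a two-sample Kolmogorov-Smirnov test per feature.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_shift_report(train_features, serving_features, feature_names):
    """Per-feature two-sample KS statistic between training and serving logs."""
    report = {}
    for j, name in enumerate(feature_names):
        res = ks_2samp(train_features[:, j], serving_features[:, j])
        report[name] = {"ks_stat": res.statistic, "p_value": res.pvalue}
    return report

# Synthetic data standing in for real logs; feature names are placeholders.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(10_000, 2))    # features of previously seen documents
serving = rng.normal(0.5, 1.5, size=(10_000, 2))  # features of all matching documents
print(feature_shift_report(train, serving, ["bm25", "freshness"]))
```

A large KS statistic on a feature would at least tell me which features drive the train/serve skew, even if it doesn't fix it.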

I assume this is a standard problem, but I haven't come across good articles or papers that discuss it.

A few possible solutions I have in mind:

  1. Introduce more randomness (exploration) into the training data, but this has limited scope without impacting actual users or corrupting the data significantly.

  2. During prediction, apply the model only to the top-K documents (the ones that were already being seen). This increases parity between training and prediction, but limits the model to reranking a small set (see the sketch after this list).
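For option 2, here is a minimal sketch of what I mean, again my own illustration rather than existing code: the existing ranker's `base_scores` select the top-K candidates, and the regression model only reorders those, so it scores documents drawn from roughly the population it was trained on. `candidates` and `model` are hypothetical (any fitted regressor with a `predict` method).

```python
import numpy as np

def rerank_top_k(candidates, base_scores, model, k=100):
    """Rerank only the top-k candidates by base score; leave the tail untouched."""
    order = np.argsort(base_scores)[::-1]            # best candidates first
    head, tail = order[:k], order[k:]

    head_features = np.array([candidates[i]["features"] for i in head])
    model_scores = model.predict(head_features)      # linear regression scores

    reranked_head = head[np.argsort(model_scores)[::-1]]
    return np.concatenate([reranked_head, tail])     # final ranking of candidate indices
```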

Is there a better way to handle this difference between training and prediction?

Tags: learning-to-rank, training, ranking
