A/B test results contradict offline machine learning model performance

This seems to be a common problem when bringing machine learning models to production.

Let's say we have an optimized machine learning model that gives a decent performance metric on an unseen test dataset. We are quite satisfied with that and decide to bring the model online. Then we run an A/B test to compare our website's performance (e.g., revenue, customer engagement) with and without the new model. Somehow, our new model is not a clear winner, or is even a clear loser, in the A/B test. How do we deal with such a situation?

Here the model I mentioned is a machine learning model, for example a ranking or recommendation algorithm, but it could be any algorithm in practice. Thanks for any help!

Topic ab-test machine-learning

Category Data Science


One way to deal with the situation is to investigate the differences between offline training and the online A/B test. Here are a couple of common differences:

  • The model training process optimizes a machine learning loss function, while the A/B test measures a business metric. The loss function and the business metric can diverge (see the first sketch after this list).

  • The data distributions differ. The machine learning model is trained on older data, while the A/B test runs on newer data, and the two may come from different distributions (see the second sketch after this list).
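
To make the first point concrete, here is a minimal sketch with made-up numbers (the items, probabilities, and revenue values are purely hypothetical) showing how a model with a better offline loss can still produce less revenue when its ranking is used online:

```python
# Toy example: better log loss does not have to mean more revenue online.
import numpy as np
from sklearn.metrics import log_loss

# Ground truth: did the user buy each of 4 candidate items, and the revenue per item.
y_true = np.array([1, 0, 1, 0])
revenue = np.array([2.0, 0.0, 40.0, 0.0])   # item 3 is a rare, high-value purchase

# Model A: well-calibrated probabilities -> lower (better) log loss.
p_a = np.array([0.9, 0.1, 0.6, 0.1])
# Model B: worse log loss, but it ranks the high-revenue item first.
p_b = np.array([0.7, 0.3, 0.8, 0.3])

for name, p in [("A", p_a), ("B", p_b)]:
    top = int(np.argmax(p))                  # item the model would show first
    print(name,
          "log_loss=%.3f" % log_loss(y_true, p),
          "revenue_of_top_item=%.1f" % (revenue[top] * y_true[top]))
```

Model A wins on log loss, but Model B puts the high-revenue item on top, so the A/B test could easily favor B (or show no clear winner) even though A looks better offline.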
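For the second point, a quick sanity check is to compare feature distributions between the training window and the A/B-test window. This is a minimal sketch assuming a hypothetical "session_length" feature and synthetic data, using a two-sample Kolmogorov-Smirnov test from scipy:

```python
# Check whether a feature has drifted between training data and A/B-test data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Pretend "session_length" looked like this when the model was trained ...
train_feature = rng.normal(loc=5.0, scale=1.0, size=10_000)
# ... and like this during the A/B test (the mean has shifted).
ab_test_feature = rng.normal(loc=5.8, scale=1.2, size=10_000)

stat, p_value = ks_2samp(train_feature, ab_test_feature)
print(f"KS statistic={stat:.3f}, p-value={p_value:.1e}")
if p_value < 0.01:
    print("Distribution shift detected: offline metrics may not transfer online.")
```

If a feature the model relies on has drifted like this, the offline test-set metric says little about how the model behaves on the traffic the A/B test actually sees.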
