A/B test results contradict offline machine learning model performance
This seems to be a common problem when bringing machine learning models to production.
Let's say we have an optimized machine learning model that achieves a decent performance metric on an unseen test dataset. We are quite satisfied with that and decide to bring the model online. We then run an A/B test to compare our website performance (e.g., revenue, customer engagement, etc.) with and without the new model. Somehow, the new model is not a clear winner in the A/B test, or is even a clear loser. How do we deal with such a situation?
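To make the contrast concrete, here is a minimal sketch (not from my actual system, all data synthetic and purely illustrative) of the two evaluation regimes: an offline metric such as AUC on a held-out test set, versus an online A/B comparison of a business metric like revenue per user. The point is that a decent offline number does not automatically translate into a statistically significant online lift.

```python
# Sketch: offline metric vs. online A/B comparison (synthetic, illustrative data).
import numpy as np
from sklearn.metrics import roc_auc_score
from scipy import stats

rng = np.random.default_rng(0)

# --- Offline evaluation: model scores vs. labels on an unseen test set ---
y_true = rng.integers(0, 2, size=10_000)                   # hypothetical labels
y_score = y_true * 0.3 + rng.normal(0.5, 0.3, 10_000)      # hypothetical model scores
print("Offline AUC:", roc_auc_score(y_true, y_score))      # looks "decent" offline

# --- Online evaluation: A/B test on revenue per user (control vs. new model) ---
revenue_control = rng.gamma(shape=2.0, scale=5.0, size=50_000)     # hypothetical revenue
revenue_treatment = rng.gamma(shape=2.0, scale=5.05, size=50_000)  # tiny, noisy uplift
t_stat, p_value = stats.ttest_ind(revenue_treatment, revenue_control, equal_var=False)
lift = 100 * (revenue_treatment.mean() / revenue_control.mean() - 1)
print("Online lift: %.2f%%, p-value: %.3f" % (lift, p_value))
# The online metric is noisier, measures something different from the offline
# label, and may need far more traffic to detect a small true effect.
```

So even with a respectable offline AUC, the A/B test above can easily come back inconclusive, which is exactly the situation I am asking about.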
The model I mentioned here is a machine learning model, for example a ranking or recommendation algorithm, but in practice it could be any algorithm. Thanks for any help!
Topic ab-test machine-learning
Category Data Science