Can I use multi-armed bandits to optimize how two algorithms are weighted when creating a composite score?
So, I'm aware that multi-armed bandits are great for evaluating multiple models and, from what I understand, they are mainly used to pick a specific model.
I would still like to evaluate two models but I want to do it differently. Take a look at this simple equation:
W_A * RecoScore_A + W_B * RecoScore_B = CompScore
Rather than picking a specific model for a given user, I'd like to optimize the weights W_A and W_B themselves.
I'm wondering if this makes sense and if you have seen any literature related to this. I'm having trouble finding anything online.
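To make the idea concrete, here is a rough sketch of one way it could be set up, not a tested or recommended design. It assumes the weights are constrained so that W_B = 1 - W_A (the equation above does not require this), that W_A is restricted to a small grid of candidate values, and that the reward is binary (e.g. a click). Each candidate weight is then an arm of a Beta-Bernoulli Thompson sampling bandit. The names `WeightBandit`, `composite_score`, and `simulate`, and the simulated click probabilities, are all hypothetical.

```python
import random


def composite_score(w_a, reco_score_a, reco_score_b):
    # CompScore = W_A * RecoScore_A + W_B * RecoScore_B,
    # with the assumed constraint W_B = 1 - W_A.
    return w_a * reco_score_a + (1.0 - w_a) * reco_score_b


class WeightBandit:
    """Thompson sampling where each arm is a candidate value of W_A."""

    def __init__(self, candidate_weights, seed=None):
        self.weights = list(candidate_weights)
        # Beta(1, 1) prior on the reward probability of every arm.
        self.successes = [1] * len(self.weights)
        self.failures = [1] * len(self.weights)
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample per arm from its posterior and play the largest.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # reward is True (e.g. click) or False (no click).
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(rounds=5000, seed=0):
    # Hypothetical environment: click probability peaks at W_A = 0.75.
    bandit = WeightBandit([0.0, 0.25, 0.5, 0.75, 1.0], seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.select()
        w_a = bandit.weights[arm]
        click_prob = 0.6 - 0.5 * abs(w_a - 0.75)
        bandit.update(arm, env.random() < click_prob)
    # Subtract the prior pseudo-counts to get the number of pulls per arm.
    pulls = [s + f - 2 for s, f in zip(bandit.successes, bandit.failures)]
    return bandit.weights, pulls


if __name__ == "__main__":
    weights, pulls = simulate()
    for w, n in zip(weights, pulls):
        print(f"W_A={w:.2f}  pulls={n}")
```

In a real system, the weight chosen by `select()` would be passed to `composite_score` to rank items for that request, and the observed user response would be fed back through `update()`. A grid of discrete arms ignores the fact that nearby weights probably perform similarly, so this is only the simplest version of the idea.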
Topic ab-test experiments recommender-system
Category Data Science