contextual bandits for online learning

Question

Pavan Sangha

May 11, 2021 00:01

Which of the algorithms in the current literature for contextual bandits can be implemented for online learning and which ones can't? I'd really appreciate it if someone could provide a link to papers too! Thanks for the help!

Topic randomized-algorithms online-learning reinforcement-learning machine-learning

Category Data Science

matanster · Accepted Answer · July 23, 2018 09:48

My answer is only partial, since I've not compiled a list, but I believe all the algorithms implemented here are implemented for both offline and online mode. This one can also be implemented for online mode.

I'm not trying to imply you should use that implementation, but it is a kind of living proof, which goes beyond what you can deduce analytically from the articles. The thing to understand is that certain contextual bandit (CB) algorithms are paired with fairly benign procedures for training them on offline-accumulated (logged) data, and these in turn are paired with mathematical proofs that the loss incurred in offline training is a good predictor of the loss they'll incur in online mode (as long as the real world is still 'sufficiently similar' to the one the data was logged from).
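To make the "offline loss predicts online loss" idea a bit more concrete, here is a minimal sketch of one standard estimator for it, inverse propensity scoring (IPS). It is not taken from the implementations mentioned above, and everything in it is an assumption made for illustration: the toy environment, the uniform logging policy, and the names `ips_value`, `simulate_logs`, `good_policy` and `bad_policy`.

```python
import random


def ips_value(logged, target_policy):
    """IPS estimate of the average reward `target_policy` would earn.

    `logged` holds (context, action, reward, propensity) tuples, where
    propensity is the probability with which the logging policy chose
    that action when the data was collected.
    """
    total = 0.0
    for context, action, reward, propensity in logged:
        # Only rounds where the target policy agrees with the logged
        # action contribute, reweighted by 1 / propensity.
        if target_policy(context) == action:
            total += reward / propensity
    return total / len(logged)


def simulate_logs(n, n_actions, rng):
    """Hypothetical logging run: uniform random actions, 0/1 rewards."""
    logs = []
    for _ in range(n):
        context = rng.randrange(10)
        action = rng.randrange(n_actions)
        # Toy reward rule (an assumption): one "right" action per context.
        reward = 1.0 if action == context % n_actions else 0.0
        logs.append((context, action, reward, 1.0 / n_actions))
    return logs


rng = random.Random(0)
logs = simulate_logs(20000, 3, rng)


def good_policy(context):
    return context % 3          # always picks the rewarded action


def bad_policy(context):
    return (context + 1) % 3    # never picks the rewarded action


print("estimated value, good policy:", ips_value(logs, good_policy))
print("estimated value, bad policy:", ips_value(logs, bad_policy))
```

In this toy setup the good policy would earn 1.0 per round if deployed, and the estimate from the logs should land close to that, while the bad policy is estimated at 0. The guarantee only holds under the 'sufficiently similar' caveat above, and only if the logged propensities are correct and nonzero for every action the target policy might pick.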

Some algorithms (other than those mentioned above) may be applicable only to offline training; at least, I'm not aware of any theoretical result ruling out that an algorithm may train better offline in a way that precludes using the same algorithm directly for online learning. But many algorithms are implemented in software only for offline evaluation, as a lot of research focuses on the offline setting. So I think it's a good question!
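As a rough illustration of an algorithm that works in both modes, here is a minimal sketch of a tabular epsilon-greedy learner whose `update` step is the same whether it is fed replayed logs or live feedback. The class `EpsilonGreedyCB`, the toy `reward_fn`, and the discrete contexts are all hypothetical and chosen for brevity; this is not one of the implementations referred to above.

```python
import random
from collections import defaultdict


class EpsilonGreedyCB:
    """Tabular epsilon-greedy contextual bandit (illustrative only)."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)   # (context, action) -> observations
        self.means = defaultdict(float)  # (context, action) -> mean reward

    def act(self, context):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions),
                   key=lambda a: self.means[(context, a)])

    def update(self, context, action, reward):
        # Incremental running mean: one observation at a time.
        key = (context, action)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]


def reward_fn(context, action):
    """Toy environment (an assumption): one rewarded action per context."""
    return 1.0 if action == context % 3 else 0.0


learner = EpsilonGreedyCB(n_actions=3)

# Offline mode: replay logged (context, action, reward) triples that a
# uniformly random logging policy is assumed to have collected.
log_rng = random.Random(1)
for _ in range(2000):
    c, a = log_rng.randrange(6), log_rng.randrange(3)
    learner.update(c, a, reward_fn(c, a))

# Online mode: the same learner now chooses actions and keeps learning.
env_rng = random.Random(2)
rounds, total = 1000, 0.0
for _ in range(rounds):
    c = env_rng.randrange(6)
    a = learner.act(c)
    r = reward_fn(c, a)
    learner.update(c, a, r)
    total += r
avg_online_reward = total / rounds
print("average online reward:", avg_online_reward)
```

Replaying logs this naively is only reasonable here because the logging policy is assumed to be uniform and the rewards deterministic; with a skewed logging policy you would typically need some form of importance weighting.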

I think you should certainly email an author of any article that seems really helpful to you and ask them specifically, if the article doesn't make this 100% clear; in rare cases they might even point you to a solid online implementation! Do note that online usage entails more production-readiness considerations for the software, and may mean going an extra mile in terms of the software quality expected ...
