How does Q-Learning deal with mixed strategies?

I'm trying to understand how Q-learning deals with games where the optimal policy is a mixed strategy. The Bellman equation says you should choose $\max_a Q(s,a)$, but this implies a single, deterministic action for each $s$. Is Q-learning simply not appropriate if you believe the problem has a mixed-strategy solution?

Tags: q-learning, reinforcement-learning, machine-learning

Category: Data Science


One possibility is to use a softmax (Boltzmann) policy and choose each action $a$ randomly with probability $p = \frac{\exp(Q(s,a))}{\sum_{a'} \exp(Q(s,a'))}$. Strictly speaking, though, the resulting algorithm is no longer plain Q-learning, since the learned Q-values are defined with respect to the greedy $\max$ update.
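As a minimal sketch of the softmax idea above (function names and the temperature parameter are my own additions, not from the answer): the Q-values for a state are mapped to a probability distribution, and an action is sampled from it. A temperature parameter is commonly added so the policy can interpolate between near-greedy and near-uniform behavior.

```python
import numpy as np

def softmax_action_probs(q_values, temperature=1.0):
    """Boltzmann (softmax) distribution over actions given their Q-values.

    Low temperature -> close to greedy argmax; high temperature -> close to uniform.
    """
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()              # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def select_action(q_values, temperature=1.0, rng=None):
    """Sample an action index according to the softmax probabilities."""
    rng = rng if rng is not None else np.random.default_rng()
    p = softmax_action_probs(q_values, temperature)
    return int(rng.choice(len(p), p=p))
```

Note that this gives a stochastic *behavior* policy, which is useful for exploration, but the Q-learning update itself still bootstraps from the greedy maximum, so the values it converges to are those of the deterministic greedy policy.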
