How does Q-Learning deal with mixed strategies?
I'm trying to understand how Q-learning handles games where the optimal policy is a mixed strategy. The Bellman equation says you should choose $\max_a Q(s,a)$, but this implies a single, unique action for each $s$. Is Q-learning simply not appropriate if you believe the problem has a mixed-strategy solution?
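To make the concern concrete, here is a minimal sketch (the Q-values and the softmax policy are illustrative assumptions, not part of any standard Q-learning algorithm). In a game like matching pennies, the equilibrium strategy randomizes 50/50 over the two actions, but the greedy policy derived from a Q-table always commits to a single action:

```python
import numpy as np

# Hypothetical Q-values for one state of matching pennies
# (actions: 0 = heads, 1 = tails); values chosen for illustration.
Q = np.array([0.1, 0.1])

# The greedy policy implied by the Bellman equation picks one action
# deterministically, even when the values are tied:
greedy_action = int(np.argmax(Q))

# A mixed strategy would instead be a distribution over actions, e.g.
# a softmax over Q-values -- but tabular Q-learning does not learn
# such a distribution as its target policy:
temperature = 1.0
probs = np.exp(Q / temperature) / np.sum(np.exp(Q / temperature))
print(greedy_action, probs)
```

Here `np.argmax` breaks the tie by returning the first maximizing index, so the greedy policy is deterministic even though any mixture over the tied actions would have the same expected value.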
Topic q-learning reinforcement-learning machine-learning
Category Data Science