Understanding DQN Algorithm

I'm studying the deep Q-learning algorithm. You can see it in the picture here: DQN

I have a few questions about the deep Q-learning algorithm. What do they mean by row 14: "If D_i = 0, set Y_i = ..."? They want me to take an action a' that maximizes the function Q, which means I have to insert every action a for that state.

If I have a1 and a2, I have to insert a1 and then a2 to test which one gives the maximum, right? But the inputs of my network are states. So how do I know which action maximizes my network?

Do I have to look at the last layer, where I have Q(s,a1) and Q(s,a2), to see which one has the higher value and take that action?

Like in this architecture



You do not need to insert anything other than the state into your NN. The outputs of your NN are the action-values, one per action. Given a state (input), the NN will output 4 action-values (outputs), assuming there are 4 available actions. Then you compare the resulting outputs and select the maximum one. That's for the max operator. For the policy (line 8), you need the argmax operator, which means you need the action that results in the maximum value of Q. In other words, having selected, for example, the max Q = Q(a1), the action that maximizes Q is a1.

It might help to visualize the NN output as a vector $\mathbf{q}=[q(a_1),q(a_2),q(a_3),q(a_4)]$. So you do have the action-value for every available action in a particular state. Now you can implement the max and argmax operators over that vector.
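To make this concrete, here is a minimal PyTorch sketch (the layer sizes, `state_dim`, `n_actions`, reward, and gamma are hypothetical values of my own, and a full DQN would use a separate target network for the row-14 target). The network takes only the state as input and returns one Q-value per action, so max and argmax are just operations over that output vector:

```python
import torch
import torch.nn as nn

state_dim, n_actions = 4, 2  # hypothetical sizes, e.g. actions a1 and a2

# Q-network: maps a state vector to one Q-value per action.
q_net = nn.Sequential(
    nn.Linear(state_dim, 32),
    nn.ReLU(),
    nn.Linear(32, n_actions),  # output layer: [Q(s, a1), Q(s, a2)]
)

state = torch.rand(1, state_dim)         # some state s
q_values = q_net(state)                  # shape (1, n_actions)

# Policy (line 8): argmax over the output vector picks the greedy action.
greedy_action = q_values.argmax(dim=1)   # index 0 -> a1, index 1 -> a2

# Target (row 14, non-terminal case): max over the same kind of output
# vector, computed for the next state s'. In real DQN this would use the
# target network rather than q_net itself.
reward, gamma = 1.0, 0.99                # hypothetical values
next_state = torch.rand(1, state_dim)
with torch.no_grad():
    y = reward + gamma * q_net(next_state).max(dim=1).values
```

So you never feed actions into the network; you only index into (argmax) or reduce over (max) its output.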
