How to manually calculate the gradient that will propagate back over the network using the REINFORCE algorithm?

Question

Angelo Antonio Manzatto

February 28, 2022, 10:57

I am trying to implement the REINFORCE policy gradient algorithm in C++. In my case there is no autograd facility like the one in PyTorch, so I have to calculate the gradient manually.

Let's imagine a scenario where the state space size is 4 and the action space size is 2 (CartPole). I also collected the following data for 3 steps:

action probabilities (softmax): [0.21, 0.34, 0.45], [0.91, 0.01, 0.08], [0.50, 0.30, 0.20]
sampled actions (one-hot encoded): [0, 1, 0], [1, 0, 0], [0, 0, 1]
rewards: [1, 1, -1]
gamma: 0.99

The gradient for backpropagation must have the shape [g11, g12, g13], [g21, g22, g23], [g31, g32, g33]. So the question is: how do I manually calculate these values from the information above?
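
For reference, here is a minimal sketch of one common way to get these values. It makes several assumptions that are not stated in the question: the loss is L = -sum_t G_t * log(pi(a_t | s_t)), where G_t is the discounted return from step t; no baseline or return normalization is used; and the gradient wanted is the one with respect to the softmax inputs (logits). Under those assumptions each entry is g[t][i] = G_t * (p[t][i] - a[t][i]), with p the softmax output and a the one-hot action. The sample data lists three probabilities per step, so the sketch simply uses whatever vector length it is given. The function names `discounted_returns` and `reinforce_logit_gradient` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards over one episode.
std::vector<double> discounted_returns(const std::vector<double>& rewards, double gamma) {
    std::vector<double> returns(rewards.size());
    double running = 0.0;
    for (std::size_t t = rewards.size(); t-- > 0;) {
        running = rewards[t] + gamma * running;
        returns[t] = running;
    }
    return returns;
}

// Gradient of the assumed loss L = -sum_t G_t * log(pi(a_t | s_t)) with respect
// to the softmax inputs (logits): dL/dz[t][i] = G_t * (p[t][i] - onehot[t][i]).
// This is the gradient for gradient descent on L; negate it for gradient ascent.
std::vector<std::vector<double>> reinforce_logit_gradient(
    const std::vector<std::vector<double>>& probs,
    const std::vector<std::vector<double>>& one_hot_actions,
    const std::vector<double>& returns) {
    assert(probs.size() == one_hot_actions.size());
    assert(probs.size() == returns.size());
    std::vector<std::vector<double>> grad(probs.size());
    for (std::size_t t = 0; t < probs.size(); ++t) {
        assert(probs[t].size() == one_hot_actions[t].size());
        grad[t].resize(probs[t].size());
        for (std::size_t i = 0; i < probs[t].size(); ++i) {
            grad[t][i] = returns[t] * (probs[t][i] - one_hot_actions[t][i]);
        }
    }
    return grad;
}
```

With the data above, the returns come out as approximately [1.0099, 0.01, -1], and the resulting rows are approximately [0.2121, -0.6665, 0.4545], [-0.0009, 0.0001, 0.0008], [-0.5, -0.3, 0.8]. If the network instead backpropagates through a separate softmax layer, the quantity to feed in would be the gradient with respect to the probabilities, which under the same assumed loss is -G_t / p[t][i] for the sampled action and 0 elsewhere.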

Topic gradient learning backpropagation

Category Data Science
