How to manually calculate the gradient that will propagate back over the network using the REINFORCE algorithm?

I am trying to implement the deep reinforcement learning policy gradient algorithm REINFORCE in C++. In my case there is no autograd like in PyTorch, so I have to calculate the gradient manually.

Let's imagine a scenario where the state space size is 4 and the action space size is 2 (Cartpole). I also collected the following data for 3 steps:

action probabilities (softmax): [0.21, 0.34, 0.45], [0.91, 0.01, 0.08], [0.50, 0.30, 0.20]
sampled actions (one-hot encoded): [0,1,0], [1,0,0], [0,0,1]
rewards: [1, 1, -1]
gamma: 0.99

The gradient for backpropagation must have the shape [g11,g12,g13], [g21,g22,g23], [g31,g32,g33]. So the question is: how do I manually calculate these values from the information above?
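One common way to get these values is sketched below. It rests on assumptions not stated above: the network's last layer is a softmax over logits, the loss is L = -sum_t G_t * log(pi(a_t | s_t)) with G_t the discounted return from step t, and no baseline or return normalization is used. Under those assumptions the gradient of the loss with respect to the logits at step t is G_t * (p_t - onehot(a_t)). The function names are hypothetical, and actions are passed as indices rather than one-hot vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards over the episode.
std::vector<double> discounted_returns(const std::vector<double>& rewards, double gamma) {
    std::vector<double> returns(rewards.size());
    double running = 0.0;
    for (std::size_t t = rewards.size(); t-- > 0;) {
        running = rewards[t] + gamma * running;
        returns[t] = running;
    }
    return returns;
}

// Assumed loss: L = -sum_t G_t * log(pi(a_t | s_t)), with pi = softmax(logits).
// Returns dL/dlogits for every step: G_t * (p_t - onehot(a_t)).
// These are the values to feed into backpropagation when minimizing L.
std::vector<std::vector<double>> reinforce_logit_gradients(
    const std::vector<std::vector<double>>& probs,   // softmax outputs per step
    const std::vector<std::size_t>& actions,         // index of the sampled action per step
    const std::vector<double>& rewards,
    double gamma) {
    assert(probs.size() == actions.size() && probs.size() == rewards.size());
    const std::vector<double> returns = discounted_returns(rewards, gamma);

    std::vector<std::vector<double>> grads(probs.size());
    for (std::size_t t = 0; t < probs.size(); ++t) {
        assert(actions[t] < probs[t].size());
        grads[t] = probs[t];              // start from p_t
        grads[t][actions[t]] -= 1.0;      // subtract the one-hot action vector
        for (double& g : grads[t]) {
            g *= returns[t];              // weight by the discounted return G_t
        }
    }
    return grads;
}
```

With the data above (action indices 1, 0, 2), the returns come out as approximately [1.0099, 0.01, -1], and the gradients as approximately [0.2121, -0.6665, 0.4545], [-0.0009, 0.0001, 0.0008], [-0.5, -0.3, 0.8]. If the framework expects the gradient of the objective to maximize rather than of a loss to minimize, the signs flip.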

