Policy gradient - and auto-differentiation (Pytorch/Tensorflow)
In policy gradient, we have something like this:
Is my understanding correct that if I apply log cross-entropy on the last layer, the gradient will be automatically calculated as per formula above?
Topic policy-gradients pytorch tensorflow reinforcement-learning
Category Data Science