Policy gradient and auto-differentiation (PyTorch/TensorFlow)

In policy gradient, we have something like this:

Is my understanding correct that if I apply a cross-entropy (negative log-likelihood) loss on the last layer, the gradient will be calculated automatically as per the formula above?


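Yes. Compute the cross-entropy loss on the output of the last layer, using the action that was actually taken as the target; for each sample this equals -log π(a|s). To obtain the policy gradient rather than a plain supervised gradient, weight each sample's loss by its return (or advantage), treated as a constant, and then backpropagate. Automatic differentiation does the rest.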

For example, in PyTorch: apply CrossEntropyLoss to the raw outputs (logits) of the last layer. There is no need to apply softmax yourself, because the function applies log-softmax internally.
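A minimal sketch of what this can look like in PyTorch. The network, the batch of states, the actions, and the returns are all made-up placeholders; a real agent would collect them from rollouts. The weighting by returns assumes the standard REINFORCE-style estimator. The last lines check that the per-sample cross-entropy really is the negative log-probability of the taken action.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy setup: 4-dimensional states, 3 discrete actions.
policy = nn.Linear(4, 3)                              # last layer outputs raw logits
states = torch.randn(5, 4)                            # batch of 5 visited states
actions = torch.tensor([0, 2, 1, 1, 0])               # actions actually taken (targets)
returns = torch.tensor([1.0, 0.5, -0.2, 2.0, 0.0])    # made-up returns, one per sample

logits = policy(states)

# Per-sample cross-entropy with the taken action as target: -log pi(a|s).
# reduction="none" keeps one loss value per sample so each can be weighted.
neg_log_prob = F.cross_entropy(logits, actions, reduction="none")

# Weight by the return (a constant with respect to the parameters) and average.
loss = (neg_log_prob * returns).mean()

# Autograd computes the gradient of this surrogate loss; minimising it
# corresponds to ascending the policy-gradient estimate.
loss.backward()

# Sanity check: cross-entropy equals the explicit negative log-softmax entry.
log_probs = F.log_softmax(logits.detach(), dim=1)
manual = -log_probs[torch.arange(5), actions]
```

After `loss.backward()`, `policy.weight.grad` and `policy.bias.grad` hold the gradients, and an optimizer step (for example with `torch.optim.Adam`) would update the policy.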
