Policy gradient and auto-differentiation (PyTorch/TensorFlow)

Question

Jed

March 29, 2022 07:04

In policy gradient methods, we have something like this:

Is my understanding correct that if I apply a cross-entropy (log) loss to the last layer, the gradient will automatically be computed according to the formula above?

shaabhishek · Accepted Answer · December 15, 2018 10:23

Yes. Compute the cross-entropy loss on the output of the last layer and let auto-differentiation take the gradient of that loss with respect to the network parameters. The action that was actually taken is the target.
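Why this works can be written out in a few lines, assuming a softmax policy over discrete actions and that the formula in the question is the usual REINFORCE estimator with return $G_t$. With a one-hot target $e_a$ for the taken action $a$, the cross-entropy collapses to a single log-probability:

$$
\mathcal{L}_{\text{CE}}(\theta; s, a) = -\sum_{k} [e_a]_k \log \pi_\theta(k \mid s) = -\log \pi_\theta(a \mid s)
\quad\Longrightarrow\quad
\nabla_\theta \mathcal{L}_{\text{CE}} = -\nabla_\theta \log \pi_\theta(a \mid s).
$$

The REINFORCE estimate over $N$ sampled steps is

$$
\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{t=1}^{N} G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)
= -\nabla_\theta \left[ \frac{1}{N} \sum_{t=1}^{N} G_t \, \mathcal{L}_{\text{CE}}(\theta; s_t, a_t) \right],
$$

so each sample's cross-entropy has to be multiplied by its return (or an advantage) before averaging; minimizing that weighted loss by gradient descent is gradient ascent on $J(\theta)$. Unweighted cross-entropy alone would just imitate the actions taken.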

For example, in PyTorch: apply CrossEntropyLoss to the raw outputs (logits) of the last layer. There is no need to apply softmax first, because CrossEntropyLoss applies log-softmax internally.
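A minimal PyTorch sketch of this, assuming a discrete action space and REINFORCE-style returns. The network shape, the `reinforce_loss` helper, and the batch values are hypothetical, chosen only for illustration. `reduction="none"` keeps one cross-entropy value per sample so each can be weighted by its own return before averaging.

```python
import torch
import torch.nn as nn

# Hypothetical tiny policy network: 4-dim state -> logits over 2 actions.
# No softmax on the last layer: CrossEntropyLoss applies log-softmax itself.
policy = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

# reduction="none" returns -log pi(a_t | s_t) for every sample separately.
cross_entropy = nn.CrossEntropyLoss(reduction="none")


def reinforce_loss(logits: torch.Tensor, actions: torch.Tensor,
                   returns: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mean over samples of G_t * (-log pi(a_t | s_t)).

    logits:  (N, num_actions) raw outputs of the last layer
    actions: (N,) int64 indices of the actions actually taken (the targets)
    returns: (N,) float returns (or advantages) weighting each sample
    """
    neg_log_probs = cross_entropy(logits, actions)
    return (returns * neg_log_probs).mean()


# Made-up batch of 3 transitions, purely for illustration.
torch.manual_seed(0)
states = torch.randn(3, 4)
actions = torch.tensor([0, 1, 1])
returns = torch.tensor([1.0, -0.5, 2.0])

logits = policy(states)
loss = reinforce_loss(logits, actions, returns)
optimizer.zero_grad()
loss.backward()   # autograd produces the (negated) policy-gradient estimate
optimizer.step()  # descent on this loss = ascent on expected return
```

Because optimizers minimize, descending on $G_t \cdot (-\log \pi_\theta)$ is the same as ascending the policy-gradient estimate. Passing advantages (return minus a baseline) instead of raw returns is a common way to reduce variance without changing the expected gradient.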