Understanding the derivation of the policy gradient optimisation problem
I'm following a tutorial on YouTube about reinforcement learning.
The presenter walks through the steps of deriving policy gradient optimisation.
In one of the steps he says that

$$\frac{\nabla \pi}{\pi} = \nabla \log \pi,$$

where $\pi$ is the policy. How can he make that jump?
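To make sure I'm reading the claim correctly, here is a minimal numerical sketch (my own check, not code from the video) comparing finite-difference gradients of $\pi$ and of $\log \pi$ for a small softmax policy; the 3-action softmax setup is an arbitrary choice for illustration:

```python
import numpy as np

def pi(theta, a):
    """Probability of action a under a softmax policy with logits theta."""
    z = np.exp(theta - theta.max())  # subtract max for numerical stability
    return z[a] / z.sum()

rng = np.random.default_rng(0)
theta = rng.normal(size=3)  # policy parameters (logits)
a = 1                       # action whose probability we differentiate
eps = 1e-6                  # finite-difference step

grad_pi = np.zeros(3)
grad_log_pi = np.zeros(3)
for i in range(3):
    e = np.zeros(3)
    e[i] = eps
    # central differences of pi and log(pi) w.r.t. theta_i
    grad_pi[i] = (pi(theta + e, a) - pi(theta - e, a)) / (2 * eps)
    grad_log_pi[i] = (np.log(pi(theta + e, a)) - np.log(pi(theta - e, a))) / (2 * eps)

print(grad_pi / pi(theta, a))  # left-hand side:  (grad pi) / pi
print(grad_log_pi)             # right-hand side: grad log pi
```

The two printed vectors agree up to finite-difference error, so the equality itself seems to hold; I just don't see the justification for the step.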
I have attached a screenshot from the video, and here is the link:
https://www.youtube.com/watch?v=wDVteayWWvU&list=PLMrJAkhIeNNR20Mz-VpzgfQs5zrYi085m&index=48&ab_channel=SteveBrunton
Topic: policy-gradients, reinforcement-learning
Category: Data Science