Understanding the derivation of the policy gradient optimisation problem

I'm following a tutorial on YouTube about reinforcement learning. It goes through the steps needed to understand policy gradient optimisation. In one of the steps the presenter states that $\frac{\nabla \pi}{\pi} = \nabla \log \pi$. How can he make that jump? I have attached a screenshot from the video and also a link to the video: https://www.youtube.com/watch?v=wDVteayWWvU&list=PLMrJAkhIeNNR20Mz-VpzgfQs5zrYi085m&index=48&ab_channel=SteveBrunton

Topic: policy-gradients, reinforcement-learning

Category: Data Science


That is called the "log trick". It follows directly from calculus, via the chain rule: $$\frac{d}{dx}\log f(x)=\frac{1}{f(x)}\frac{d}{dx}f(x).$$ Apply the same identity to the policy, taking gradients with respect to the policy parameters $\theta$, and you get the equation you wrote: $$\nabla_\theta \log \pi_\theta(a\mid s)=\frac{\nabla_\theta \pi_\theta(a\mid s)}{\pi_\theta(a\mid s)}.$$
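If it helps to see it concretely, here is a small numerical sanity check (a sketch, assuming a toy softmax policy with made-up parameters, not the policy from the video): the finite-difference gradient of $\log \pi_\theta(a)$ matches $\nabla_\theta \pi_\theta(a) / \pi_\theta(a)$ component-wise.

```python
# Numerical check of the log trick for an illustrative softmax policy.
import numpy as np

def softmax_policy(theta):
    """Return action probabilities pi(a) for a softmax policy with logits theta."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def finite_diff_grad(f, theta, eps=1e-6):
    """Estimate the gradient of a scalar function f at theta by central differences."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

theta = np.array([0.5, -1.0, 2.0])  # hypothetical policy parameters
action = 1                          # an arbitrary action a

# Left-hand side: grad pi(a) / pi(a)
grad_pi = finite_diff_grad(lambda t: softmax_policy(t)[action], theta)
lhs = grad_pi / softmax_policy(theta)[action]

# Right-hand side: grad log pi(a)
rhs = finite_diff_grad(lambda t: np.log(softmax_policy(t)[action]), theta)

print(np.allclose(lhs, rhs, atol=1e-5))  # True: both sides agree
```

The same identity is what lets the policy gradient be written as an expectation over trajectories sampled from the policy, since $\nabla_\theta \pi_\theta$ alone is not an expectation under $\pi_\theta$, while $\pi_\theta \nabla_\theta \log \pi_\theta$ is.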
