Understanding the derivation of the policy gradient optimisation problem

I'm following a tutorial on YouTube about reinforcement learning. It goes through the steps needed to understand policy gradient optimisation. In one of the steps the presenter states that $\frac{\nabla \pi}{\pi} = \nabla \log \pi$. How can he make that jump? I have attached a screenshot from the video and also a link to the video: https://www.youtube.com/watch?v=wDVteayWWvU&list=PLMrJAkhIeNNR20Mz-VpzgfQs5zrYi085m&index=48&ab_channel=SteveBrunton

Topic: policy-gradients, reinforcement-learning

Category: Data Science


That is called the "log trick". It follows directly from calculus, via the chain rule: $$\frac{d}{dx}\log f(x)=\frac{1}{f(x)}\frac{d}{dx}f(x).$$ Apply the same identity to the policy, taking gradients with respect to the policy parameters $\theta$, and you get the equation you wrote: $$\nabla_\theta \log \pi_\theta(a\mid s)=\frac{\nabla_\theta \pi_\theta(a\mid s)}{\pi_\theta(a\mid s)}.$$
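If it helps to see it concretely, here is a small numerical sanity check (a sketch, assuming a toy softmax policy with made-up parameters, not the policy from the video): the finite-difference gradient of $\log \pi_\theta(a)$ matches $\nabla_\theta \pi_\theta(a) / \pi_\theta(a)$ component-wise.

```python
# Numerical check of the log trick for an illustrative softmax policy.
import numpy as np

def softmax_policy(theta):
    """Return action probabilities pi(a) for a softmax policy with logits theta."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def finite_diff_grad(f, theta, eps=1e-6):
    """Estimate the gradient of a scalar function f at theta by central differences."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

theta = np.array([0.5, -1.0, 2.0])  # hypothetical policy parameters
action = 1                          # an arbitrary action a

# Left-hand side: grad pi(a) / pi(a)
grad_pi = finite_diff_grad(lambda t: softmax_policy(t)[action], theta)
lhs = grad_pi / softmax_policy(theta)[action]

# Right-hand side: grad log pi(a)
rhs = finite_diff_grad(lambda t: np.log(softmax_policy(t)[action]), theta)

print(np.allclose(lhs, rhs, atol=1e-5))  # True: both sides agree
```

The same identity is what lets the policy gradient be written as an expectation over trajectories sampled from the policy, since $\nabla_\theta \pi_\theta$ alone is not an expectation under $\pi_\theta$, while $\pi_\theta \nabla_\theta \log \pi_\theta$ is.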
