Action selection in an actor-critic algorithm
I have a discrete action space that is just a list of values, given by acts = [i for i in range(10, 100, 10)], i.e. [10, 20, ..., 90]. Following the PyTorch documentation, the loss is computed as shown below. Could someone explain how to modify this procedure so that the sampled actions come from my action space?
```python
from torch.distributions import Categorical

m = Categorical(probs)               # probs: action probabilities from the policy
action = m.sample()                  # sampled action (an index tensor)
next_state, reward = env.step(action)
loss = -m.log_prob(action) * reward  # policy-gradient (REINFORCE-style) loss
loss.backward()
```
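
Is the right approach just to sample an index from the Categorical distribution and map it through acts before stepping the environment? A minimal sketch of what I mean is below; the uniform probs tensor and the constant reward are placeholders standing in for my policy network's output and my environment's return value.

```python
import torch
from torch.distributions import Categorical

acts = list(range(10, 100, 10))  # action values: [10, 20, ..., 90]

# Placeholder for the policy network's output (assumption: uniform probabilities).
probs = torch.full((len(acts),), 1.0 / len(acts), requires_grad=True)

m = Categorical(probs)       # distribution over the indices 0..len(acts)-1
idx = m.sample()             # sample an index, not an action value
action = acts[idx.item()]    # map the index to the actual action value

# next_state, reward = env.step(action)  # the env receives the mapped value
reward = 1.0                 # placeholder reward for illustration

loss = -m.log_prob(idx) * reward  # log_prob is taken of the sampled index
loss.backward()
```

My understanding is that log_prob should be taken of the sampled index rather than the mapped value, since the Categorical distribution is defined over the indices 0 to len(acts)-1, not over the action values themselves. Is that correct?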
Topic: actor-critic, pytorch, reinforcement-learning
Category: Data Science