Action selection in actor-critic algorithm:

I have an action space that is just a list of values given by acts = [i for i in range(10, 100, 10)]. According to the PyTorch documentation, the loss is calculated as below. Could someone explain how I can modify this procedure to sample actions from my action space?

```python
m = Categorical(probs)
action = m.sample()
next_state, reward = env.step(action)
loss = -m.log_prob(action) * reward
loss.backward()
```
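One common pattern (a sketch, not taken from the question) is to let `Categorical` sample an *index* into `acts` and map that index to the actual action value only when stepping the environment, while keeping the index for `log_prob`. The uniform `probs` tensor below is a placeholder assumption standing in for a policy network's output, and the environment step is omitted:

```python
import torch
from torch.distributions import Categorical

# Discrete action values from the question: [10, 20, ..., 90]
acts = [i for i in range(10, 100, 10)]

# Placeholder policy output: one probability per entry in `acts`.
# In practice this would come from a softmax over network logits.
probs = torch.full((len(acts),), 1.0 / len(acts))

m = Categorical(probs)
idx = m.sample()            # sampled index into `acts`
action = acts[idx.item()]   # map index -> actual action value for env.step

# Use the sampled index (not the mapped value) for the log-probability,
# since the distribution is defined over indices 0..len(acts)-1.
log_prob = m.log_prob(idx)
```

The key point is that the distribution lives over indices `0..len(acts)-1`; the list `acts` is just a lookup table applied at the environment boundary.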

Tags: actor-critic, pytorch, reinforcement-learning
