Action selection in an actor-critic algorithm
I have a discrete action space that is just a list of values, given by acts = [i for i in range(10, 100, 10)], i.e. [10, 20, ..., 90]. Following the PyTorch documentation, the loss is computed as shown below. Could someone explain how to modify this procedure so that the sampled actions come from my action space?
```python
from torch.distributions import Categorical

m = Categorical(probs)               # probs: action probabilities from the policy
action = m.sample()                  # sampled action (an index tensor)
next_state, reward = env.step(action)
loss = -m.log_prob(action) * reward  # policy-gradient (REINFORCE-style) loss
loss.backward()
```
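
Is the right approach just to sample an index from the Categorical distribution and map it through acts before stepping the environment? A minimal sketch of what I mean is below; the uniform probs tensor and the constant reward are placeholders standing in for my policy network's output and my environment's return value.

```python
import torch
from torch.distributions import Categorical

acts = list(range(10, 100, 10))  # action values: [10, 20, ..., 90]

# Placeholder for the policy network's output (assumption: uniform probabilities).
probs = torch.full((len(acts),), 1.0 / len(acts), requires_grad=True)

m = Categorical(probs)       # distribution over the indices 0..len(acts)-1
idx = m.sample()             # sample an index, not an action value
action = acts[idx.item()]    # map the index to the actual action value

# next_state, reward = env.step(action)  # the env receives the mapped value
reward = 1.0                 # placeholder reward for illustration

loss = -m.log_prob(idx) * reward  # log_prob is taken of the sampled index
loss.backward()
```

My understanding is that log_prob should be taken of the sampled index rather than the mapped value, since the Categorical distribution is defined over the indices 0 to len(acts)-1, not over the action values themselves. Is that correct?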
Topic: actor-critic, pytorch, reinforcement-learning
Category: Data Science