How to calculate the temperature variable in softmax (Boltzmann) exploration
Hi, I am developing a reinforcement learning agent for a continuous state / discrete action space. I am trying to use Boltzmann (softmax) exploration as the action selection strategy. My action space is of size 5000.
My implementation of Boltzmann exploration:
def get_action(state, episode, temperature=1):
    state_encod = np.reshape(state, [1, state_size])
    q_values = model.predict(state_encod)
    prob_act = np.empty(len(q_values[0]))
    for i in range(len(prob_act)):
        prob_act[i] = np.exp(q_values[0][i] / temperature)
    # numpy element-wise division for the denominator (sum of numerators)
    prob_act = np.true_divide(prob_act, sum(prob_act))
    action_q_value = np.random.choice(q_values[0], p=prob_act)
    action_keys = np.where(q_values[0] == action_q_value)
    action_key = action_keys[0][0]
    action = index_to_action_mapping[action_key]
    return action
If my temperature variable is 200, I get this error after 100 episodes:
ValueError: probabilities contain NaN
If my temperature is 1, I get the same NaN error within very few episodes.
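One possible cause, and this is an assumption because the actual Q-values are not shown here, is overflow in `np.exp`: in float64 it returns `inf` once its argument exceeds roughly 709, and `inf / inf` is NaN. The sketch below reproduces that behaviour in isolation. The helper `naive_softmax` and the Q-values are made up for illustration; only the exponentiate-then-normalize step mirrors `get_action`.

```python
import numpy as np

def naive_softmax(q_values, temperature):
    # Same computation as in get_action: exponentiate, then normalize.
    # Warnings are silenced only to keep the demonstration output clean.
    with np.errstate(over="ignore", invalid="ignore"):
        exp_q = np.exp(np.asarray(q_values, dtype=np.float64) / temperature)
        return exp_q / exp_q.sum()

# Hypothetical Q-values, chosen only to show the two regimes
small_q = [1.0, 2.0, 3.0]
large_q = [100.0, 500.0, 800.0]

probs_ok = naive_softmax(small_q, temperature=1.0)     # finite, sums to 1
probs_bad = naive_softmax(large_q, temperature=1.0)    # exp(800) overflows
probs_hot = naive_softmax(large_q, temperature=200.0)  # 800 / 200 = 4, no overflow

print(np.isnan(probs_ok).any(), np.isnan(probs_bad).any(), np.isnan(probs_hot).any())
```

If this is what is happening, a larger temperature only postpones the failure: with a temperature of 200 the overflow would need Q-values above roughly 709 * 200, which would suggest the Q-values are growing during training. It is also possible that `model.predict` itself returns NaN, which this sketch does not cover.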
Why is this happening? Am I doing something wrong here? How should I select the temperature variable? Can someone help me with this?
Thanks.
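For reference, a minimal sketch of a numerically stable variant of the sampling step, assuming the NaN comes from overflow in the exponential. It subtracts the maximum scaled Q-value before exponentiating, which leaves the probabilities unchanged but keeps every exponent at or below 0. It also samples the action index directly instead of sampling a Q-value and looking it up with `np.where`, which always maps tied Q-values to the first matching index. The function name `boltzmann_action_index` and its `rng` parameter are hypothetical, not part of the code above.

```python
import numpy as np

def boltzmann_action_index(q_values, temperature=1.0, rng=None):
    """Sample an action index with probability proportional to exp(Q / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = np.random.default_rng() if rng is None else rng
    q = np.asarray(q_values, dtype=np.float64)
    if not np.all(np.isfinite(q)):
        # If the network itself outputs NaN or inf, rescaling cannot fix it
        raise ValueError("q_values contain NaN or inf")
    scaled = q / temperature
    scaled -= scaled.max()       # largest exponent becomes 0, so exp cannot overflow
    exp_q = np.exp(scaled)       # very negative entries underflow harmlessly to 0
    probs = exp_q / exp_q.sum()  # the sum is at least 1, so no division by zero
    return int(rng.choice(len(probs), p=probs)), probs

# Hypothetical Q-values that would overflow a plain np.exp(q / T) at T = 1
idx, probs = boltzmann_action_index(
    [100.0, 500.0, 800.0], temperature=1.0, rng=np.random.default_rng(0)
)
print(idx, probs)
```

In `get_action` this would replace the loop and the `np.where` lookup, for example `action_key, _ = boltzmann_action_index(q_values[0], temperature)`. The temperature then only controls how greedy the selection is: values that are small relative to the spread of the Q-values make it nearly greedy, while large values make it nearly uniform.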