Temperature variable in Boltzmann exploration in reinforcement learning
I have been using the epsilon-greedy action selection strategy and recently came across the Boltzmann (softmax) action selection strategy. One thing I am not clear about in Boltzmann exploration is the temperature variable. How should this variable be defined? Should it be a constant, or should it be decreased over the course of training? And how should its absolute value be chosen?
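For reference, Boltzmann exploration picks action a with probability P(a) = exp(Q(a)/τ) / Σ_b exp(Q(b)/τ), where τ is the temperature. A large τ flattens the distribution toward uniform (more exploration), and τ → 0 approaches greedy selection. Because only the ratios (Q(a) − Q(b))/τ matter, a given value of τ is only "large" or "small" relative to the spread of the Q-values. The sketch below is a minimal illustration of this, not a prescribed method: it computes the softmax probabilities, samples an action, and includes a hypothetical exponential decay schedule whose parameters (`tau_start`, `tau_min`, `decay`) are arbitrary example values.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values scaled by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    # Subtract the max before exponentiating to avoid overflow;
    # this shifts every term equally and leaves the probabilities unchanged.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def decayed_temperature(step, tau_start=1.0, tau_min=0.05, decay=0.999):
    """Hypothetical exponential annealing schedule with a lower floor.

    The parameter values are illustrative only; in practice they would
    need tuning relative to the scale of the Q-values.
    """
    return max(tau_min, tau_start * decay ** step)


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    print(boltzmann_probs(q, 100.0))  # close to uniform
    print(boltzmann_probs(q, 0.01))   # almost all mass on the best action
    for step in (0, 1000, 5000):
        tau = decayed_temperature(step)
        print(step, tau, select_action(q, tau))
```

The floor `tau_min` keeps the temperature strictly positive, so the division by τ never fails even after many decay steps. Keeping τ constant is simply the special case `decay=1.0`.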
Thanks
Topic: deepmind, softmax, ai, reinforcement-learning, deep-learning
Category: Data Science