Confidence in the rewards for an RL task
I am working on an RL task that I train once per day. I store the rewards for each of those days, so that I can follow the progress on a daily basis.
At the beginning of the learning process, the reward for a given state fluctuates a lot. After about 10 days, the rewards start to stabilize and the fluctuations become very small. When I see only a very small change in the reward from one day to the next, I would say that I have reached a kind of confidence for this state, and I want to express that confidence as a number.
Let's say that for state S1 I have the following data:
Day1 Day2 Day3 Day4 Day5 Day6 Day7 Day8
0.1 0.3 0.4 0.45 0.5 0.51 0.5 0.51
In this case, after Day6 I start to have confidence in my results, and on Day8 the confidence should be the highest so far. On Day3, by contrast, the confidence should be low, because the reward is still changing a lot from day to day.
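As a minimal sketch of the kind of number described above, one option is to take the standard deviation of the last few daily rewards and map it to a score between 0 and 1, where a small spread gives a score close to 1. This is only an illustration, not an established method: the function name `stability_confidence`, the window of 4 days, and the `scale` of 0.05 (the spread, in reward units, at which the score drops to 0.5) are all assumptions.

```python
import statistics


def stability_confidence(rewards, window=4, scale=0.05):
    """Return one score per day: near 1 when recent rewards barely move.

    window and scale are illustrative assumptions, not tuned values.
    """
    scores = []
    for day in range(len(rewards)):
        # Rewards of the last `window` days, up to and including this day.
        recent = rewards[max(0, day - window + 1): day + 1]
        if len(recent) < 2:
            # A single observation says nothing about fluctuation.
            scores.append(0.0)
            continue
        spread = statistics.pstdev(recent)  # population standard deviation
        # Map spread in [0, inf) to a score in (0, 1]; spread == scale gives 0.5.
        scores.append(1.0 / (1.0 + spread / scale))
    return scores


# Rewards for state S1 from Day1 to Day8, as listed above.
rewards_s1 = [0.1, 0.3, 0.4, 0.45, 0.5, 0.51, 0.5, 0.51]
for day, score in enumerate(stability_confidence(rewards_s1), start=1):
    print(f"Day{day}: {score:.2f}")
```

With these assumed settings, Day3 scores about 0.29 and Day8 about 0.91, which is the highest value so far. A larger `window` reacts more slowly to changes, and `scale` has to be chosen relative to the size of reward fluctuations that should count as "small".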
Do you have any suggestion on how I could measure this?
Thank you!
Topic confidence reinforcement-learning
Category Data Science