How to deal with multiple possible rewards for a single transition in an MDP
Suppose there is a state S with two possible transitions under action A, and both of them lead to the same next state S'. The tricky part is that the two transitions give different rewards. In this case, how should I construct the transition probability matrix and the reward matrix?
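To make the setup concrete, here is a minimal sketch of one common way to handle it, assuming the reward matrix is meant to hold the expected reward R(S, A, S'). The two outcomes are collapsed into a single entry: their probabilities are summed, and the reward becomes the probability-weighted average. The numbers and the helper name `merge_outcomes` are made up for illustration.

```python
# Hypothetical example: under action A, state S reaches S' through two
# distinct outcomes, each with its own probability and reward.
outcomes = [
    # (probability, reward), both landing in S'
    (0.25, 10.0),
    (0.75, 2.0),
]

def merge_outcomes(outcomes):
    """Collapse outcomes that share the same (s, a, s') into one entry.

    Returns the merged transition probability and the expected reward
    conditioned on that transition.
    """
    total_p = sum(p for p, _ in outcomes)
    expected_r = sum(p * r for p, r in outcomes) / total_p
    return total_p, expected_r

# These two values would go into P[S, A, S'] and R[S, A, S'] respectively.
p_merged, r_merged = merge_outcomes(outcomes)
print(p_merged, r_merged)
```

This keeps the value function unchanged, since the Bellman equations depend only on the expected reward. If the spread of the rewards matters (for example in risk-sensitive settings), the merge loses that information, and splitting S' into two distinct states is an alternative.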
Topic markov-process
Category Data Science