How to calculate a relation function between State and Action in a delayed-action-effect environment?
I'm trying to calculate a correlation coefficient (or some other suitable measure of dependence) between State and Action in a 'delayed action effect' environment. In this environment, the agent observes a state, returns an action, and receives a reward. However, an action only affects the state 'T' time steps after it is taken. This is my first attempt at this kind of problem, so I find it hard to see how to approach the environment. Are there any good approaches for this situation?
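
One possible starting point is a lagged correlation: compare the action at step t with the state at step t + lag for a range of lags, and look for the lag where the correlation is strongest. The sketch below is only an illustration under assumptions that are not part of the question: the states and actions are scalar time series of equal length, the relation is roughly linear, and the data is synthetic. The function name `lagged_corr` and the variables `true_delay` and `max_lag` are hypothetical.

```python
import numpy as np

def lagged_corr(actions, states, max_lag):
    """Pearson correlation between actions[t] and states[t + lag], for lag = 0..max_lag."""
    actions = np.asarray(actions, dtype=float)
    states = np.asarray(states, dtype=float)
    corrs = []
    for lag in range(max_lag + 1):
        a = actions[: len(actions) - lag]  # actions taken at step t
        s = states[lag:]                   # states observed at step t + lag
        corrs.append(np.corrcoef(a, s)[0, 1])
    return np.array(corrs)

# Synthetic data (an assumption for illustration): the state at step t
# is the action taken true_delay steps earlier, plus a little noise.
rng = np.random.default_rng(0)
true_delay = 5
n = 2000
actions = rng.normal(size=n)
noise = 0.1 * rng.normal(size=n)
states = np.empty(n)
states[:true_delay] = noise[:true_delay]
states[true_delay:] = actions[:-true_delay] + noise[true_delay:]

corrs = lagged_corr(actions, states, max_lag=10)
estimated_delay = int(np.argmax(np.abs(corrs)))
print("correlation per lag:", np.round(corrs, 3))
print("estimated delay:", estimated_delay)
```

On this synthetic series the correlation should peak at the lag equal to the true delay. If the relation between action and state is nonlinear, Pearson correlation can miss it, and a measure such as mutual information computed per lag may be a better fit.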