Deep Reinforcement Learning - mean Q as an evaluation metric

I'm tuning a deep learning model for a learner of the Space Invaders game (image below). The state is defined as the relative Euclidean distance between the player and the enemies, plus the relative distance between the player and the 6 closest enemy lasers, normalized by the window height (if the player's position is $(x_p,y_p)$ and an enemy's position is $(x_e,y_e)$, the relative Euclidean distance is $\frac{\sqrt{(x_p-x_e)^2+(y_p-y_e)^2}}{HEIGHT}$, where HEIGHT is the window height). Hence the observation space dimension is (10+6), which results in an …
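For reference, the state described above could be assembled roughly as in the sketch below (my own; the helper name, the padding value for missing lasers, and the fixed count of tracked enemies are assumptions, not details given in the question):

    import numpy as np

    def build_observation(player_pos, enemy_positions, laser_positions, height):
        # Normalized Euclidean distance between two (x, y) positions
        def rel_dist(a, b):
            return np.hypot(a[0] - b[0], a[1] - b[1]) / height

        # Distances to every tracked enemy, plus the 6 closest enemy lasers
        enemy_dists = [rel_dist(player_pos, e) for e in enemy_positions]
        laser_dists = sorted(rel_dist(player_pos, l) for l in laser_positions)[:6]
        # Pad with 1.0 (a far-away distance) if fewer than 6 lasers are on screen
        laser_dists += [1.0] * (6 - len(laser_dists))
        return np.array(enemy_dists + laser_dists, dtype=np.float32)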
Category: Data Science

How to choose between discounted reward and average reward?

How to select between average reward and discounted reward? When is average reward more effective than discounted reward, and when is the opposite true? Is it possible to use both of them in one problem? As I understand it, the RL objective is based on either the average reward or the discounted future reward, but I think this paper uses the discounted and average reward together. Is it correct that we use the discounted future reward for training and the average reward …
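As a reference point, here is a small sketch (my own, assuming a single episodic trajectory) of the two quantities being contrasted; in practice the discounted return is often what the learning objective optimizes, while an average reward per step is often what gets reported as a training curve:

    import numpy as np

    rewards = np.array([1.0, 0.0, 2.0, 1.0, 3.0])  # one trajectory of per-step rewards
    gamma = 0.9

    # Discounted return from the start state: sum_t gamma^t * r_t
    discounted_return = np.sum(gamma ** np.arange(len(rewards)) * rewards)

    # Average reward: long-run reward per step, here approximated by the episode mean
    average_reward = rewards.mean()

    print(discounted_return, average_reward)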
Category: Data Science

Actions taken by agent / agent performance not improving

Hi, I am trying to develop an RL agent using the PPO algorithm. My agent takes an action (CFM) to maintain a state variable called RAT between 24 and 24.5. I am using the PPO algorithm of the stable-baselines library to train my agent. I have trained the agent for 2M steps. Hyper-parameters in the code:

    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[dict(pi=[64, 64], vf=[64, 64])],
                                           feature_extraction="mlp")

    model = PPO2(CustomPolicy, env, gamma=0.8, n_steps=132, ent_coef=0.01,
                 learning_rate=1e-3, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
                 nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
                 verbose=0, tensorboard_log="./20_01_2020_logs/",
                 _init_setup_model=True, …
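For context, the snippet follows the usual stable-baselines (v2) custom-policy pattern. A self-contained sketch of that setup, assuming env is the asker's own gym environment (RAT in the observation, CFM as the action) and that training simply calls learn for the stated 2M steps, might look like this:

    from stable_baselines import PPO2
    from stable_baselines.common.policies import FeedForwardPolicy

    class CustomPolicy(FeedForwardPolicy):
        # Two hidden layers of 64 units each for both the policy and value networks
        def __init__(self, *args, **kwargs):
            super(CustomPolicy, self).__init__(*args, **kwargs,
                                               net_arch=[dict(pi=[64, 64], vf=[64, 64])],
                                               feature_extraction="mlp")

    # env is assumed to be the asker's custom gym.Env (not shown in the question)
    model = PPO2(CustomPolicy, env, gamma=0.8, n_steps=132, ent_coef=0.01,
                 learning_rate=1e-3, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
                 nminibatches=4, noptepochs=4, cliprange=0.2, verbose=0,
                 tensorboard_log="./20_01_2020_logs/")
    model.learn(total_timesteps=2_000_000)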
Category: Data Science

Formulation of a reward structure

I am new to reinforcement learning and experimenting with training RL agents. I have a doubt about reward formulation: from a given state, if an agent takes a good action I give a positive reward, and if the action is bad, I give a negative reward. So if I give the agent very high positive rewards when it takes a good action, say 100 times the magnitude of the negative rewards, will it help the agent during training? …
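A toy illustration of the two schemes being compared (mine, not the asker's code); note that the ordering of actions is identical in both, and what changes is the scale of the returns the value function has to fit, which is often the practical concern with very large rewards:

    def symmetric_reward(action_was_good):
        # Good and bad actions have the same magnitude
        return 1.0 if action_was_good else -1.0

    def skewed_reward(action_was_good):
        # Good actions rewarded 100x more strongly than bad actions are penalized
        return 100.0 if action_was_good else -1.0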
Category: Data Science

How to formulate the reward of an RL agent with two objectives

I have started learning reinforcement learning and am trying to apply it to my use case. I am developing an RL agent which can maintain temperature at a particular value and minimize the energy consumption of the equipment by taking the different actions available to it. I am trying to formulate a reward function for it. energy and temp_act can be measured. energy_coeff = -10, temp_coeff = -10, temp_penalty = np.abs(temp_setpoint - temp_act), reward = energy_coeff * energy + …
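A sketch of the weighted-sum reward the snippet appears to build (the second term is my completion of the truncated line, so treat it as an assumption; temp_setpoint, temp_act and energy are assumed to be measured each step):

    import numpy as np

    def reward_fn(temp_setpoint, temp_act, energy,
                  energy_coeff=-10.0, temp_coeff=-10.0):
        # Penalize deviation of the actual temperature from the setpoint
        temp_penalty = np.abs(temp_setpoint - temp_act)
        # Both coefficients are negative, so the reward is maximized by keeping
        # both the temperature deviation and the energy consumption small
        return energy_coeff * energy + temp_coeff * temp_penalty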
Category: Data Science

Unbalanced discounted reward in reinforcement learning: is it a problem?

Discounted rewards seem unbalanced to me. Take as an example an episode with 4 actions, where each action receives a reward of +1: +1 -> +1 -> +1 -> +1. The discounted return for the last action is 1. The discounted return for the first action (taking gamma = 1 for simplicity) is 4. Intuitively both actions are as good as each other, because both received the same reward, but their total returns are different, unbalanced. So …
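For concreteness, here is a small sketch (my own) computing the returns-to-go G_t = r_t + gamma * G_{t+1} for the 4-step example above, which produces exactly the imbalance the question describes:

    rewards = [1, 1, 1, 1]
    gamma = 1.0

    # Compute returns-to-go by sweeping backwards over the episode
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()

    print(returns)  # [4.0, 3.0, 2.0, 1.0]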
Category: Data Science
