Deep Reinforcement Learning - mean Q as an evaluation metric

I'm tuning a deep learning model for a learner of the Space Invaders game (image below). The state is defined as the relative Euclidean distance between the player and the enemies, plus the relative distance between the player and the 6 closest enemy lasers, normalized by the window height (if the player's position is $(x_p,y_p)$ and an enemy's position is $(x_e,y_e)$, the relative Euclidean distance is $\frac{\sqrt{(x_p-x_e)^2+(y_p-y_e)^2}}{HEIGHT}$, where HEIGHT is the window height). Hence the observation space dimension is (10+6), which results in an …
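For reference, the state described above could be assembled roughly as in the sketch below (my own; the helper name, the padding value for missing lasers, and the fixed count of tracked enemies are assumptions, not details given in the question):

    import numpy as np

    def build_observation(player_pos, enemy_positions, laser_positions, height):
        # Normalized Euclidean distance between two (x, y) positions
        def rel_dist(a, b):
            return np.hypot(a[0] - b[0], a[1] - b[1]) / height

        # Distances to every tracked enemy, plus the 6 closest enemy lasers
        enemy_dists = [rel_dist(player_pos, e) for e in enemy_positions]
        laser_dists = sorted(rel_dist(player_pos, l) for l in laser_positions)[:6]
        # Pad with 1.0 (a far-away distance) if fewer than 6 lasers are on screen
        laser_dists += [1.0] * (6 - len(laser_dists))
        return np.array(enemy_dists + laser_dists, dtype=np.float32)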
Category: Data Science

How to choose between discounted reward and average reward?

How to select between average reward and discounted reward? When is average reward more effective than discounted reward, and when is the opposite true? Is it possible to use both of them in one problem? As I understand it, the RL objective is based on either the average reward or the discounted future reward, but I think this paper uses the discounted and average reward together. Is it correct that we use the discounted future reward for training and the average reward …
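As a reference point, here is a small sketch (my own, assuming a single episodic trajectory) of the two quantities being contrasted; in practice the discounted return is often what the learning objective optimizes, while an average reward per step is often what gets reported as a training curve:

    import numpy as np

    rewards = np.array([1.0, 0.0, 2.0, 1.0, 3.0])  # one trajectory of per-step rewards
    gamma = 0.9

    # Discounted return from the start state: sum_t gamma^t * r_t
    discounted_return = np.sum(gamma ** np.arange(len(rewards)) * rewards)

    # Average reward: long-run reward per step, here approximated by the episode mean
    average_reward = rewards.mean()

    print(discounted_return, average_reward)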
Category: Data Science

Actions taken by agent / agent performance not improving

Hi, I am trying to develop an RL agent using the PPO algorithm. My agent takes an action (CFM) to maintain a state variable called RAT between 24 and 24.5. I am using the PPO algorithm of the stable-baselines library to train my agent. I have trained the agent for 2M steps. Hyper-parameters in the code:

    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[dict(pi=[64, 64], vf=[64, 64])],
                                           feature_extraction="mlp")

    model = PPO2(CustomPolicy, env, gamma=0.8, n_steps=132, ent_coef=0.01,
                 learning_rate=1e-3, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
                 nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
                 verbose=0, tensorboard_log="./20_01_2020_logs/",
                 _init_setup_model=True, …
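For context, the snippet follows the usual stable-baselines (v2) custom-policy pattern. A self-contained sketch of that setup, assuming env is the asker's own gym environment (RAT in the observation, CFM as the action) and that training simply calls learn for the stated 2M steps, might look like this:

    from stable_baselines import PPO2
    from stable_baselines.common.policies import FeedForwardPolicy

    class CustomPolicy(FeedForwardPolicy):
        # Two hidden layers of 64 units each for both the policy and value networks
        def __init__(self, *args, **kwargs):
            super(CustomPolicy, self).__init__(*args, **kwargs,
                                               net_arch=[dict(pi=[64, 64], vf=[64, 64])],
                                               feature_extraction="mlp")

    # env is assumed to be the asker's custom gym.Env (not shown in the question)
    model = PPO2(CustomPolicy, env, gamma=0.8, n_steps=132, ent_coef=0.01,
                 learning_rate=1e-3, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
                 nminibatches=4, noptepochs=4, cliprange=0.2, verbose=0,
                 tensorboard_log="./20_01_2020_logs/")
    model.learn(total_timesteps=2_000_000)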
Category: Data Science

Formulation of a reward structure

I am new to reinforcement learning and experimenting with training RL agents. I have a doubt about reward formulation: from a given state, if an agent takes a good action I give a positive reward, and if the action is bad, I give a negative reward. So if I give the agent very high positive rewards when it takes a good action, say 100 times the magnitude of the negative rewards, will it help the agent during training? …
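A toy illustration of the two schemes being compared (mine, not the asker's code); note that the ordering of actions is identical in both, and what changes is the scale of the returns the value function has to fit, which is often the practical concern with very large rewards:

    def symmetric_reward(action_was_good):
        # Good and bad actions have the same magnitude
        return 1.0 if action_was_good else -1.0

    def skewed_reward(action_was_good):
        # Good actions rewarded 100x more strongly than bad actions are penalized
        return 100.0 if action_was_good else -1.0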
Category: Data Science

How to formulate the reward of an RL agent with two objectives

I have started learning reinforcement learning and am trying to apply it to my use case. I am developing an RL agent which can maintain temperature at a particular value and minimize the energy consumption of the equipment by taking the different actions available to it. I am trying to formulate a reward function for it. energy and temp_act can be measured. energy_coeff = -10, temp_coeff = -10, temp_penalty = np.abs(temp_setpoint - temp_act), reward = energy_coeff * energy + …
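A sketch of the weighted-sum reward the snippet appears to build (the second term is my completion of the truncated line, so treat it as an assumption; temp_setpoint, temp_act and energy are assumed to be measured each step):

    import numpy as np

    def reward_fn(temp_setpoint, temp_act, energy,
                  energy_coeff=-10.0, temp_coeff=-10.0):
        # Penalize deviation of the actual temperature from the setpoint
        temp_penalty = np.abs(temp_setpoint - temp_act)
        # Both coefficients are negative, so the reward is maximized by keeping
        # both the temperature deviation and the energy consumption small
        return energy_coeff * energy + temp_coeff * temp_penalty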
Category: Data Science

Unbalanced discounted reward in reinforcement learning: is it a problem?

Discounted rewards seem unbalanced to me. Take as an example an episode with 4 actions, where each action receives a reward of +1: +1 -> +1 -> +1 -> +1. The discounted return for the last action is 1. The discounted return for the first action (taking gamma = 1 for simplicity) is 4. Intuitively both actions are as good as each other, because both received the same reward, but their total returns are different, unbalanced. So …
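For concreteness, here is a small sketch (my own) computing the returns-to-go G_t = r_t + gamma * G_{t+1} for the 4-step example above, which produces exactly the imbalance the question describes:

    rewards = [1, 1, 1, 1]
    gamma = 1.0

    # Compute returns-to-go by sweeping backwards over the episode
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()

    print(returns)  # [4.0, 3.0, 2.0, 1.0]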
Category: Data Science
