Keras on-policy "Advantage Actor Critic" implementation

I want to understand and implement an on-policy "Advantage Actor-Critic". The Keras RL example is straightforward and simple; it uses the Keras functional API to create an actor-critic and, after each episode, calculates the loss and gradient (episodic, or off-policy). Because it calculates the gradient at the end of each episode, it seems to be an off-policy implementation (which takes random actions to try to explore the environment). What I want to do is implement an on-policy Advantage Actor-Critic that calculates and updates the loss and gradient at each step …
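A minimal sketch (not the Keras example itself) of an actor-critic whose loss and gradient are computed at every environment step rather than at the end of an episode, assuming the classic Gym API where reset() returns only the observation; the network shape and hyper-parameters are illustrative assumptions.

import gym
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

env = gym.make("CartPole-v0")
num_actions = env.action_space.n
gamma = 0.99

# Shared trunk with a policy head (actor) and a value head (critic).
inputs = layers.Input(shape=(4,))
common = layers.Dense(128, activation="relu")(inputs)
action_probs = layers.Dense(num_actions, activation="softmax")(common)
critic_value = layers.Dense(1)(common)
model = tf.keras.Model(inputs=inputs, outputs=[action_probs, critic_value])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

for episode in range(500):
    state = env.reset()
    done = False
    while not done:
        state_t = tf.convert_to_tensor(state[None, :], dtype=tf.float32)
        with tf.GradientTape() as tape:
            probs, value = model(state_t)
            action = np.random.choice(num_actions, p=probs.numpy()[0])
            next_state, reward, done, _ = env.step(action)

            next_t = tf.convert_to_tensor(next_state[None, :], dtype=tf.float32)
            _, next_value = model(next_t)
            # One-step TD error used as the advantage estimate.
            target = reward + gamma * tf.stop_gradient(next_value[0, 0]) * (1.0 - float(done))
            advantage = target - value[0, 0]

            actor_loss = -tf.math.log(probs[0, action] + 1e-8) * tf.stop_gradient(advantage)
            critic_loss = tf.square(advantage)
            loss = actor_loss + critic_loss

        # Gradient step after every single environment step (on-policy update).
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        state = next_state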
Category: Data Science

Improving the Actor Critic algorithm proposed by Keras

On this page of Keras's website, a reinforcement learning algorithm based on an actor-critic scheme is described. It is a deep policy gradient algorithm (hence DPG). Of course, Keras functions are central to this code; for this reason TensorFlow tries to get access to an NVIDIA GPU for acceleration, otherwise it uses the available CPU cores. I believe that this code is not optimized, because it uses only one core; the main part of the code …
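A small sketch, not from the original post, showing how TensorFlow's CPU thread-level parallelism can be configured in case the single-core observation refers to CPU execution; whether this actually helps is an open question, since the example's Python training loop is largely sequential. These calls should be made before TensorFlow executes any ops.

import tensorflow as tf

# Threads used inside a single op (e.g. a large matmul).
tf.config.threading.set_intra_op_parallelism_threads(8)
# Threads used to run independent ops in parallel.
tf.config.threading.set_inter_op_parallelism_threads(2)

print(tf.config.threading.get_intra_op_parallelism_threads())
print(tf.config.threading.get_inter_op_parallelism_threads())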
Category: Data Science

Using reinforcement learning for binary classification

I want to build an agent for binary classification. I have a large dataset with two labels (0 and 1), and I want to build an agent to predict the labels. I have built a deep model and now I want to build an agent. I use keras-rl2, but there is a problem: for the DQN agent, the fit function has an env argument. I don't know how I can define my problem's environment for that. Note that my problem has a similarity function …
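A minimal sketch of wrapping a labelled dataset as a Gym environment so that keras-rl2's fit(env, ...) can consume it; the names X and y, the +1/-1 reward, and the one-sample-per-episode scheme are assumptions, not part of the original question.

import gym
import numpy as np
from gym import spaces

class ClassificationEnv(gym.Env):
    """Each step shows one sample; the action is the predicted label (0 or 1)."""

    def __init__(self, X, y):
        super().__init__()
        self.X, self.y = X, y
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(X.shape[1],), dtype=np.float32)
        self.idx = 0

    def reset(self):
        # Pick a random sample as the next observation.
        self.idx = np.random.randint(len(self.X))
        return self.X[self.idx].astype(np.float32)

    def step(self, action):
        # +1 for a correct prediction, -1 otherwise.
        reward = 1.0 if action == self.y[self.idx] else -1.0
        done = True  # one sample per episode; fit() calls reset() afterwards
        return self.X[self.idx].astype(np.float32), reward, done, {}

# Hypothetical usage with keras-rl2: env = ClassificationEnv(X_train, y_train)
# followed by dqn.fit(env, nb_steps=50000).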
Category: Data Science

Evaluating a trained Reinforcement Learning Agent?

I am new to training reinforcement learning agents. I have read about the PPO algorithm and used the stable-baselines library to train an agent with PPO. So my question here is: how do I evaluate a trained RL agent? For a regression or classification problem I have metrics like r2_score or accuracy, etc. Are there any such metrics here, and how do I test the agent and conclude whether it is trained well or badly? Thanks
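As a sketch, assuming model is the trained PPO agent and env is the environment it was trained on: a common metric is the mean (and standard deviation of the) episode reward over a number of fresh evaluation episodes, which recent versions of stable-baselines expose as a helper.

from stable_baselines.common.evaluation import evaluate_policy

# Run 100 evaluation episodes with the deterministic policy.
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=100, deterministic=True)
print(f"mean episode reward: {mean_reward:.2f} +/- {std_reward:.2f}")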
Category: Data Science

Is "nb_steps_warmup" set for each episode or globally?

When I configure a DQN agent, nb_steps_warmup can be set. Is this parameter applied per episode or once globally? What I am trying to ask is: imagine I have a game environment which takes about 3000 steps per episode at most. The DQN is fitted as follows: dqn.fit(env, nb_steps=30000, visualize=True, verbose=2). So, as I understand it, the fitting will run approximately 10 episodes (nb_steps / max. steps per episode). If I set nb_steps_warmup = 5000, what actually happens? A) nb_steps_warmup=5000, …
Category: Data Science

Keras models break when I add batch normalization

I'm creating the model for a DDPG agent (keras-rl version), but I'm having some trouble with errors whenever I try adding batch normalization to the first of the two networks. Here is the creation function as I'd like it to be:

def buildDDPGNets(actNum, obsSpace):
    actorObsInput = Input(shape=(1,) + obsSpace, name="actor_obs_input")
    a = Flatten()(actorObsInput)
    a = Dense(600, use_bias=False)(a)
    a = BatchNormalization()(a)
    a = Activation("relu")(a)
    a = Dense(300, use_bias=False)(a)
    a = BatchNormalization()(a)
    a = …
Category: Data Science

What are the effects of clipping the reward on stability?

I am looking to stabilize my DQN results, and I found that clipping is one technique for doing that, but I did not understand it completely.
1. What are the effects of clipping the reward, clipping the gradient, and clipping the error on stability, and how do they make the results more stable?
2. In the DQN Nature paper it is written that they clip the reward. Would you please explain this more?
3. Which of them is most effective for stability?
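A sketch, not from the question, of what gradient clipping and error clipping typically look like in Keras; the learning rate, thresholds, and toy network shape are assumptions. (Reward clipping, as in the Nature DQN, simply squashes each reward into [-1, 1] before it is stored in replay memory.)

import tensorflow as tf

# Gradient clipping: cap the gradient norm inside the optimizer, so a single
# large TD error cannot produce a huge parameter update.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)

# Error clipping: the Huber loss is quadratic for |TD error| < delta and linear
# beyond it, which bounds the gradient of the loss w.r.t. the Q-values.
loss_fn = tf.keras.losses.Huber(delta=1.0)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2),   # Q-values for 2 actions (toy shape)
])
model.compile(optimizer=optimizer, loss=loss_fn)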
Category: Data Science

Actions taken by agent / agent performance not improving

Hi, I am trying to develop an RL agent using the PPO algorithm. My agent takes an action (CFM) to maintain a state variable called RAT between 24 and 24.5. I am using the PPO algorithm of the stable-baselines library to train my agent. I have trained the agent for 2M steps. Hyper-parameters in the code:

def __init__(self, *args, **kwargs):
    super(CustomPolicy, self).__init__(*args, **kwargs,
                                       net_arch=[dict(pi=[64, 64], vf=[64, 64])],
                                       feature_extraction="mlp")

model = PPO2(CustomPolicy, env, gamma=0.8, n_steps=132, ent_coef=0.01,
             learning_rate=1e-3, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
             nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
             verbose=0, tensorboard_log="./20_01_2020_logs/", _init_setup_model=True, …
Category: Data Science

What is a minimal setup to solve the CartPole-v0 with DQN?

I solved CartPole-v0 with a CEM agent pretty easily (experiments and code), but I struggle to find a setup which works with DQN. Do you know which parameters should be adjusted so that the mean reward is about 200 for this problem?

What I tried:
- Adjustments in the model: deeper / less deep, neurons per layer
- Memory size (how many steps are stored for replay)

What I'm unsure about:
- How should I choose the memory? Is higher always better? …
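For reference, a minimal keras-rl-style DQN setup for CartPole-v0, close to the library's own CartPole example and assuming keras-rl2 with tf.keras; the hyper-parameters are a plausible starting point, not a guarantee of a mean reward of 200.

import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

env = gym.make("CartPole-v0")
nb_actions = env.action_space.n

# Small fully connected Q-network.
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(nb_actions, activation="linear"),
])

memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=100, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(learning_rate=1e-3), metrics=["mae"])

dqn.fit(env, nb_steps=50000, visualize=False, verbose=2)
dqn.test(env, nb_episodes=10, visualize=False)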
Category: Data Science

Formulation of a reward structure

I am new to reinforcement learning and am experimenting with training RL agents. I have a doubt about reward formulation: from a given state, if the agent takes a good action I give a positive reward, and if the action is bad, I give a negative reward. So if I give the agent very high positive rewards when it takes a good action, say 100 times larger than the negative rewards, will it help the agent during training? …
Category: Data Science

Q-Learning experience replay: how to feed the neural network?

I'm trying to replicate the DQN Atari experiment. My DQN isn't performing well; checking other people's code, I saw something about experience replay which I don't understand. First, when you define your CNN, in the first layer you have to specify the size (I'm using Keras + TensorFlow, so in my case it's something like (105, 80, 4), which corresponds to the height, width, and number of images I feed my CNN). In the code I reviewed, when they get …
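A sketch (the shapes and capacity are assumptions) of an experience-replay buffer in which each stored observation is already a stack of the last 4 frames, so that training batches arrive at the CNN with shape (batch_size, 105, 80, 4).

import random
from collections import deque
import numpy as np

class ReplayBuffer:
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # state and next_state are (105, 80, 4) stacks of the last 4 frames.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones  # states: (32, 105, 80, 4)

# Hypothetical training use (model is the Keras CNN):
#   states, actions, rewards, next_states, dones = buffer.sample(32)
#   q_values = model.predict(states)   # network input shape (105, 80, 4)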
Category: Data Science

How to implement reward clipping for DQN in Keras

How do I implement reward clipping for DQN in Keras? In particular, how do I implement clipping the reward? Is this pseudocode correct:

if reward < -threshold:
    reward = -1
elif reward > threshold:
    reward = 1
else:  # -threshold <= reward <= threshold
    reward = reward / threshold

And if the reward is always positive, how can we change the reward clipping?
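As a sketch: for a positive threshold, the piecewise scheme above is equivalent to scaling and clipping in a single NumPy call.

import numpy as np

def clip_reward(reward, threshold=1.0):
    # Scale by the threshold, then clip the result into [-1, 1].
    return float(np.clip(reward / threshold, -1.0, 1.0))

print(clip_reward(250.0, threshold=100.0))   # -> 1.0
print(clip_reward(-30.0, threshold=100.0))   # -> -0.3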
Category: Data Science

with tf.device(DEVICE): model = modellib.MaskRCNN(mode = "inference", model_dir = LOGS_DIR, config = config)

ValueError                                Traceback (most recent call last)

/miniconda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
    509               as_ref=input_arg.is_ref,
--> 510               preferred_dtype=default_dtype)
    511         except TypeError as err:

/miniconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx)
   1106     if ret is None:
->  1107       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1108

/miniconda/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py in _autopacking_conversion_function(v, dtype, name, as_ref)
    959     return NotImplemented
--> 960   return _autopacking_helper(v, inferred_dtype, name or "packed")
    961

/miniconda/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py in _autopacking_helper(list_or_tuple, dtype, name)
    921         elems_as_tensors.append(
--> 922             constant_op.constant(elem, dtype=dtype, name=str(i)))
    923     return gen_array_ops.pack(elems_as_tensors, name=scope)

/miniconda/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py in …
Category: Data Science
