I want to understand and implement an on-policy "Advantage Actor-Critic". The Keras RL example is straightforward and simple: it uses the Keras functional API to create an actor-critic and, after each episode, calculates the loss and gradients (episodic, or off-policy). Because it calculates the gradients at the end of each episode, it seems to be an off-policy implementation (which takes random actions to try to explore the environment). What I want to do is implement an on-policy Advantage Actor-Critic that calculates and updates the loss and gradients at each step …
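For reference, a minimal sketch of a per-step (one-step TD) advantage actor-critic update with tf.GradientTape, assuming a CartPole-style environment and the classic Gym API; the layer sizes and variable names are illustrative and this is not the official Keras example:

```python
import numpy as np
import gym
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Shared trunk with separate actor (softmax) and critic (value) heads.
env = gym.make("CartPole-v1")
inputs = layers.Input(shape=(4,))
common = layers.Dense(128, activation="relu")(inputs)
action_probs = layers.Dense(2, activation="softmax")(common)
critic_value = layers.Dense(1)(common)
model = keras.Model(inputs=inputs, outputs=[action_probs, critic_value])
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
gamma = 0.99

state = env.reset()          # classic Gym API: reset() returns the observation
done = False
while not done:
    state_t = tf.convert_to_tensor(state[None, :], dtype=tf.float32)
    with tf.GradientTape() as tape:
        probs, value = model(state_t)
        action = np.random.choice(2, p=np.squeeze(probs))
        next_state, reward, done, _ = env.step(action)
        next_value = 0.0 if done else model(
            tf.convert_to_tensor(next_state[None, :], dtype=tf.float32))[1]
        # One-step TD advantage: r + gamma * V(s') - V(s)
        advantage = reward + gamma * tf.stop_gradient(next_value) - value
        actor_loss = -tf.math.log(probs[0, action]) * tf.stop_gradient(advantage)
        critic_loss = tf.square(advantage)
        loss = actor_loss + critic_loss
    # Apply the gradients at every environment step, not at episode end.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    state = next_state
```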
On this page of Keras's website, a reinforcement learning algorithm based on an actor-critic scheme is described. It is a deep policy gradient algorithm (hence DPG). Of course Keras functions are central in this code, so TensorFlow tries to access an NVIDIA GPU for acceleration; otherwise it uses the available CPU cores. I believe that this code is not optimized because it uses only one core; the main part of the code …
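For what it's worth, TensorFlow does expose CPU threading knobs that must be set before any ops are created; whether they help depends on where the time is actually spent, since small models and Python-side environment stepping often keep a single core busy regardless. The thread counts below are illustrative:

```python
import tensorflow as tf

# Must be called before TensorFlow creates any ops.
tf.config.threading.set_intra_op_parallelism_threads(8)  # threads used inside a single op
tf.config.threading.set_inter_op_parallelism_threads(8)  # threads used across independent ops
```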
I want to build an agent for binary classification. I have a large dataset with two labels (0 and 1), and I want to build an agent to predict the labels. I have built a deep model and now I want to build an agent. I use keras-rl2, but there is a problem: for the DQN agent, the fit function has an env argument, and I don't know how to define an environment for my problem. Note that my problem has a similarity function …
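One way to picture such an environment (a sketch only, with made-up names and an arbitrary reward scheme) is a Gym Env that serves one sample per step and rewards a correct label prediction; keras-rl2's fit expects this classic reset/step interface:

```python
import gym
import numpy as np
from gym import spaces

class ClassifyEnv(gym.Env):
    """Hypothetical environment: each step shows one sample, the action is the predicted label."""
    def __init__(self, X, y):
        super().__init__()
        self.X, self.y = X, y
        self.idx = 0
        self.action_space = spaces.Discrete(2)            # predict label 0 or 1
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(X.shape[1],), dtype=np.float32)

    def reset(self):
        self.idx = 0
        return self.X[self.idx].astype(np.float32)

    def step(self, action):
        # Illustrative reward scheme: +1 for a correct label, -1 for a wrong one.
        reward = 1.0 if action == self.y[self.idx] else -1.0
        self.idx += 1
        done = self.idx >= len(self.X)
        obs = self.X[min(self.idx, len(self.X) - 1)].astype(np.float32)
        return obs, reward, done, {}
```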
I am new to reinforcement learning agent training. I have read about the PPO algorithm and used the stable-baselines library to train an agent with PPO. So my question is: how do I evaluate a trained RL agent? For a regression or classification problem I have metrics like r2_score or accuracy, etc. Are there any such metrics here, or how do I test the agent and conclude whether it is trained well or badly? Thanks.
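The usual stand-in for accuracy is the mean (and spread of) episode return over a number of evaluation episodes. A rough sketch, assuming the stable-baselines model.predict / Gym env API; the function name and episode count are placeholders:

```python
import numpy as np

def evaluate(model, env, n_episodes=100):
    """Average undiscounted return over n_episodes, using the greedy policy."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)  # no exploration at test time
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return np.mean(returns), np.std(returns)
```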
When I configure a DQN agent, nb_steps_warmup can be set. Is this parameter set per episode or once globally? What I am trying to ask is: imagine I have a game environment which takes about 3000 steps max. per episode. The DQN is fitted as follows: dqn.fit(env, nb_steps=30000, visualize=True, verbose=2). So, as I understand it, the fitting will run approximately 10 episodes (nb_steps / max. steps per episode). If I set nb_steps_warmup = 5000, what actually happens? A) nb_steps_warmup=5000, …
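For context, a sketch of where the parameter sits in keras-rl2 (CartPole and the layer sizes are stand-ins for the actual environment and model): nb_steps_warmup is passed to the agent's constructor and is compared against the same global step counter that fit() increments across episodes.

```python
import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy

env = gym.make("CartPole-v1")                        # stand-in for the game environment
nb_actions = env.action_space.n

model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(64, activation="relu"),
    Dense(nb_actions, activation="linear"),
])

memory = SequentialMemory(limit=50000, window_length=1)
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=5000,                 # counted on the global step counter of fit()
               target_model_update=1e-2, policy=EpsGreedyQPolicy())
dqn.compile(Adam(learning_rate=1e-3), metrics=["mae"])
dqn.fit(env, nb_steps=30000, visualize=True, verbose=2)
```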
I'm creating the model for a DDPG agent (keras-rl version) but I'm having some trouble with errors whenever I try adding batch normalization to the first of the two networks. Here is the creation function as I'd like it to be:

```python
def buildDDPGNets(actNum, obsSpace):
    actorObsInput = Input(shape=(1,) + obsSpace, name="actor_obs_input")
    a = Flatten()(actorObsInput)
    a = Dense(600, use_bias=False)(a)
    a = BatchNormalization()(a)
    a = Activation("relu")(a)
    a = Dense(300, use_bias=False)(a)
    a = BatchNormalization()(a)
    a = …
```
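For reference, one possible completion of such a builder (a sketch only: the tanh output, layer sizes, and critic structure follow the usual keras-rl DDPG example pattern and are not the asker's actual code beyond what is shown above):

```python
from tensorflow.keras.layers import (Input, Flatten, Dense, BatchNormalization,
                                     Activation, Concatenate)
from tensorflow.keras.models import Model

def build_ddpg_nets(act_num, obs_space):
    # Actor: observation -> action, with batch norm before each activation.
    actor_obs_input = Input(shape=(1,) + obs_space, name="actor_obs_input")
    a = Flatten()(actor_obs_input)
    a = Dense(600, use_bias=False)(a)
    a = BatchNormalization()(a)
    a = Activation("relu")(a)
    a = Dense(300, use_bias=False)(a)
    a = BatchNormalization()(a)
    a = Activation("relu")(a)
    a = Dense(act_num, activation="tanh")(a)
    actor = Model(inputs=actor_obs_input, outputs=a)

    # Critic: (action, observation) -> Q-value.
    action_input = Input(shape=(act_num,), name="action_input")
    critic_obs_input = Input(shape=(1,) + obs_space, name="critic_obs_input")
    c = Concatenate()([action_input, Flatten()(critic_obs_input)])
    c = Dense(600, activation="relu")(c)
    c = Dense(300, activation="relu")(c)
    c = Dense(1, activation="linear")(c)
    critic = Model(inputs=[action_input, critic_obs_input], outputs=c)

    # keras-rl's DDPGAgent also needs the critic's action input tensor.
    return actor, critic, action_input
```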
I am looking to stabilize my DQN results, and I found that clipping is one technique to do it, but I did not understand it completely!
1- What are the effects of clipping the reward, clipping the gradient, and clipping the error on stability, and how do they make the results more stable?
2- In the Nature DQN paper it is written that they clip the reward. Would you please explain this more?
3- Which of them is most effective for stability?
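For concreteness, the three kinds of clipping mentioned above typically look like this in Keras/NumPy terms (the clip norm and Huber delta values are illustrative):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# 1) Reward clipping (as in the Nature DQN paper): keep only the sign of the reward.
def clip_reward(reward):
    return float(np.sign(reward))  # -1, 0, or +1 regardless of magnitude

# 2) Gradient clipping: cap the norm of the gradients inside the optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)

# 3) Error clipping: Huber loss is quadratic for small TD errors and linear
#    beyond `delta`, so large errors do not produce huge gradients.
huber = keras.losses.Huber(delta=1.0)
```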
Hi, I am trying to develop an RL agent using the PPO algorithm. My agent takes an action (CFM) to maintain a state variable called RAT between 24 and 24.5. I am using the PPO algorithm from the stable-baselines library to train my agent, and I have trained it for 2M steps. Hyper-parameters in the code:

```python
def __init__(self, *args, **kwargs):
    super(CustomPolicy, self).__init__(*args, **kwargs,
                                       net_arch=[dict(pi=[64, 64], vf=[64, 64])],
                                       feature_extraction="mlp")

model = PPO2(CustomPolicy, env, gamma=0.8, n_steps=132, ent_coef=0.01,
             learning_rate=1e-3, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
             nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
             verbose=0, tensorboard_log="./20_01_2020_logs/",
             _init_setup_model=True, …
```
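Continuing from the model above, the 2M-step training run and a checkpoint save would typically be launched like this in stable-baselines (the log name and file name below are placeholders):

```python
# Train for the stated 2M steps and keep a checkpoint for later evaluation.
model.learn(total_timesteps=2_000_000, tb_log_name="ppo2_rat_run")
model.save("ppo2_rat")
```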
I solved CartPole-v0 with a CEM agent pretty easily (experiments and code), but I struggle to find a setup which works with DQN. Do you know which parameters should be adjusted so that the mean reward is about 200 for this problem?
What I tried:
- Adjustments in the model: deeper / less deep, neurons per layer
- Memory size (how many steps are stored for replay)
What I'm unsure about:
- How should I choose the memory size? Is higher always better? …
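For reference, these are the keras-rl knobs the list above refers to; the numbers below are illustrative, not a known-good recipe for CartPole:

```python
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

memory = SequentialMemory(limit=50000, window_length=1)        # replay buffer size
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr="eps",  # anneal exploration over time
                              value_max=1.0, value_min=0.02,
                              value_test=0.0, nb_steps=10000)
```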
I am new to reinforcement learning and experimenting with training RL agents. I have a doubt about reward formulation: from a given state, if an agent takes a good action I give a positive reward, and if the action is bad, I give a negative reward. So if I give the agent very high positive rewards when it takes a good action, like 100 times the value of the negative rewards, will it help the agent during training? …
I'm trying to replicate the DQN Atari experiment. My DQN isn't performing well; looking at other people's code, I saw something about experience replay which I don't understand. First, when you define your CNN, in the first layer you have to specify the input size (I'm using Keras + TensorFlow, so in my case it's something like (105, 80, 4), which corresponds to the height, width, and number of images I feed my CNN). In the code I reviewed, when they get …
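The usual pattern behind that (105, 80, 4) input is to keep a rolling buffer of the last four preprocessed frames and stack them into a single observation. A self-contained sketch (the class and method names are mine):

```python
from collections import deque
import numpy as np

class FrameStacker:
    """Keep the last n_frames preprocessed frames and stack them into one observation."""
    def __init__(self, n_frames=4, frame_shape=(105, 80)):
        self.frames = deque(maxlen=n_frames)
        self.n_frames = n_frames
        self.frame_shape = frame_shape

    def reset(self, first_frame):
        # At the start of an episode, fill the buffer with copies of the first frame.
        for _ in range(self.n_frames):
            self.frames.append(first_frame)
        return self.observation()

    def push(self, frame):
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Stack along the last axis -> shape (105, 80, 4), matching the CNN input.
        return np.stack(self.frames, axis=-1)
```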
How do I implement reward clipping for DQN in Keras, and in particular, how do I implement the clipping of the reward itself? Is this pseudocode correct:

```python
if reward < -threshold:
    reward = -1
elif reward > threshold:
    reward = 1
else:  # -threshold <= reward <= threshold
    reward = reward / threshold
```

And if the reward is always positive, how can the clipping be changed?
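For what it's worth, the branching above is equivalent to a single clipped rescaling; a compact sketch of that equivalence:

```python
import numpy as np

def clip_reward_scaled(reward, threshold):
    # Same behaviour as the pseudocode above: rescale by threshold, then cap at [-1, 1].
    return float(np.clip(reward / threshold, -1.0, 1.0))
```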