From what I understand, a DQN agent has as many outputs as there are actions (one per action, for each state). If we consider a scalar state with 4 actions, that would mean the DQN has a 4-dimensional output. However, when it comes to the target value for training the agent, it is usually described as a scalar: value = reward + discount*best_future_Q. How can a scalar value be used to train a neural network that has a vector output? For …
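One way to reconcile this (a sketch, not taken from the question): the scalar target only supervises the output corresponding to the action actually taken, e.g. by gathering that single Q-value from the vector output before computing the loss. A minimal PyTorch sketch with made-up shapes and a hypothetical network:

import torch
import torch.nn as nn

# hypothetical online network: 1-dimensional state in, 4 Q-values out
net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

states  = torch.randn(8, 1)          # batch of scalar states
actions = torch.randint(0, 4, (8,))  # actions that were actually taken
targets = torch.randn(8)             # scalar targets r + gamma * max_a Q'(s', a)

q_all   = net(states)                                        # shape (8, 4): one Q per action
q_taken = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)   # shape (8,): Q of the taken action
loss    = nn.functional.mse_loss(q_taken, targets)           # scalar prediction vs scalar target

optimizer.zero_grad()
loss.backward()
optimizer.step()

The other three outputs receive no gradient from this sample, so the scalar target never conflicts with the vector output.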
Can anyone please suggest how to arrive at optimal values for the number of layers and the number of neurons of the deep learning model in the DDQN algorithm for the CartPole problem? Since the input and output layers have 4 and 2 neurons respectively for CartPole, is there any scientific reasoning or maths behind choosing the number of hidden layers and the neurons in them? I have followed this link to build the reinforcement learning algorithm: https://pylessons.com/CartPole-reinforcement-learning/
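For reference, there is no closed-form rule for these hyperparameters; a small MLP with one or two hidden layers of a few dozen units is a common empirical starting point for CartPole. A minimal Keras sketch of such a network (the layer sizes are an assumption, not the tutorial's exact architecture):

from tensorflow import keras
from tensorflow.keras import layers

# Small MLP commonly used for CartPole-style DQN/DDQN; the hidden sizes are a
# judgement call tuned by experiment, not derived from a formula.
def build_q_network(state_dim=4, n_actions=2, hidden=(64, 64), lr=1e-3):
    model = keras.Sequential([
        layers.Input(shape=(state_dim,)),
        layers.Dense(hidden[0], activation="relu"),
        layers.Dense(hidden[1], activation="relu"),
        layers.Dense(n_actions, activation="linear"),  # raw Q-values, no softmax
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss="mse")
    return model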
Based on a DeepMind publication, I've recreated the environment and I am trying to make the DQN find and converge to an optimal policy. The task of the agent is to learn how to sustainably collect apples (objects), with the regrowth of the apples depending on their spatial configuration (the more apples around, the higher the regrowth). So, in short, the agent has to find out how to collect as many apples as it can (for collecting an apple it gets a …
How can we build RF-Q-Learning or SVR-Q-Learning (i.e., combine these algorithms with Q-learning)? I want to replace the DNN part of deep Q-learning with a random forest (RF) or SVR, but the problem is that there is no obvious training set that I can feed to my code in TensorFlow or Keras. How can we do this?
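One pattern that may help here (a sketch under my own assumptions, not the asker's code) is fitted Q-iteration: the "training set" is rebuilt every iteration from stored transitions (s, a, r, s'), with targets bootstrapped from the regressor's own predictions, so an RF or SVR can be refit like any supervised model:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Fitted Q-iteration sketch: placeholder transitions for illustration only.
rng = np.random.default_rng(0)
n, state_dim, n_actions, gamma = 500, 4, 2, 0.99

S  = rng.normal(size=(n, state_dim))      # states
A  = rng.integers(0, n_actions, size=n)   # actions taken
R  = rng.normal(size=n)                   # rewards
S2 = rng.normal(size=(n, state_dim))      # next states

def features(states, actions):
    # simple (state, one-hot action) feature vector for the regressor
    onehot = np.eye(n_actions)[actions]
    return np.hstack([states, onehot])

q = RandomForestRegressor(n_estimators=50)
q.fit(features(S, A), R)                  # first iteration: targets are just the rewards

for _ in range(10):                       # subsequent fitted Q-iterations
    next_q = np.column_stack([q.predict(features(S2, np.full(n, a)))
                              for a in range(n_actions)])
    targets = R + gamma * next_q.max(axis=1)
    q.fit(features(S, A), targets)        # refit the regressor on bootstrapped targets

The same loop works with sklearn's SVR by swapping the regressor class; no TensorFlow/Keras model is needed in that case.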
I am following the TensorFlow tutorial on deep reinforcement learning and DQN. Even after setting up the exact same libraries and running the same code, I am getting an error.

from tf_agents.replay_buffers import reverb_utils
....
rb_observer = reverb_utils.ReverbAddTrajectoryObserver(
    replay_buffer.py_client,
    table_name,
    sequence_length=2)  # This line is throwing the error

This is the stack trace:

TypeError                                 Traceback (most recent call last)
Input In [7], in <cell line: 23>()
     15 reverb_server = reverb.Server([table])
     17 replay_buffer = reverb_replay_buffer.ReverbReplayBuffer(
     18     agent.collect_data_spec,
     19     table_name=table_name,
     20     sequence_length=2,
     21     local_server=reverb_server)
---> 23 …
I am trying to pick up the basics of reinforcement learning by self-study from some blogs and texts. Forgive me if the question is too basic and the different bits that I understand are a bit messy, but even after consulting a few references, I cannot really get how deep Q-learning with a neural network works. I understood the Bellman equation like this $$V^\pi(s)= R(s,\pi(s)) + \gamma \sum_{s'} P(s'|s,\pi(s)) V^\pi(s')$$ and the update rule of the Q-table, $$Q_{n+1}(s_t, a_t)=Q_n(s_t, a_t)+\alpha\left(r+\gamma\max_{a\in\mathcal{A}}Q_n(s_{t+1}, a)-Q_n(s_t, a_t)\right)$$ But …
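For what it's worth, the bridge between the table and the network can be seen as turning the tabular update into a regression step: the network predicts $Q(s_t,\cdot)$, and one gradient step pulls $Q(s_t,a_t)$ towards the target $r+\gamma\max_a Q(s_{t+1},a)$. A minimal single-transition PyTorch sketch (all names and shapes are illustrative):

import torch
import torch.nn as nn

# The Q-table Q(s, a) becomes a network mapping a state to a vector of Q-values,
# and the tabular update becomes a gradient step on the squared TD error.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)
gamma = 0.99

s      = torch.randn(4)        # current state s_t
a      = 1                     # action a_t that was taken
r      = 1.0                   # reward received
s_next = torch.randn(4)        # next state s_{t+1}

with torch.no_grad():                       # the target is treated as a constant
    target = r + gamma * q_net(s_next).max()

prediction = q_net(s)[a]                    # Q_n(s_t, a_t) from the network
loss = (target - prediction) ** 2           # squared TD error
optimizer.zero_grad()
loss.backward()
optimizer.step()                            # plays the role of the alpha-weighted update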
I am having some confusion as to whether the action should be included as part of the state input to an agent in a reinforcement learning setting (a state-action pair). From my observation, this is not completely clear, as different agent/environment combinations might perform differently depending on whether the action is included in or excluded from the input states (I might be wrong). For my specific problem: the agent can't influence/control the states through its actions (similar to the case of a simple multi-armed bandit); the …
I am currently learning reinforcement learning and wanted to use it on the CarRacing-v0 environment. I have successfully done this using the PPO algorithm, and now I want to use a DQN algorithm, but when I try to train the model it gives me this error:

AssertionError: The algorithm only supports (<class 'gym.spaces.discrete.Discrete'>,) as action spaces but Box([-1. 0. 0.], [1. 1. 1.], (3,), float32) was provided

Here is my code:

import os
import gym
from stable_baselines3 import DQN
from …
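Since DQN in stable-baselines3 only handles discrete action spaces, one common workaround (a sketch, not the asker's code) is to wrap CarRacing's continuous Box actions behind a small fixed set of discrete (steer, gas, brake) combinations:

import gym
import numpy as np

class DiscreteCarActions(gym.ActionWrapper):
    """Map a few fixed (steer, gas, brake) combinations onto CarRacing's Box space."""
    # these five actions are an arbitrary illustrative choice, not a recommendation
    _ACTIONS = np.array([
        [ 0.0, 0.0, 0.0],   # do nothing
        [-1.0, 0.0, 0.0],   # steer left
        [ 1.0, 0.0, 0.0],   # steer right
        [ 0.0, 1.0, 0.0],   # accelerate
        [ 0.0, 0.0, 0.8],   # brake
    ], dtype=np.float32)

    def __init__(self, env):
        super().__init__(env)
        self.action_space = gym.spaces.Discrete(len(self._ACTIONS))

    def action(self, act):
        return self._ACTIONS[act]

# env = DiscreteCarActions(gym.make("CarRacing-v0"))  # now has a Discrete action space for DQN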
I would like to use TF-Agents in non-episodic environments (continuing tasks without a termination state). In such implementations, the agent can continue learning without needing to reset the environment at the end of an episode, which is where it would usually calculate the return of the episode. I have found similar questions without answers here and there. This explanation, using the concept of average rewards, seems convincing. However, I would like to know whether TF-Agents already provides such …
Hi, I am developing a reinforcement learning agent for a continuous state space and a discrete action space. I am trying to use Boltzmann/softmax exploration as the action selection strategy. My action space is of size 5000. My implementation of Boltzmann exploration:

def get_action(state, episode, temperature=1):
    state_encod = np.reshape(state, [1, state_size])
    q_values = model.predict(state_encod)
    prob_act = np.empty(len(q_values[0]))
    for i in range(len(prob_act)):
        prob_act[i] = np.exp(q_values[0][i] / temperature)
    # numpy element-wise division by the denominator (sum of the numerators)
    prob_act = np.true_divide(prob_act, sum(prob_act))
    action_q_value = np.random.choice(q_values[0], p=prob_act)
    action_keys = np.where(q_values[0] == action_q_value)
    action_key …
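Two things may be worth noting about this kind of implementation: exponentiating raw Q-values can overflow for large magnitudes or small temperatures, and recovering the action via np.where on the sampled Q-value breaks down when Q-values are tied. A numerically stable sketch that samples the action index directly (model and state_encod are the question's own names, used only in the commented example):

import numpy as np

def boltzmann_action(q_values, temperature=1.0):
    """Sample an action index from a softmax over Q-values (numerically stable)."""
    q = np.asarray(q_values, dtype=np.float64) / temperature
    q -= q.max()                     # shift so exp() cannot overflow
    probs = np.exp(q)
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)   # sample the index, not the Q-value

# example usage with the question's names:
# action = boltzmann_action(model.predict(state_encod)[0], temperature=0.5)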
I am building a DQN for Atari game playing, and I have an algorithm that gives me data about the objects in each frame, represented as three lists: the first holds the X-coordinates of the objects, the second the Y-coordinates, and the third the class each object belongs to. An example would look like this: X=[22.3, 54.0, 1.12], Y=[54.3, 23.5, 126.5], class=[1, 1, 2]. I am intentionally using handcrafted features rather than a CNN for my final-year dissertation, and this implementation is using PyTorch libraries …
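Since a plain DQN input must be a fixed-size vector, one possible way (an assumption on my part, not the asker's code) to feed such per-frame object lists to a PyTorch MLP is to pad or truncate to a maximum object count and flatten the (x, y, class) triples:

import torch

def objects_to_tensor(xs, ys, classes, max_objects=10):
    """Pack variable-length object lists into a fixed-size (max_objects * 3) tensor."""
    features = torch.zeros(max_objects, 3)          # zero padding for missing objects
    n = min(len(xs), max_objects)                   # truncate if there are too many
    for i in range(n):
        features[i] = torch.tensor([xs[i], ys[i], float(classes[i])])
    return features.flatten()                       # shape (max_objects * 3,)

# the example from the question:
state = objects_to_tensor([22.3, 54.0, 1.12], [54.3, 23.5, 126.5], [1, 1, 2])
# state can now be fed to an nn.Linear(max_objects * 3, ...) Q-network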
I'm trying to solve the Rubik's cube using deep learning, and I came across DQN, so I decided to give it a try. I developed all the code and started training, but I got these results: the loss goes up and the test results never get better. I have tried changing the learning rate and the epsilon-greedy decay, and reducing the scramble to a single move, but it still can't solve the cube even with just one move. That's why I would like to know if it just …
How do I apply REINFORCE/policy-gradient algorithms to a continuous action space? I have learnt that one of the advantages of policy gradients is that they are applicable to continuous action spaces. One way I can think of is discretizing the action space, the same way we do it for DQN. Should we follow the same method for policy-gradient algorithms as well, or is there another way this is done? Thanks
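Discretization is not required: the usual alternative is to let the policy network output the parameters of a continuous distribution (for example a Gaussian mean and standard deviation), sample actions from it, and use the log-probability in the REINFORCE loss. A minimal PyTorch sketch (dimensions and names are illustrative):

import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Policy for a continuous action space: outputs a Normal distribution per action dim."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh())
        self.mean = nn.Linear(64, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))  # state-independent std

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

policy = GaussianPolicy(state_dim=3, action_dim=1)
dist = policy(torch.randn(3))
action = dist.sample()                     # continuous action, no discretization needed
log_prob = dist.log_prob(action).sum()     # used in the REINFORCE loss: -log_prob * return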
I am new to the area of RL and am currently trying to train an online DQN model. Can an online model overfit, since it is always learning? And how can I tell if that happens?
I have implemented a DQN using Keras. The task is to collect the circles and avoid the red circle and the crosses. The associated rewards are +5, -5, and 0 otherwise. If the agent goes off the board, the game is reset (with a reward of -5 as well). The average reward fluctuates a lot and I cannot observe any learning. I tried to use settings similar to those for DQN on Atari, except that I don't concatenate the last 4 frames but train the neural …
I'm trying to solve an RL problem, the contextual bandit problem, using deep Q-learning. My data is all simulated. I have this environment:

class Environment():
    def __init__(self):
        self._observation = np.zeros((3,))

    def interact(self, action):
        self._observation = np.zeros((3,))
        c1, c2, c3 = np.random.randint(0, 90, 3)
        self._observation[0] = c1
        self._observation[1] = c2
        self._observation[2] = c3
        reward = -1.0
        condition = False
        if (c1 < 30) and (c2 < 30) and (c3 < 30) and action == 0:
            condition = True
        elif (30 <= c1 < 60) and (30 <= c2 < 60) and (30 <= c3 < 60) and action == 1:
            condition = True
        elif (60 <= c1 < 90) and (60 <= c2 < 90) and …
This is the code of my DQN implementation. I have checked it against the code in many other people's repositories. I cannot find any differences, but it turns out my code cannot train the model while theirs can. I guess there are some bugs in learn(), but I could not find any differences; it looks the same as others' code.

class DQNAgent():
    def __init__(self, net, capacity, n_actions, eps_start, eps_end, eps_decay,
                 batch_size, gamma, lr):
        self.net = net
        self.target_net = copy.deepcopy(self.net)
        self.buffer = …
I'm trying to explore solving the shortest-path problem using DQN. I know we can solve it using a Q-table, but I just wanted to explore using deep learning. I have a set of nodes that I extracted from OpenStreetMap. Each node has an id. I constructed a data frame that contains the edges and their weights, which represent distances (you can find it here), and the graph network looks like this. Now I wanted to train …
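A common way (my assumption, not the asker's setup) to phrase shortest path as an MDP that a DQN can learn is: state = one-hot of the current node, action = choice of next node, reward = negative edge weight, so that shorter paths give higher return. A small sketch with placeholder node ids and weights:

import numpy as np

nodes = [101, 102, 103, 104]                      # placeholder node ids
edges = {(101, 102): 5.0, (102, 103): 2.5,        # placeholder edge weights (distances)
         (101, 104): 7.0, (104, 103): 1.0}

node_index = {n: i for i, n in enumerate(nodes)}

def encode_state(node):
    """One-hot vector the Q-network can consume (input size = number of nodes)."""
    s = np.zeros(len(nodes), dtype=np.float32)
    s[node_index[node]] = 1.0
    return s

def step(node, next_node):
    """Reward is the negative distance; invalid moves get a large penalty."""
    if (node, next_node) in edges:
        return encode_state(next_node), -edges[(node, next_node)]
    return encode_state(node), -100.0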
In DQN, why not use the target network to predict the current-state Q-values as well, rather than only the next-state Q-values? I am writing a basic deep Q-learning algorithm with a neural network from scratch, with replay memory and minibatch gradient descent, and I'm implementing a target network that predicts, for every minibatch sample, both the current-state and next-state Q-values; at the end of each minibatch I sync the target network. But I notice that the weights diverge very easily, maybe because I used the NN to predict the current …
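For comparison, in standard DQN the current-state Q-value comes from the online network (so gradients flow through it), while the frozen target network is used only to build the bootstrap target for the next state. A minimal PyTorch sketch of that split, with a made-up minibatch:

import copy
import torch
import torch.nn as nn

online = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target = copy.deepcopy(online)            # frozen copy, synced only occasionally
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)
gamma = 0.99

# an illustrative minibatch of transitions (s, a, r, s', done)
s    = torch.randn(32, 4)
a    = torch.randint(0, 2, (32, 1))
r    = torch.randn(32)
s2   = torch.randn(32, 4)
done = torch.zeros(32)

q_current = online(s).gather(1, a).squeeze(1)        # current-state Q from the ONLINE net
with torch.no_grad():                                # no gradients through the target net
    q_next = target(s2).max(dim=1).values
    td_target = r + gamma * (1 - done) * q_next

loss = nn.functional.smooth_l1_loss(q_current, td_target)
optimizer.zero_grad()
loss.backward()                                      # only the online net is updated
optimizer.step()
# target.load_state_dict(online.state_dict())        # sync every N steps, not every minibatch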