Dimensionality of the target for DQN agent training

From what I understand, a DQN agent has as many outputs as there are actions (for each state). If we consider a scalar state with 4 actions, that would mean that the DQN would have a 4 dimensional output. However, when it comes to the target value for training the agent, it is usually described as a scalar value = reward + discount*best_future_Q. How could a scalar value be used to train a Neural Network having a vector output? For …
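One common resolution (a sketch, not necessarily how any particular tutorial does it): the target is actually a full 4-dimensional vector, namely the network's own prediction for the current state with only the entry for the action that was taken replaced by reward + discount*best_future_Q, so the error on the other three outputs is zero. Assuming a Keras-style model and placeholder names:

    import numpy as np

    # Rough sketch: `model` is assumed to be a Keras-style network with 4 output units,
    # one Q-value per action; all argument names here are hypothetical.
    def build_target(model, state, action, reward, next_state, done, gamma=0.99):
        # Start from the network's own prediction for the current state...
        target = model.predict(state[np.newaxis])[0]
        # ...and overwrite only the entry of the action that was actually taken.
        if done:
            target[action] = reward
        else:
            target[action] = reward + gamma * np.max(model.predict(next_state[np.newaxis])[0])
        return target  # shape (4,); train with model.fit(state[np.newaxis], target[np.newaxis])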
Category: Data Science

Cartpole - Number of layers and neurons - model hyperparameters

Can anyone please suggest how to arrive at good values for the number of layers and the number of neurons in the deep learning model of a DDQN algorithm for the CartPole problem? As the input and output neurons are 4 and 2 respectively for CartPole, is there any scientific reasoning or maths behind choosing the number of hidden layers and the neurons in them? I have followed this link to build the reinforcement learning algorithm: https://pylessons.com/CartPole-reinforcement-learning/
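There is no closed-form rule; the widths used for CartPole are conventions found by trial and error rather than derived values. A minimal sketch of a commonly used shape (two small hidden layers, assuming Keras as in the linked tutorial; the 64/64 sizes are just an illustrative choice):

    from tensorflow.keras import layers, models

    def build_q_network(state_size=4, action_size=2, hidden=(64, 64)):
        # Two modest fully connected layers are usually enough for CartPole;
        # widths like 24, 32 or 64 are picked empirically, not by formula.
        model = models.Sequential([
            layers.Input(shape=(state_size,)),
            layers.Dense(hidden[0], activation="relu"),
            layers.Dense(hidden[1], activation="relu"),
            layers.Dense(action_size, activation="linear"),  # one Q-value per action
        ])
        model.compile(optimizer="adam", loss="mse")
        return model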
Category: Data Science

DQN fails to find optimal policy

Based on a DeepMind publication, I've recreated the environment and I am trying to make the DQN find and converge to an optimal policy. The task of the agent is to learn how to sustainably collect apples (objects), with the regrowth of the apples depending on their spatial configuration (the more apples around, the higher the regrowth). So in short: the agent has to find out how to collect as many apples as it can (for collecting an apple it gets a …
Category: Data Science

Unexpected keyword argument error in tensorflow-agents replay buffers

Following the tensorflow tutorial on deep reinforcement learning and DQN. Even after setting up the exact same libraries and running the same code, I am getting an error.

    from tf_agents.replay_buffers import reverb_utils
    ....
    rb_observer = reverb_utils.ReverbAddTrajectoryObserver(
        replay_buffer.py_client,
        table_name,
        sequence_length=2)  # This line is throwing the error

This is the stacktrace:

    TypeError                                 Traceback (most recent call last)
    Input In [7], in <cell line: 23>()
         15 reverb_server = reverb.Server([table])
         17 replay_buffer = reverb_replay_buffer.ReverbReplayBuffer(
         18     agent.collect_data_spec,
         19     table_name=table_name,
         20     sequence_length=2,
         21     local_server=reverb_server)
    ---> 23 …
Category: Data Science

How to Form the Training Examples for Deep Q Network in Reinforcement Learning?

Trying to pick up the basics of reinforcement learning by self-study from some blogs and texts. Forgive me if the question is too basic; the different bits that I understand are a bit messy, and even after consulting a few references, I cannot really get how deep Q-learning with a neural network works. I understood the Bellman equation like this $$V^\pi(s)= R(s,\pi(s)) + \gamma \sum_{s'} P(s'|s,\pi(s)) V^\pi(s')$$ and the update rule for the Q-table $$Q_{n+1}(s_t, a_t)=Q_n(s_t, a_t)+\alpha(r+\gamma\max_{a\in\mathcal{A}}Q(s_{t+1}, a)-Q_n(s_t, a_t))$$ But …
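For what it is worth, a rough sketch of how (state, action, reward, next state) tuples sampled from a replay buffer are turned into supervised training pairs; `model` and `buffer` are placeholder names, and the target construction mirrors the tabular update rule above with the learning rate absorbed into gradient descent:

    import random
    import numpy as np

    def make_batch(model, buffer, batch_size=32, gamma=0.99):
        # Each replay-buffer entry is one interaction: (state, action, reward, next_state, done).
        batch = random.sample(buffer, batch_size)
        states = np.array([t[0] for t in batch])
        next_states = np.array([t[3] for t in batch])

        q_now = model.predict(states)        # current Q estimates, shape (batch, n_actions)
        q_next = model.predict(next_states)  # used only to form the bootstrap target

        for i, (s, a, r, s_next, done) in enumerate(batch):
            # Replace only the Q-value of the action that was taken, as in the update rule.
            q_now[i][a] = r if done else r + gamma * np.max(q_next[i])
        return states, q_now                 # feed to model.fit(states, q_now)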
Category: Data Science

When should the last action be included in the state in reinforcement learning?

I am having some confusion as to whether the last action should be included as part of the state input to an agent in a reinforcement learning setting (state-action pair). From my observation, this is not completely clear, as different agent/environment combinations might perform differently depending on whether the action is included in or excluded from the input state (I might be wrong). For my specific problem: the agent can't influence/control the states through its actions (similar to the case of a simple multi-armed bandit) the …
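If the last action is included, the usual mechanics are simply to append it (e.g. one-hot encoded) to the observation vector; a tiny sketch with hypothetical names, just to make the bookkeeping concrete:

    import numpy as np

    def augment_state(observation, last_action, n_actions):
        # One-hot encode the previous action and concatenate it to the observation,
        # so the network input size becomes obs_dim + n_actions.
        one_hot = np.zeros(n_actions, dtype=np.float32)
        one_hot[last_action] = 1.0
        return np.concatenate([observation.astype(np.float32), one_hot])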
Category: Data Science

gym car racing v0 using DQN

I am currently learning reinforcement learning and wanted to use it on the CarRacing-v0 environment. I have successfully made it work using the PPO algorithm, and now I want to use a DQN algorithm, but when I try to train the model it gives me this error:

    AssertionError: The algorithm only supports (<class 'gym.spaces.discrete.Discrete'>,) as action spaces but Box([-1. 0. 0.], [1. 1. 1.], (3,), float32) was provided

Here is my code:

    import os
    import gym
    from stable_baselines3 import DQN
    from …
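One workaround (a sketch, not part of stable_baselines3 itself) is to wrap CarRacing-v0 in a gym.ActionWrapper that exposes a small Discrete action set and maps each index to a fixed steering/gas/brake vector; the particular action list below is only an illustrative guess:

    import gym
    import numpy as np

    class DiscreteCarRacing(gym.ActionWrapper):
        """Maps a Discrete action index to a continuous [steer, gas, brake] vector."""
        # These five prototype actions are an arbitrary example, not a recommended set.
        ACTIONS = [
            np.array([0.0, 0.0, 0.0], dtype=np.float32),   # no-op
            np.array([-1.0, 0.0, 0.0], dtype=np.float32),  # steer left
            np.array([1.0, 0.0, 0.0], dtype=np.float32),   # steer right
            np.array([0.0, 1.0, 0.0], dtype=np.float32),   # accelerate
            np.array([0.0, 0.0, 0.8], dtype=np.float32),   # brake
        ]

        def __init__(self, env):
            super().__init__(env)
            self.action_space = gym.spaces.Discrete(len(self.ACTIONS))

        def action(self, act):
            return self.ACTIONS[act]

    # env = DiscreteCarRacing(gym.make("CarRacing-v0"))  # DQN then sees a Discrete space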
Category: Data Science

TF Agents DdqnAgent for Continuous Tasks (Non-Episodic Environments)

I would like to use TF Agents in non-episodic environments (continuous tasks without a termination state). In such implementations, the agent can continue learning without the need to reset the environment at the end of an episode, where it would usually calculate the return of the episode. I have found similar questions without answers here and there. This explanation seems convincing, using the concept of average rewards. However, I would like to know whether TF Agents already provides such …
Category: Data Science

How to calculate Temperature variable in softmax(boltzmann) exploration

Hi, I am developing a reinforcement learning agent for a continuous state / discrete action space. I am trying to use Boltzmann/softmax exploration as the action selection strategy. My action space is of size 5000. My implementation of Boltzmann exploration:

    def get_action(state, episode, temperature=1):
        state_encod = np.reshape(state, [1, state_size])
        q_values = model.predict(state_encod)
        prob_act = np.empty(len(q_values[0]))
        for i in range(len(prob_act)):
            prob_act[i] = np.exp(q_values[0][i] / temperature)
        # numpy element-wise division by the denominator (sum of the numerators)
        prob_act = np.true_divide(prob_act, sum(prob_act))
        action_q_value = np.random.choice(q_values[0], p=prob_act)
        action_keys = np.where(q_values[0] == action_q_value)
        action_key …
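As an aside, the per-element loop can be replaced by a vectorized, numerically stable softmax (a sketch; subtracting the maximum keeps np.exp from overflowing when Q-values divided by a small temperature become large):

    import numpy as np

    def boltzmann_action(q_values, temperature=1.0):
        # q_values: 1-D array of Q-values for every action in the current state.
        logits = q_values / temperature
        logits -= np.max(logits)        # stability: exp() of large numbers overflows
        probs = np.exp(logits)
        probs /= probs.sum()
        # Sample an action index directly instead of matching on the Q-value,
        # which breaks if two actions happen to share the same Q-value.
        return np.random.choice(len(q_values), p=probs)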
Category: Data Science

How do I represent three lists in one tensor state for a DQN

I am building a DQN for Atari game playing, and I have an algorithm that gives me data about the objects in each frame, represented as three lists: the first is the X-coordinate of each object, the second is the Y-coordinate, and the third is the class the object belongs to. An example would look like this: X=[22.3,54.0,1.12] Y=[54.3,23.5,126.5] class=[1,1,2] I am intentionally using handcrafted methods rather than a CNN for my final year dissertation, and this implementation is using pytorch libraries …
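A minimal sketch of stacking the three lists into a single PyTorch tensor (assuming the number of objects per frame is fixed, or padded/truncated to a fixed size so the input dimension never changes):

    import torch

    x = [22.3, 54.0, 1.12]
    y = [54.3, 23.5, 126.5]
    obj_class = [1, 1, 2]

    # One row per detected object, columns = (x, y, class); shape (3, 3) here.
    state = torch.tensor([x, y, obj_class], dtype=torch.float32).T

    # Most simple DQN heads expect a flat vector, so flatten the object table.
    flat_state = state.flatten()   # shape (9,)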
Category: Data Science

Is it possible to solve Rubik's cube using DQN?

I'm trying to solve Rubik's cube using deep learning, and I came across DQN, so I decided to give it a try. I developed all the code and started training, but I got these results: the loss goes up and the test results never improve. I have tried changing the learning rate and the epsilon-greedy decay, and reducing the scramble to a single move, but it still can't solve the cube even with just one move. That's why I would like to know if it just …
Category: Data Science

Policy Gradient with continuous action space

How do you apply REINFORCE/policy-gradient algorithms to a continuous action space? I have learnt that one of the advantages of policy gradients is that they are applicable to continuous action spaces. One way I can think of is discretizing the action space, the same way we do it for DQN. Should we follow the same method for policy-gradient algorithms as well, or is there another way this is done? Thanks
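Discretizing works, but the more common approach is to have the policy network output the parameters of a continuous distribution (typically a Gaussian mean and standard deviation), sample the action from it, and use the log-probability in the REINFORCE update. A rough PyTorch sketch with placeholder names and sizes:

    import torch
    import torch.nn as nn

    class GaussianPolicy(nn.Module):
        def __init__(self, state_dim, action_dim, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
            self.mean = nn.Linear(hidden, action_dim)
            # Log-std as a free parameter keeps the std positive after exp().
            self.log_std = nn.Parameter(torch.zeros(action_dim))

        def forward(self, state):
            h = self.body(state)
            return torch.distributions.Normal(self.mean(h), self.log_std.exp())

    # REINFORCE step for one batch of sampled transitions (states, actions, returns are tensors):
    # dist = policy(states)
    # loss = -(dist.log_prob(actions).sum(dim=-1) * returns).mean()
    # loss.backward(); optimizer.step()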
Category: Data Science

DQN cannot learn or converge

I have implemented a DQN using Keras. The task is to collect the circles and avoid the red circle and the crosses. The associated rewards are +5, -5 and 0 otherwise. If the agent goes off the board, the game is reset (with a reward of -5 too). The average reward fluctuates a lot and I cannot observe any learning. I tried to use similar settings as for DQN on Atari, except that I don't concatenate the last 4 frames but train the neural …
Category: Data Science

Am I using this neural network in a wrong way?

I'm trying to solve an RL problem, the contextual bandit problem, using deep Q-learning. My data is all simulated. I have this environment:

    class Environment():
        def __init__(self):
            self._observation = np.zeros((3,))

        def interact(self, action):
            self._observation = np.zeros((3,))
            c1, c2, c3 = np.random.randint(0, 90, 3)
            self._observation[0] = c1
            self._observation[1] = c2
            self._observation[2] = c3
            reward = -1.0
            condition = False
            if (c1 < 30) and (c2 < 30) and (c3 < 30) and action == 0:
                condition = True
            elif (30 <= c1 < 60) and (30 <= c2 < 60) and (30 <= c3 < 60) and action == 1:
                condition = True
            elif (60 <= c1 < 90) and (60 <= c2 < 90) and …
Category: Data Science

Cannot train DQN to solve cartpole

This is the code of my DQN implementation. I have checked it against the code in many other people's repositories. I cannot find any differences, but it turns out my code cannot train the model while theirs can. I guess there are some bugs in learn(), but I could not find any differences; it looks the same as the others' code.

    class DQNAgent():
        def __init__(self, net, capacity, n_actions, eps_start, eps_end, eps_decay, batch_size, gamma, lr):
            self.net = net
            self.target_net = copy.deepcopy(self.net)
            self.buffer = …
Category: Data Science

Catastrophic Forgetting on DQN

I'm trying to explore solving the shortest-path problem using DQN. I know we can solve it using a Q-table, but I just wanted to explore using deep learning. I have a set of nodes that I extracted from OpenStreetMap. Each node has an id. I constructed a data frame that contains the edges and their weights, which represent the distance (you can find it here), and the graph network looks like this. Now I wanted to train …
Category: Data Science

In DQN, why not use target network to predict current state Q values?

In DQN, why not use the target network to predict the current state's Q-values, and not only the next state's Q-values? I'm writing a basic deep Q-learning algorithm with a neural network from scratch, with replay memory and minibatch gradient descent, and I'm implementing a target neural network to predict, for every minibatch sample, both the current and next state Q-values, syncing the target network at the end of the minibatch. But I notice the weights diverge very easily, maybe because I used the network to predict current …
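For reference, the usual arrangement (a sketch of one minibatch step with placeholder names, in PyTorch here even though the question is about a from-scratch network): the online network predicts the current-state Q-values that the loss is taken over, and the target network is used only for the bootstrapped next-state values; predicting the current-state values with the target network removes the very quantity you want gradients to flow through.

    import torch
    import torch.nn.functional as F

    def dqn_update(online_net, target_net, optimizer, batch, gamma=0.99):
        states, actions, rewards, next_states, dones = batch  # tensors from replay memory

        # Current-state Q-values come from the ONLINE network (these carry gradients).
        q_pred = online_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)

        # Next-state Q-values come from the TARGET network, with no gradient.
        with torch.no_grad():
            q_next = target_net(next_states).max(dim=1).values
            q_target = rewards + gamma * q_next * (1 - dones)

        loss = F.smooth_l1_loss(q_pred, q_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()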
Category: Data Science
