I'm trying to understand how Q-learning deals with games where the optimal policy is a mixed strategy. The Bellman equation says that you should choose $\max_a Q(s,a)$, but this implies a single unique action for each $s$. Is Q-learning just not appropriate if you believe that the problem has a mixed strategy?
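To make the concern concrete (this is just my own restatement of the two kinds of policy): the greedy policy derived from a Q-function is deterministic, whereas a mixed strategy is a distribution over actions,
$$\pi(s) = \arg\max_a Q(s,a) \qquad \text{vs.} \qquad \pi(a \mid s) = p_a, \quad \sum_a p_a = 1.$$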
From what I understand, a DQN agent has as many outputs as there are actions (for each state). If we consider a scalar state with 4 actions, that would mean that the DQN would have a 4-dimensional output. However, when it comes to the target value for training the agent, it is usually described as a scalar value = reward + discount * best_future_Q. How could a scalar value be used to train a neural network having a vector output? For …
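For concreteness, here is roughly how I imagine building a per-sample target; this is my own sketch, assuming a Keras-style network called `model` that maps one state to a vector of 4 Q-values, not code from any tutorial:

```python
import numpy as np

def make_target_vector(model, s, a, r, s_next, done, gamma=0.99):
    # start from the network's current predictions so the error is zero
    # for the 3 actions that were not taken
    target_vec = model.predict(np.reshape(s, (1, -1)))[0]          # shape (4,)
    next_q = model.predict(np.reshape(s_next, (1, -1)))[0]
    # only the entry of the chosen action is replaced by the scalar
    # reward + discount * best_future_Q
    target_vec[a] = r if done else r + gamma * np.max(next_q)
    return target_vec                                              # train with MSE against model(s)
```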
If you train an agent using reinforcement learning (with a Q-function in this case), should you give a negative reward (punish) if the agent proposes illegal actions for the presented state? I guess that over time, if you only select from among the legal actions, the illegal ones would eventually drop out, but would punishing them cause them to drop out sooner, and possibly cause the agent to explore more of the legal actions sooner? To expand on this further; say you're training …
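For reference, this is the "only select among legal actions" alternative I have in mind; `legal_actions` is a hypothetical per-state helper for the environment, not something from any library:

```python
import numpy as np

def epsilon_greedy_legal(q_values, legal_actions, epsilon=0.1):
    # q_values: float array with one Q-value per action in the current state
    # legal_actions: indices of the actions that are legal in this state
    if np.random.rand() < epsilon:
        return int(np.random.choice(legal_actions))
    masked = np.full(q_values.shape, -np.inf)
    masked[legal_actions] = q_values[legal_actions]   # illegal actions can never win the argmax
    return int(np.argmax(masked))
```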
Based on a DeepMind publication, I've recreated the environment and I am trying to make the DQN find and converge to an optimal policy. The task of the agent is to learn how to sustainably collect apples (objects), with the regrowth of the apples depending on their spatial configuration (the more apples around, the higher the regrowth). So in short: the agent has to find out how to collect as many apples as it can (for collecting an apple it gets a …
Why is DQN used so frequently while there is hardly any occurrence of Deep Sarsa? I found this paper, https://arxiv.org/pdf/1702.03118.pdf, using it, but nothing else that might be relevant. I assume the cause could be the Ape-X architecture, which came up the year after the Deep Sarsa paper and allowed an immense amount of experience to be generated for off-policy algorithms. Does that make sense, or is there any other reason?
How can we have RF-Q-learning or SVR-Q-learning (i.e., combine these algorithms with Q-learning)? I want to replace the DNN part of deep Q-learning with an RF or SVR, but the problem is that there is no clear training data that I can put into my TensorFlow or Keras code! How can we do this?
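This is a sketch of what I imagine "RF-Q-learning" could look like: fitted Q-iteration, where the regression targets come from the Bellman backup. Here `transitions` is an assumed list of `(s, a, r, s_next)` tuples that I have already collected, and `n_actions` is the size of my action space:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fitted_q_iteration(transitions, n_actions, n_iters=20, gamma=0.99):
    X = np.array([np.append(s, a) for s, a, _, _ in transitions])   # features = state + action
    model = None
    for _ in range(n_iters):
        y = []
        for s, a, r, s_next in transitions:
            if model is None:
                y.append(r)                                          # first pass: Q ~= immediate reward
            else:
                q_next = [model.predict(np.append(s_next, b).reshape(1, -1))[0]
                          for b in range(n_actions)]
                y.append(r + gamma * max(q_next))                    # Bellman target = the "training data"
        model = RandomForestRegressor(n_estimators=100)
        model.fit(X, np.array(y))                                    # refit the forest on the new targets
    return model
```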
I am learning deep Q-learning by applying it to a real-world problem. I have been through some tutorials and papers available online, but I couldn't figure out the solution for the following problem statement. Let's say we have $N$ possible actions in each state to select from. When in state $s$ we make a move by selecting an action $a_i, i=1\dots N$; as a result we get a reward $r$ and end up in a new state $s^\prime$. In …
I'm trying to pick up the basics of reinforcement learning by self-study from some blogs and texts. Forgive me if the question is too basic and the different bits that I understand are a bit messy, but even after consulting a few references, I cannot really get how deep Q-learning with a neural network works. I understood the Bellman equation like this $$V^\pi(s)= R(s,\pi(s)) + \gamma \sum_{s'} P(s'|s,\pi(s)) V^\pi(s')$$ and the update rule for the Q-table: $$Q_{n+1}(s_t, a_t)=Q_n(s_t, a_t)+\alpha(r+\gamma\max_{a\in\mathcal{A}}Q(s_{t+1}, a)-Q_n(s_t, a_t))$$ But …
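To check my reading of the update rule, here is a tiny tabular example I wrote myself (the 10 states and 4 actions are arbitrary):

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99

def q_update(s, a, r, s_next):
    td_target = r + gamma * np.max(Q[s_next])   # r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (td_target - Q[s, a])    # move Q(s, a) a step toward the target
```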
I installed a self-driving car project from the SuperDataScience site. When I open the map using the terminal, after a while the map window closes (or it closes directly after I maximize the map window) and it gives me this error:

```
[INFO ] [Base ] Leaving application in progress...
Traceback (most recent call last):
  File "map.py", line 235, in <module>
    CarApp().run()
  File "/usr/lib/python2.7/dist-packages/kivy/app.py", line 826, in run
    runTouchApp()
  File "/usr/lib/python2.7/dist-packages/kivy/base.py", line 502, in runTouchApp
    EventLoop.window.mainloop()
  File "/usr/lib/python2.7/dist-packages/kivy/core/window/window_sdl2.py", line …
```
I have a question related to an alternative Q-learning approach. I'd like to know if this already exists and I am not aware of it, or whether it doesn't exist because there are theoretical problems behind it. Traditional Q-learning: in traditional Q-learning, the update of the Q-value happens at every iteration. The agent is in state s, performs action a, reaches state s' and obtains reward r. The Q-value for that state-action pair is updated according to the Bellman equation. As …
As I understand it, in reinforcement learning, off-policy Monte Carlo control is when the state-action value function $Q(s,a)$ is estimated as a weighted average of the observed returns. However, in Q-learning the value of $Q(s, a)$ is estimated as the maximum expected return. Why is this not used in Monte Carlo control? Suppose I have a simple 2-dimensional bridge game, where the objective is to get from a to b. I can move left, right, up or down. Let's say …
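To be concrete about the two estimates I am contrasting (this is my own notation, with $W_k$ the importance-sampling weights and $G_k$ the observed returns):
$$Q_{\mathrm{MC}}(s,a) \approx \frac{\sum_k W_k G_k}{\sum_k W_k}, \qquad Q(s,a) \leftarrow Q(s,a) + \alpha\left(r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right).$$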
I used Q-learning for routing, using the Bellman equation. I have certain other technical aspects in the code that add some novelty, but I have some doubts regarding what an episode is and the corresponding convergence in my case. I am unable to work out what an episode would be. E.g. a service comes, I assign a route to it and do some other stuff. I want service acceptance to be higher in the 'long' run (as more services come, some depart …
I am trying to create a Q-learning algorithm to control traffic light systems. I am representing the state with a matrix: state = [[no. of cars up, no. of cars down], [no. of cars left, no. of cars right]]. But it's stochastic, since after allowing cars to move through one road, there is a probability that cars will enter as well. I wrote the probability as follows: every 4 seconds, the probability that 0 cars enter on one …
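This is roughly how I am simulating the stochastic arrivals; the probabilities below are purely illustrative placeholders, not my real values:

```python
import numpy as np

ARRIVAL_PROBS = [0.4, 0.3, 0.2, 0.1]         # hypothetical P(0, 1, 2, 3 cars arrive)

def add_arrivals(state):
    # state: 2x2 matrix [[up, down], [left, right]] of car counts
    state = np.array(state)
    for i in range(2):
        for j in range(2):
            arrivals = np.random.choice(len(ARRIVAL_PROBS), p=ARRIVAL_PROBS)
            state[i, j] += arrivals          # cars may enter at every 4-second step
    return state
```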
I am working on the DDQN algorithm given in the following paper, and I am facing a problem with the Q-value. The author calculates the Q-value as $Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + A(s, a; \theta, \alpha)$. The Q-value is divided into two parts: the state value and the action-advantage value. The action-advantage value is independent of state and environment noise, which is a relative action value in each state relative to …
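Here is a small numerical check of how I read the equation, where `v` is the scalar output of the value head and `adv` is the advantage head's output (one entry per action); the numbers are made up:

```python
import numpy as np

v = 1.5                                  # example V(s)
adv = np.array([0.2, -0.1, 0.4])         # example A(s, a) for 3 actions

q = v + adv                              # Q(s, a) = V(s) + A(s, a), as written in the paper
print(q)                                 # [1.7 1.4 1.9]

# Many dueling implementations instead use V(s) + (A(s, a) - mean_a A(s, a))
# so that V and A are identifiable; I am not sure which variant this paper uses.
q_centered = v + (adv - adv.mean())
```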
How is the Q-value estimated from the state value V and the action advantage A? In the DDQN algorithm below, the deep network is divided into two parts at the final layer: a state value function V(s), which represents the reward value of the state, and an action advantage function A(a), which means the extra reward value of choosing an action. DDQN algorithm input: observation information $obs_t = [S_t, A_{t-1}]$, Q-network and its parameters $\theta$, target $\hat{Q}$-network and its parameters $\theta$ …
I'm studying the deep Q-learning algorithm; you can see it in the picture here: DQN. I have a few questions about the deep Q-learning algorithm. What do they mean with row 14: if $D_i = 0$, set $Y_i = \dots$? They want me to take an action $a'$ which maximizes the function Q, which means I have to insert every action $a$ for that state. If I have $a_1$ and $a_2$, I have to insert $a_1$ and then …
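This is my attempt to spell out that step in code, assuming $D_i$ is a terminal/done flag and that `q_net(s)` returns the vector of Q-values for every action in state s (so "inserting every action" is one forward pass followed by a max); it is only how I understand it, not the paper's code:

```python
import numpy as np

def compute_target(q_net, r_i, s_next_i, done_i, gamma=0.99):
    if done_i:                       # terminal transition: no future term
        return r_i
    q_values = q_net(s_next_i)       # e.g. np.array([Q(s', a1), Q(s', a2)])
    return r_i + gamma * np.max(q_values)
```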
I'm trying to solve an RL problem, the contextual bandit problem, using deep Q-learning. My data is all simulated. I have this environment:

```python
import numpy as np

class Environment():
    def __init__(self):
        self._observation = np.zeros((3,))

    def interact(self, action):
        self._observation = np.zeros((3,))
        c1, c2, c3 = np.random.randint(0, 90, 3)
        self._observation[0] = c1
        self._observation[1] = c2
        self._observation[2] = c3
        reward = -1.0
        condition = False
        if (c1 < 30) and (c2 < 30) and (c3 < 30) and action == 0:
            condition = True
        elif (30 <= c1 < 60) and (30 <= c2 < 60) and (30 <= c3 < 60) and action == 1:
            condition = True
        elif (60 <= c1 < 90) and (60 <= c2 < 90) and …
```
I am trying to construct a Q-table. I have a state space and an action space. The state space consists of a large, dynamic number of complex but discrete elements. Theoretically, I understand everything about the Q-table, and I can construct a Q-table if the state and action spaces are integers. But I am unable to implement it when the state and action spaces are complex in nature. Complex here refers to the complexity of the representation of state and action information as opposed to integer …
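This is the workaround I am considering, as a minimal sketch: a dictionary keyed by (state, action) instead of a 2-D array indexed by integers, which only requires the complex state/action descriptions to be hashable. The `encode` helper is hypothetical and would depend on my actual state structure:

```python
from collections import defaultdict

Q = defaultdict(float)                   # unseen (state, action) pairs default to 0.0
alpha, gamma = 0.1, 0.99

def encode(state):
    # turn the complex-but-discrete state description into a hashable key,
    # e.g. a tuple of tuples; this is the only domain-specific part
    return tuple(tuple(x) if isinstance(x, list) else x for x in state)

def q_update(state, action, reward, next_state, next_actions):
    s, s2 = encode(state), encode(next_state)
    best_next = max((Q[(s2, a2)] for a2 in next_actions), default=0.0)
    Q[(s, action)] += alpha * (reward + gamma * best_next - Q[(s, action)])
```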