I am trying to solve a dynamic programming toy example. Here is the prompt: imagine you arrive in a new city for $N$ days, and every night you need to pick a restaurant to get dinner at. The qualities of the restaurants are i.i.d. according to a distribution $F$ (assume support $[0,1]$). The goal is to maximize the sum of the qualities of the restaurants you get dinner at over the $N$ days. Every day you need to choose whether you go …
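Since the prompt is cut off above, here is the recursion I have in mind, under my reading that each night the choice is between returning to the best restaurant found so far (quality $x$, a state variable I introduce) and trying a new, unvisited one with quality $Q \sim F$; with $k$ days remaining:
$$ V_0(x) = 0, \qquad V_k(x) = \max\Big\{\, x + V_{k-1}(x)\;,\;\; \mathbb{E}_{Q\sim F}\big[\,Q + V_{k-1}(\max(x, Q))\,\big] \Big\}. $$
Since the exploit branch never changes the state, it collapses to $k\,x$.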
I'm trying to code a CAM, or more simply a dictionary storing a pointer to the data, accessible by a key. I am trying to do it on a GPU, but all my attempts have been inefficient compared to using System.Collections.Generic.Dictionary. Does anybody know how to implement this with CUDA to obtain better performance than on the CPU?
Currently, I am learning about the Bellman operator in Dynamic Programming and Reinforcement Learning. I would like to know why the Bellman operator is a contraction with respect to the infinity norm. Why not another norm, e.g. the Euclidean norm?
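To make the question concrete, here is the contraction statement I am trying to understand, written with the Bellman optimality operator $T$ and my own notation $r(s,a)$, $p(s'\mid s,a)$ for rewards and transitions:
$$(TV)(s) = \max_{a}\Big[r(s,a) + \gamma \sum_{s'} p(s'\mid s,a)\,V(s')\Big],$$
and the claimed property is
$$\|TV - TU\|_\infty \le \gamma\,\|V - U\|_\infty \quad\text{for all } V, U,$$
which, as I understand it, follows from $|\max_a f(a) - \max_a g(a)| \le \max_a |f(a)-g(a)|$ together with $\sum_{s'} p(s'\mid s,a)\,|V(s')-U(s')| \le \|V-U\|_\infty$. What I do not see is why this argument singles out the infinity norm.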
The TD(0) algorithm is defined as the iterative application of the following update: $$ V(s) \leftarrow V(s) + \alpha\big(r + \gamma V(s') - V(s)\big) $$ Now, if we set $\alpha$ equal to 1, we get the traditional policy evaluation formula from Dynamic Programming. Is that correct?
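Spelling out the special case I mean: with $\alpha = 1$ the update collapses to
$$ V(s) \leftarrow r + \gamma V(s'), $$
whereas the iterative policy evaluation backup in DP averages over all actions and successors, $V(s) \leftarrow \sum_a \pi(a\mid s) \sum_{s',r} p(s',r\mid s,a)\big[r + \gamma V(s')\big]$, so the two only coincide if the single sampled $(r, s')$ stands in for that full expectation. Is that the right way to compare them?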
In some resources, the Bellman equation is shown as below: $$v_{\pi}(s) = \sum\limits_{a}\pi(a|s)\sum\limits_{s',r}p(s',r|s,a)\big[r+\gamma v_{\pi}(s')\big]$$ The thing that confuses me is the $\pi$ and $p$ parts on the right-hand side. Since the probability term $p(s',r|s,a)$ gives the probability of being at the next state $s'$, and since being at the next state $s'$ has to happen via following a specific action, the $p$ term seems to also include the probability of taking that specific action inside it. But then, …
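To pin down the factorization I am trying to reconcile this with: by the chain rule, the joint probability of the triple $(a, s', r)$ given $s$ is
$$\Pr(a, s', r \mid s) = \pi(a\mid s)\,p(s', r \mid s, a),$$
where $p(s',r\mid s,a)$ is conditioned on the action $a$ already having been chosen, so, as I read it, the action probability should appear only in the $\pi(a\mid s)$ factor.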
I am trying to develop an RL agent using the DQN algorithm. During training, the agent interacts with a simulated environment. Each episode takes around 10 minutes to run. At this rate, if I want my agent to train for some 1,000,000 episodes (to achieve convergence), it becomes computationally infeasible. Is anyone aware of a way to speed up my training process, like using parallel threading or CUDA? Or is it something caused by the algorithm itself? My episode here basically is …
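The direction I have been considering is to run several copies of the simulator at once so the slow episodes overlap instead of running serially. Below is a minimal sketch of what I mean, assuming the simulator can be instantiated once per process; `SlowSimEnv` and `run_episode` are hypothetical stand-ins for my actual setup, and the random action would be replaced by the DQN's epsilon-greedy policy, with gradient updates staying in the main process.

```python
# Minimal sketch: collect experience from several simulated environments in
# parallel with multiprocessing, so slow episodes overlap instead of running serially.
# SlowSimEnv is a hypothetical stand-in for the real 10-minute-per-episode simulator.
import random
import multiprocessing as mp

class SlowSimEnv:
    """Hypothetical placeholder for the slow simulator."""
    def reset(self):
        return 0.0                        # initial observation

    def step(self, action):
        obs = random.random()             # next observation
        reward = random.random()          # reward
        done = random.random() < 0.1      # episode ends after ~10 steps on average
        return obs, reward, done

def run_episode(seed):
    """Roll out one full episode and return its transitions."""
    random.seed(seed)
    env = SlowSimEnv()
    obs, done, transitions = env.reset(), False, []
    while not done:
        action = random.randint(0, 1)     # stand-in for the epsilon-greedy policy
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return transitions

if __name__ == "__main__":
    # Run 8 episodes at a time, one per worker process.
    with mp.Pool(processes=8) as pool:
        episodes = pool.map(run_episode, range(8))
    replay = [t for ep in episodes for t in ep]   # flatten into a replay buffer
    print(f"collected {len(replay)} transitions from {len(episodes)} parallel episodes")
```

Would something along these lines help here, or is the bottleneck inherent to DQN itself?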
I am trying to grasp the fundamental mathematics behind Reinforcement Learning, and so far I have understood how the Value Iteration and Policy Iteration algorithms converge (contractions, etc.). I still have some problems understanding the Bellman equation. The value function for a state $s$ under a policy $\pi$ is the expected discounted cumulative reward: $$ V^\pi(s_0=s) = \mathbb{E}\left[\sum_{t=0}^{\infty}\gamma^t R(s_t,\pi(s_t)) \,\middle|\, s_0=s\right].$$ During the derivation of the Bellman equations, when the expected cumulative rewards are calculated on an infinite horizon, meaning …
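For concreteness, the recursive form I am trying to arrive at from this definition, writing $P(s'\mid s,a)$ for the transition probabilities (my notation) and keeping the policy deterministic as above, is
$$ V^\pi(s) = R(s, \pi(s)) + \gamma \sum_{s'} P\big(s'\mid s, \pi(s)\big)\, V^\pi(s'). $$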