Based on a DeepMind publication, I've recreated the environment and I am trying to get a DQN to find and converge to an optimal policy. The agent's task is to learn how to sustainably collect apples (objects), with the regrowth of the apples depending on their spatial configuration (the more apples around, the higher the regrowth). So in short: the agent has to find out how to collect as many apples as it can (for collecting an apple it gets a …
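For reference, this is a minimal sketch of how such a density-dependent regrowth rule can be modelled; the neighbourhood size and the probabilities are illustrative assumptions of mine, not values from the DeepMind publication.

import numpy as np

# Hypothetical regrowth probabilities keyed by the number of apples nearby:
# the more apples in the neighbourhood, the more likely an empty cell regrows.
REGROWTH_PROB = {0: 0.0, 1: 0.01, 2: 0.05, 3: 0.1}

def step_regrowth(grid: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Regrow apples on empty cells with probability depending on nearby apples."""
    new_grid = grid.copy()
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] == 1:
                continue  # this cell already holds an apple
            # Count apples in the surrounding 3x3 window (the centre cell is empty here).
            window = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            n_apples = int(window.sum())
            p = REGROWTH_PROB.get(min(n_apples, 3), 0.1)
            if rng.random() < p:
                new_grid[r, c] = 1
    return new_grid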
Following the TensorFlow tutorial on deep reinforcement learning and DQN. Even after setting up the exact same libraries and running the same code, I am getting an error.

from tf_agents.replay_buffers import reverb_utils
....
rb_observer = reverb_utils.ReverbAddTrajectoryObserver(
    replay_buffer.py_client,
    table_name,
    sequence_length=2)  # This line is throwing the error

This is the stack trace:

TypeError                                 Traceback (most recent call last)
Input In [7], in <cell line: 23>()
     15 reverb_server = reverb.Server([table])
     17 replay_buffer = reverb_replay_buffer.ReverbReplayBuffer(
     18     agent.collect_data_spec,
     19     table_name=table_name,
     20     sequence_length=2,
     21     local_server=reverb_server)
---> 23 …
So, I'm trying to implement AlphaZero's logic on the game of chess. What I understand so far of the algorithm is:

1. Load 2 models, one of which is the best model you have so far. Both these models have a value network and a policy network and use MCTS to find the best move.
2. Play n games between these 2 models and save the states, moves and who won each game.
3. Train the new model on a sample of the …
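For what it's worth, here is a rough sketch of how I picture that loop; play_game, train_on and play_match are my own placeholders, not AlphaZero's actual code, and the numbers are made up.

import random

def alphazero_iteration(best_model, new_model, play_game, train_on, play_match,
                        n_games=100, batch_size=256, eval_games=40, win_threshold=0.55):
    """One self-play / train / evaluate cycle, roughly as described above."""
    replay = []

    # 1. Self-play: collect (state, MCTS move distribution, winner) tuples.
    for _ in range(n_games):
        states, moves, winner = play_game(best_model, new_model)
        replay.extend(zip(states, moves, [winner] * len(states)))

    # 2. Train the new model on a random sample of the collected positions.
    train_on(new_model, random.sample(replay, min(batch_size, len(replay))))

    # 3. Evaluate head-to-head: promote the new model only if it wins often enough.
    wins = sum(play_match(new_model, best_model) for _ in range(eval_games))
    return new_model if wins / eval_games > win_threshold else best_model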
I am confused about the training stage of AlphaGo Zero using the data collected from the self-play stage. According to an AlphaGo Zero Cheat Sheet I found, the training routine is:

Loop from 1 to 1,000:
- Sample a mini-batch of 2048 episodes from the last 500,000 games
- Use this mini-batch as input for training (minimize their loss function)

After this loop, compare the current network (after the training) with the old one (prior to the training). However, after reading the article, …
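My mental picture of that routine is something like the sketch below; recent_games and train_step are placeholders for however the games are actually stored and however one gradient update is actually performed.

import random

def training_phase(recent_games, train_step, steps=1000, batch_size=2048):
    """Sample mini-batches from the most recent self-play games and train on them.

    `recent_games` is assumed to already hold only the last 500,000 games, and
    `train_step(batch)` stands in for one gradient step minimizing the loss.
    """
    for _ in range(steps):
        batch = random.sample(recent_games, min(batch_size, len(recent_games)))
        train_step(batch)
    # afterwards: compare the trained network with the previous (pre-training) one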
Question on embedding similarity / nearest neighbor methods: In https://arxiv.org/abs/2112.04426 the DeepMind team writes: "For a database of T elements, we can query the approximate nearest neighbors in O(log(T)) time. We use the SCaNN library [https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html]". Could someone provide an intuitive explanation for this time complexity of ANN? Thanks! A very Happy New Year, Earthlings!
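This is not SCaNN's actual algorithm, but as a toy illustration of where a log(T) query cost can come from, here is a sketch of a balanced space-partitioning tree over 1-D "embeddings": the query descends only one branch per level, so the number of comparisons grows with the tree depth, roughly log2(T). Skipping the backtracking an exact search would need is also why the answer is only approximate.

import math

def build_tree(points):
    """Recursively split the sorted points in half; tree depth is ~log2(T)."""
    if len(points) <= 1:
        return {"leaf": points}
    mid = len(points) // 2
    return {"split": points[mid],
            "left": build_tree(points[:mid]),
            "right": build_tree(points[mid:])}

def approx_nearest(tree, query):
    """Descend one branch per level: O(depth) = O(log T) comparisons, no backtracking."""
    if "leaf" in tree:
        return tree["leaf"][0] if tree["leaf"] else None
    branch = "left" if query < tree["split"] else "right"
    return approx_nearest(tree[branch], query)

points = sorted(range(1024))                       # T = 1024 database items (1-D toy)
print(approx_nearest(build_tree(points), 300.4))   # 300, found in ~log2(1024) = 10 steps
print(math.log2(len(points)))                      # 10.0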
I'm working on my chess bot, and I would like to implement a simple artificial intelligence for it. I'm new to this, so I'm unsure how to do it specifically for chess. I've heard about Q-learning, supervised/unsupervised learning, genetic algorithms, etc., which are probably not all suited to chess. I wondered how AlphaZero was created? Probably a genetic algorithm, but chess is a game where "if A then B" might not work. That would mean Q-learning is also a bad fit for it, and so on. …
I'm reading Grill et al.'s paper regarding their self-supervised approach. I do not understand why the output of the target network is written as $\text{sg}(z'_\xi)$ rather than just $z'_\xi$, as the loss equations would seem to indicate. Is sg used simply to signify that the results of this network do not affect its parameters ($\xi$)? Because that would seem redundant given how $\xi$ is defined in the paper (as a weighted moving average of $\theta$). …
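For concreteness, here is a minimal sketch (my own, not the paper's code) of how a stop-gradient typically shows up in a BYOL-style loss in TensorFlow; tf.stop_gradient blocks gradients from flowing into the target branch even when both branches sit in the same computation graph.

import tensorflow as tf

def byol_style_loss(online_prediction, target_projection):
    """Negative cosine similarity with a stop-gradient on the target branch."""
    # sg(z'_xi): no gradient is propagated through the target projection.
    target = tf.stop_gradient(target_projection)
    p = tf.math.l2_normalize(online_prediction, axis=-1)
    z = tf.math.l2_normalize(target, axis=-1)
    return -tf.reduce_mean(tf.reduce_sum(p * z, axis=-1))

# Toy check: only the online branch receives a gradient.
online = tf.Variable(tf.random.normal([4, 8]))
target = tf.Variable(tf.random.normal([4, 8]))
with tf.GradientTape() as tape:
    loss = byol_style_loss(online, target)
print(tape.gradient(loss, [online, target])[1])  # None: the path to the target is cut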
Neuroscience is still trying to "find out" how the mind (and language) somehow "works". Is there any theory linking a (low-dimensional) embedding space (like word2vec) to a model of the mind (or of language)? Any Cognitive Linguistics theory?
In the MuZero paper pseudocode, they have the following line of code:

hidden_state = tf.scale_gradient(hidden_state, 0.5)

What does this do? Why is it there? I've searched for tf.scale_gradient and it doesn't exist in TensorFlow. And, unlike scalar_loss, they don't seem to have defined it in their own code. For context, here's the entire function:

def update_weights(optimizer: tf.train.Optimizer, network: Network, batch,
                   weight_decay: float):
  loss = 0
  for image, actions, targets in batch:
    # Initial step, from the real observation.
    value, reward, …
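For reference, a common way such a gradient-scaling helper is written (this is my guess at the intent, not code from the MuZero release) is to mix the tensor with a stop-gradient copy of itself, so the forward value is unchanged while the backward gradient is multiplied by the scale factor.

import tensorflow as tf

def scale_gradient(tensor: tf.Tensor, scale: float) -> tf.Tensor:
    """Forward pass is the identity; the gradient is multiplied by `scale`."""
    return tensor * scale + tf.stop_gradient(tensor) * (1.0 - scale)

# Toy check: the gradient through scale_gradient(x, 0.5) is half as large.
x = tf.Variable([2.0])
with tf.GradientTape() as tape:
    y = tf.reduce_sum(scale_gradient(x, 0.5) ** 2)
print(tape.gradient(y, x))  # [2.] instead of [4.]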
As far as I understood from the AlphaGo Zero system: during the self-play part, the MCTS algorithm stores a tuple ($s$, $\pi$, $z$), where $s$ is the state, $\pi$ is the probability distribution over the actions in that state, and $z$ is an integer representing the winner of the game that state belongs to. The network will receive $s$ as input (a stack of matrices describing the state $s$) and will output two values: $p$ and $v$. $p$ is a …
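If it helps, the loss that AlphaGo Zero minimizes over these stored tuples (as given in the paper) ties both outputs to the saved targets,

$l = (z - v)^2 - \pi^{\top} \log p + c \lVert \theta \rVert^2$,

so $v$ is regressed towards the game outcome $z$, $p$ is pushed towards the MCTS visit distribution $\pi$, and the last term is an $L_2$ penalty on the network weights $\theta$.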
I have been using the epsilon-greedy action selection strategy and recently came across the Boltzmann (softmax) action selection strategy. One thing I am not clear about with Boltzmann exploration is the temperature variable. How should we define this variable? Is it a constant, or should it be decreased over the course of training? And how do we decide on the absolute value of this parameter? Thanks
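For reference, a minimal sketch of Boltzmann (softmax) action selection with an exponential temperature decay; the start/end temperatures and the decay rate are arbitrary illustrative values, not recommendations.

import numpy as np

def boltzmann_action(q_values: np.ndarray, temperature: float,
                     rng: np.random.Generator) -> int:
    """Sample an action with probability proportional to exp(Q / temperature)."""
    logits = q_values / temperature
    logits -= logits.max()                            # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(q_values), p=probs))

def temperature_at(step: int, t_start=1.0, t_end=0.05, decay=1e-4) -> float:
    """Anneal from an exploratory temperature towards a near-greedy one."""
    return t_end + (t_start - t_end) * np.exp(-decay * step)

rng = np.random.default_rng(0)
q = np.array([1.0, 1.5, 0.2])
print(boltzmann_action(q, temperature_at(0), rng))        # high T: close to uniform
print(boltzmann_action(q, temperature_at(100_000), rng))  # low T: almost always argmax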
In one of the recent blog posts by DeepMind, they have used game theory in the AlphaStar algorithm. DeepMind AlphaStar: "Mastering this problem requires breakthroughs in several AI research challenges including: Game theory: StarCraft is a game where, just like rock-paper-scissors, there is no single best strategy. As such, an AI training process needs to continually explore and expand the frontiers of strategic knowledge." Where is game theory applied when it comes to reinforcement learning?
I am trying to implement a Deep Q Network model for dynamic pricing in logistics. I can define the State Space (origin, destination, type of the shipment, customer, type of the product, commodity of the shipment, availability of capacity, etc.), the Action Space (the price itself, which can range from 0 to infinity; we need to determine this price), and the Reward Signal (rewards can be based on a similar offer to other customers, seasonality, remaining capacity). I am planning to use a Multi-Layer Perceptron for …
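Since the price is continuous, one common simplification for a DQN is to discretize it into price buckets. Here is a minimal sketch of such an MLP Q-network in TensorFlow/Keras; the state dimension and the number of price buckets are made-up placeholders, not values from my actual problem.

import tensorflow as tf

STATE_DIM = 16          # placeholder: encoded origin/destination/shipment features
N_PRICE_BUCKETS = 50    # placeholder: number of discretized price levels

def build_q_network(state_dim: int = STATE_DIM,
                    n_actions: int = N_PRICE_BUCKETS) -> tf.keras.Model:
    """MLP mapping a state vector to one Q-value per discretized price level."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(state_dim,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_actions),             # Q(s, a) for each price bucket
    ])

q_net = build_q_network()
state = tf.random.normal([1, STATE_DIM])              # one encoded pricing request
print(int(tf.argmax(q_net(state), axis=-1)[0]))       # greedy price bucket index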
Going through the DeepMind Jupyter notebook on Conditional Neural Processes, the plots at the bottom of the notebook show that the ground truth and the predicted distribution only overlap around the "context points". These context points are already in the training set. This comes as a surprise to me because I was expecting that, if the model worked, the ground truth curve would lie inside the predicted distribution at non-context points as well. So, doesn't this mean that the network failed to …