DQN fails to find optimal policy

Based on a DeepMind publication, I've recreated the environment and I am trying to make the DQN find and converge to an optimal policy. The agent's task is to learn how to sustainably collect apples (objects), with the regrowth of the apples depending on their spatial configuration (the more apples around, the higher the regrowth). So in short: the agent has to figure out how to collect as many apples as it can (for collecting an apple it gets a …
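For reference, a minimal sketch (not the asker's code) of the standard DQN temporal-difference target such an agent is trained against, assuming a PyTorch Q-network and a replay batch of (state, action, reward, next state, done) tensors:

    # Minimal sketch of the standard DQN update; all names are illustrative.
    import torch
    import torch.nn as nn

    def dqn_loss(q_net, target_net, batch, gamma=0.99):
        """One TD-error loss on a batch of (states, actions, rewards, next_states, dones)."""
        states, actions, rewards, next_states, dones = batch
        # Q(s, a) for the actions actually taken
        q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Bootstrapped target: r + gamma * max_a' Q_target(s', a')
            next_q = target_net(next_states).max(dim=1).values
            targets = rewards + gamma * (1.0 - dones) * next_q
        return nn.functional.smooth_l1_loss(q_values, targets)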
Category: Data Science

What are desirable properties of layers in Deep Learning?

I have been thinking about the following problem: given some task, we assume there is a magical function that perfectly solves this task. For example, if we want to distinguish cats and dogs, then we can train a neural network that hopefully converges over time to a function "similar" to our magical function. The question is now: how can we help/encourage our network to converge to a good/better function? In theory a single layer + a non-linearity can be enough, …
Category: Data Science

Validation loss keeps fluctuating about training loss

I am training a Keras model for multi-target regression using a custom loss function, with the goal of getting predictions accurate to below 0.01 with respect to that loss function. As can be seen from the plot of the losses below, both the training and validation loss quickly get below the target value; the training loss seems to converge rather quickly, while the validation loss keeps fluctuating about the training loss value. Although the loss is below …
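For context, a minimal sketch of how a custom loss is wired into a Keras multi-target regression model; the MAE loss, layer sizes, and input shape here are placeholders, not the asker's actual setup:

    # Sketch: registering a custom loss for multi-target regression in Keras.
    import tensorflow as tf

    def custom_loss(y_true, y_pred):
        # Placeholder loss: mean absolute error across all regression targets
        return tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(3),  # 3 regression targets (illustrative)
    ])
    model.compile(optimizer="adam", loss=custom_loss)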
Category: Data Science

Is learning_rate linearly related to the time to converge when using Adam?

Say that both learning rates 1e-3 and 1e-4 lead to the same solution (neither too high nor too small). In terms of convergence measured in epochs, will optim.Adam(model.parameters(), lr=1e-4) take 10 times more epochs than optim.Adam(model.parameters(), lr=1e-3)? So if an optimizer with lr=1e-3 reached the solution at epoch 130, would an optimizer with lr=1e-4 theoretically get there at epoch 1300? I think my statement is true for vanilla SGD, but in Adam's opt there's both momentum …
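A sketch of the comparison being asked about, using a hypothetical helper that counts epochs until a loss tolerance is reached; the model, data, and tolerance are placeholders:

    # Sketch: measuring epochs-to-converge for two Adam learning rates.
    import torch
    import torch.nn as nn

    def epochs_to_converge(model, optimizer, data, target, tol=1e-3, max_epochs=10_000):
        """Count full passes over the data until the loss drops below tol."""
        loss_fn = nn.MSELoss()
        for epoch in range(1, max_epochs + 1):
            optimizer.zero_grad()
            loss = loss_fn(model(data), target)
            loss.backward()
            optimizer.step()
            if loss.item() < tol:
                return epoch
        return max_epochs

    data, target = torch.randn(100, 10), torch.randn(100, 1)
    model_a, model_b = nn.Linear(10, 1), nn.Linear(10, 1)
    print(epochs_to_converge(model_a, torch.optim.Adam(model_a.parameters(), lr=1e-3), data, target))
    print(epochs_to_converge(model_b, torch.optim.Adam(model_b.parameters(), lr=1e-4), data, target))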
Category: Data Science

Convergence of Sarsa($\lambda$)

Is there any theorem on the convergence of the Sarsa($\lambda$) algorithm? I am currently working through the theory of Reinforcement Learning with David Silver's lectures and the book by Sutton & Barto. I could not find an answer to my question. I found comments and theorems on the convergence of TD($\lambda$) and one-step Sarsa, but nothing for Sarsa($\lambda$). An hour of searching on Google gave the same result. Does the convergence follow naturally, as Sarsa($\lambda$) is basically the equivalent of …
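For reference, a sketch of tabular Sarsa($\lambda$) with accumulating eligibility traces in the style of Sutton & Barto; the environment interface (reset() and step() returning state, reward, done) is an assumption:

    # Tabular Sarsa(lambda) with accumulating eligibility traces (illustrative sketch).
    import numpy as np

    def epsilon_greedy(Q, s, epsilon, n_actions):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    def sarsa_lambda(env, n_states, n_actions, episodes=500,
                     alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            E = np.zeros_like(Q)                # eligibility traces
            s = env.reset()
            a = epsilon_greedy(Q, s, epsilon, n_actions)
            done = False
            while not done:
                s2, r, done = env.step(a)
                a2 = epsilon_greedy(Q, s2, epsilon, n_actions)
                delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
                E[s, a] += 1.0                  # accumulating trace
                Q += alpha * delta * E          # update every traced state-action pair
                E *= gamma * lam                # decay traces
                s, a = s2, a2
        return Q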
Category: Data Science

RL agent behaves differently for different data

I am training an RL model using PPO for AAPL stock. There are three actions to take: Buy, Sell, or Hold. If there is a Buy (or Sell) signal, the environment buys (or sells) everything. To trade in a given year, the model learns from the previous 5 years of data (it randomly selects one of those 5 years to train on). In the process, I accidentally put future data in the state, and the model for 2010 learnt that …
Category: Data Science

ElasticNet Convergence odd behavior

I am optimizing a model using ElasticNet, but am getting some odd behavior. When I set the tolerance hyperparameter to a small value, I get "ConvergenceWarning: Objective did not converge" warnings. So I tried a larger tolerance value, and the convergence warning goes away, but now the test data consistently gives a higher root mean squared error. This seems backwards to me: if the model does not converge, what can cause it to give a better RMSE score, or …
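A sketch of the comparison described above on synthetic data, contrasting a strict and a loose tol setting in scikit-learn's ElasticNet; the alphas and data are placeholders for the actual setup:

    # Sketch: compare a strict vs. loose convergence tolerance for ElasticNet.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNet
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    strict = ElasticNet(alpha=0.1, l1_ratio=0.5, tol=1e-6, max_iter=10_000)  # may still warn
    loose = ElasticNet(alpha=0.1, l1_ratio=0.5, tol=1e-2)                    # stops earlier

    for name, model in [("strict tol", strict), ("loose tol", loose)]:
        model.fit(X_train, y_train)
        rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
        print(name, "iterations:", model.n_iter_, "test RMSE:", round(rmse, 3))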
Category: Data Science

Normalizing the final weights vector in the upper bound on the Perceptron's convergence

The convergence of the "simple" perceptron says that: $$k\leqslant \left ( \frac{R\left \| \bar{\theta} \right \|}{\gamma } \right )^{2}$$ where $k$ is the number of iterations (in which the weights are updated), $R$ is the maximum distance of a sample from the origin, $\bar{\theta}$ is the final weights vector, and $\gamma$ is the smallest distance from $\bar{\theta}$ to a sample (= the margin of hyperplane). Many books implicitly say that $\left \| \bar{\theta} \right \|$ is equal to 1. But …
Category: Data Science

What exactly is convergence rate referring to in machine learning?

My understanding of the term "Convergence Rate" is as follows: Rate at which maximum/Minimum of a function is reached, so in logistic regression rate at which gradient decent reaches global minimum. So by convergence rate I am guessing it is measure of: time measured from start of gradient descent until it reaches global maximum. average number of distance our model went downhill(Do not know technical term...) for each iteration. Can someone verify whether or not one of my guess is …
Category: Data Science

Number of epochs in Gensim Word2Vec implementation

There's an iter parameter in the gensim Word2Vec implementation: class gensim.models.word2vec.Word2Vec(sentences=None, size=100, alpha=0.025, window=5, min_count=5, max_vocab_size=None, sample=0, seed=1, workers=1, min_alpha=0.0001, sg=1, hs=1, negative=0, cbow_mean=0, hashfxn=<built-in function hash>, iter=1, null_word=0, trim_rule=None, sorted_vocab=1), which specifies the number of epochs, i.e.: iter = number of iterations (epochs) over the corpus. Does anyone know whether increasing it helps in improving the model over the corpus? Is there any reason why iter is set to 1 by default? Is there not much effect in increasing …
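For illustration, a sketch of raising the number of epochs; note that the iter keyword belongs to the older gensim API quoted above, while gensim 4.x renamed iter to epochs and size to vector_size:

    # Sketch: training Word2Vec with more than one pass over the corpus.
    from gensim.models import Word2Vec

    sentences = [["the", "quick", "brown", "fox"],
                 ["jumps", "over", "the", "lazy", "dog"]]

    model = Word2Vec(sentences, size=100, window=5, min_count=1, iter=10)  # gensim < 4.0
    # model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=10)  # gensim >= 4.0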
Category: Data Science

What is going on with this kind of validation loss graph?

I am using stock prices and a whole bunch of indicator values to try to get a TensorFlow model to predict whether to buy, sell, or hold. I think I'm going about this right, but when I train the model, I first set a learning rate scheduler to increase the learning rate until the model converges, and I then use the learning rate from the graph where the train loss and val loss first make their steepest slope down for the next training …
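A minimal sketch of the learning-rate-increase schedule described above, using a Keras LearningRateScheduler callback on a toy model; the model, data, and schedule constants are placeholders, not the asker's setup:

    # Sketch: exponentially increase the learning rate each epoch and record the loss.
    import numpy as np
    import tensorflow as tf

    X = np.random.rand(256, 20).astype("float32")          # toy features
    y = np.random.randint(0, 3, size=(256,))                # toy buy/sell/hold labels
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-6),
                  loss="sparse_categorical_crossentropy")

    schedule = lambda epoch: 1e-6 * 10 ** (epoch / 10)      # LR grows by 10x every 10 epochs
    lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)
    history = model.fit(X, y, epochs=40, callbacks=[lr_callback], verbose=0)
    # Plot history.history["loss"] against schedule(epoch) for each epoch and pick the
    # learning rate where the loss first drops most steeply.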
Category: Data Science

Rate of convergence - comparison of supervised ML methods

I am working on a project with sparse labelled datasets, and am looking for references regarding the rate of convergence of different supervised ML techniques with respect to dataset size. I know that in general boosting algorithms, and other models that can be found in Scikit-learn like SVM's, converge faster than neural networks. However, I cannot find any academic papers that explore, empirically or theoretically, the difference in how much data different methods need before they reach n% accuracy. I …
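One empirical way to make that comparison is scikit-learn's learning_curve, which scores each model at several training-set sizes; the models and synthetic dataset below are placeholders for the actual setup:

    # Sketch: empirical learning curves for several supervised methods.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import learning_curve
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    sizes = np.linspace(0.1, 1.0, 5)

    for name, model in [("SVM", SVC()),
                        ("Boosting", GradientBoostingClassifier()),
                        ("MLP", MLPClassifier(max_iter=1000))]:
        train_sizes, _, test_scores = learning_curve(model, X, y, train_sizes=sizes, cv=5)
        # Mean cross-validated accuracy at each training-set size
        print(name, dict(zip(train_sizes, test_scores.mean(axis=1).round(3))))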
Category: Data Science

Does convergence equal learning in Deep Q-learning?

In my current research project I'm using the Deep Q-learning algorithm. The setup is as follows: I'm training the model (using Deep Q-learning) on a static dataset made up of experiences extracted from N levels of a given game. Then, I want to use the trained model to solve M new levels of the same game, i.e., I want to test the generalization ability of the agent on new levels of the same game. Currently, I have managed to find …
Category: Data Science

Help with MLP convergence

I posted this question on AI SE and was advised to ask here for guidance. I've been stuck for a couple of days trying to figure out how the standard MLP works and why my code doesn't converge at all on XOR (it doesn't break either, it just produces some numbers). To keep things short and straightforward (you can get more details in the above link), I'm stuck at coding backpropagation with a simple architecture ($1$ hidden layer) in …
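For comparison, a minimal 2-4-1 MLP trained with plain backpropagation (sigmoid activations, mean-squared-error loss) that does converge on XOR; the architecture and hyperparameters are illustrative, not the asker's:

    # Minimal working MLP for XOR trained with hand-written backpropagation.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros((1, 4))    # 2 inputs -> 4 hidden units
    W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros((1, 1))    # 4 hidden -> 1 output
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 0.5
    for epoch in range(20000):
        # forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # backward pass (MSE loss, sigmoid derivatives)
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

    print(out.round(3))  # should be close to [[0], [1], [1], [0]]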
Category: Data Science

Do smaller neural nets always converge faster than larger ones?

In your experience, do smaller CNN models (fewer params) converge faster than larger models? I would think yes, naturally, because there are fewer parameters to optimize. However, I am training a custom MobileNetV2-based Unet (with 2.9k parameters) for image segmentation, which is taking longer to converge than a model with a greater number of parameters (5k params). If this convergence behavior is unexpected, it probably indicates a bug in the architecture.
Category: Data Science

Force Matching in Coarse Grained Molecular Dynamics with Jax - Forces do not match when neglecting energy loss

I am currently exploring force matching approaches for molecular dynamics simulations. As I am still in an exploration stage, I tried investigating the Force Matching Neural Network Colab Notebook corresponding to "Unveiling the predictive power of static structure in glassy systems". They train a graph neural network to estimate forces from positions. To do so, they calculate a loss that matches both energy and forces: $$\text{Loss} = (\text{energy}_{\text{predicted}} - \text{energy}_{\text{target}})^2 + (\text{Forces}_{\text{predicted}} - \text{Forces}_{\text{target}})^2$$ where the energy is defined as …
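A sketch of such a combined loss in JAX, assuming a scalar energy model energy_fn(params, positions) as a stand-in for the notebook's graph network, and obtaining the predicted forces as the negative gradient of the energy with respect to the positions:

    # Sketch: combined energy + force matching loss, with forces from -dE/dpositions.
    import jax
    import jax.numpy as jnp

    def force_matching_loss(params, positions, energy_target, forces_target, energy_fn):
        energy_pred = energy_fn(params, positions)
        # Predicted forces are minus the gradient of the predicted energy w.r.t. positions
        forces_pred = -jax.grad(energy_fn, argnums=1)(params, positions)
        energy_loss = (energy_pred - energy_target) ** 2
        force_loss = jnp.sum((forces_pred - forces_target) ** 2)
        return energy_loss + force_loss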
Category: Data Science

Uniform convergence guarantee on sample complexity

I can't understand why uniform convergence guarantees an upper bound, and not a lower bound, on sample complexity, as stated in [1], Corollary 4.4: if a class $H$ has the uniform convergence property with a function $m^{UC}_H$, then the class is agnostically PAC learnable with sample complexity $$m_H(\epsilon ,\delta) \leq m^{UC}_H(\epsilon/2,\delta)$$ Furthermore, in that case, the $ERM_H$ paradigm is a successful agnostic PAC learner for $H$. From what I understood, if we have the sample set $S$ with …
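For context, a sketch of the standard chain of inequalities behind the corollary: assuming the sample $S$ is $\epsilon/2$-representative, i.e. $|L_S(h) - L_D(h)| \leq \epsilon/2$ for every $h \in H$, then for $h_S = ERM_H(S)$ and any $h \in H$

$$L_D(h_S) \;\leq\; L_S(h_S) + \frac{\epsilon}{2} \;\leq\; L_S(h) + \frac{\epsilon}{2} \;\leq\; L_D(h) + \epsilon,$$

where the middle step uses that $h_S$ minimizes the empirical risk. So $m^{UC}_H(\epsilon/2,\delta)$ samples already suffice for agnostic PAC learning, which is why the statement bounds $m_H(\epsilon,\delta)$ from above.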
Category: Data Science

Logistic regression cannot converge without poor model performance

I have a multi-class classification logistic regression model. Using a very basic sklearn pipeline I am taking in cleansed text descriptions of an object and classifying said object into a category. logreg = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', LogisticRegression(n_jobs=1, C=cVal)), ]) Initially I began with a regularisation strength of C = 1e5 and achieved 78% accuracy on my test set and nearly 100% accuracy in my training set (not sure if this is common or not). However, even though the …
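For reference, the usual knobs for a LogisticRegression ConvergenceWarning in this kind of pipeline are a smaller C (stronger regularisation) and/or a larger max_iter; the values below are illustrative, not a recommendation:

    # Sketch: the same pipeline with a smaller C and a larger max_iter.
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    logreg = Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", LogisticRegression(n_jobs=1, C=1.0, max_iter=1000)),  # C=1.0 instead of 1e5
    ])
    # logreg.fit(train_texts, train_labels)  # train_texts/train_labels are the asker's data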
Category: Data Science

How do I iterate functions until convergence in R?

I am looking to iterate until convergence but I am not sure how I should code it. A simple, similar example is something like $X_0=5,\ Y_0=3$ and, for $n=1,2,\dots$: $$P_n=X_{n-1}-Y_{n-1}, \qquad Y_n=3P_n, \qquad X_n=P_n+Y_n,$$ ending when it converges. I have seen that a repeat loop loops indefinitely until an exit condition is met, but I am unsure how to express the iterations in R and how to phrase the exit condition for convergence. Any references or help would be appreciated.
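A sketch of the iterate-until-convergence pattern for the recurrence above (shown in Python; in R the same structure is a repeat loop with an if (...) break exit condition), with the tolerance as an assumed parameter:

    # Iterate the recurrence until successive values stop changing (within a tolerance).
    x, y = 5.0, 3.0
    tol = 1e-9

    while True:
        p = x - y
        y_new = 3 * p
        x_new = p + y_new
        if abs(x_new - x) < tol and abs(y_new - y) < tol:  # exit condition: convergence
            break
        x, y = x_new, y_new

    print(x, y)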
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.