gradient descent diverges extremely

I manually created a random data set around some mean value and tried to use gradient-descent linear regression to predict this simple mean value. I did exactly as in the manual, yet for some reason my predictor coefficients go to infinity, even though the same approach worked for another case. Why, in this case, can it not predict a simple value of 1.4? clear all; n=10000; t=1.4; sigma_R = t*0.001; min_value_t = t-sigma_R; max_value_t = t+sigma_R; y_data …
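A common cause of this kind of blow-up is stepping with the *summed* per-sample gradient instead of the *averaged* one, which inflates the effective step size by a factor of n. A minimal numpy sketch, assuming a model that predicts a single constant with squared loss (the data values echo the snippet, but the update rule here is my assumption, not the asker's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 10_000, 1.4
y = rng.uniform(t - t * 0.001, t + t * 0.001, n)   # data clustered around 1.4

def fit(lr, average=True):
    theta = 0.0
    for _ in range(100):
        grad = 2 * (theta - y)                 # per-sample gradient of (theta - y)^2
        g = grad.mean() if average else grad.sum()
        theta -= lr * g
    return theta

fit(0.1)                  # averaged gradient: settles near 1.4
fit(0.1, average=False)   # summed gradient: step is n times too large, blows up
```

With the averaged gradient the update contracts toward the mean; with the sum, the effective step factor exceeds 1 in magnitude and the iterates diverge to infinity (eventually NaN).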
Category: Data Science

Understanding Learning Rate in depth

I am trying to understand why a learning rate does not work universally. I have two different data sets and have tested three learning rates: 0.001, 0.01 and 0.1. For the first data set, optimization with stochastic gradient descent converged for all three learning rates. For the second data set, the learning rate 0.1 did not converge. I understand the logic behind it overshooting the gradients; however, I'm failing to understand why this …
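The usual explanation is that the largest safe learning rate depends on the curvature of the loss, which differs between data sets. A toy sketch on a 1-D quadratic (the curvature values are assumptions standing in for the two data sets): gradient descent on f(x) = ½ax² multiplies the error by (1 − lr·a) each step, so it converges only when lr < 2/a.

```python
def gd(a, lr, x0=1.0, steps=100):
    # gradient descent on f(x) = 0.5 * a * x**2, so f'(x) = a * x
    x = x0
    for _ in range(steps):
        x -= lr * a * x
    return x

# "data set 1": gentle curvature a=1 -> all three rates shrink the error
# "data set 2": steep curvature a=30 -> lr=0.1 gives |1 - 0.1*30| = 2 > 1, diverges
for lr in (0.001, 0.01, 0.1):
    print(lr, gd(a=1, lr=lr), gd(a=30, lr=lr))
```

The same rate that is harmlessly small for one loss surface can overshoot on a steeper one, which is why no single learning rate works universally.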
Category: Data Science

Why is each successive tree in GBM fit on the negative gradient of the loss function?

Page 359 of The Elements of Statistical Learning (2nd edition) says the below. Can someone explain the intuition & simplify it in layman's terms? Questions: What is the reason/intuition & math behind fitting each successive tree in GBM on the negative gradient of the loss function? Is it done to make GBM generalize better on an unseen test dataset? If so, how does fitting on the negative gradient achieve this generalization on test data?
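For squared loss the negative gradient −∂L/∂F of L = ½(y − F)² is just the residual y − F, so "fit the next tree to the negative gradient" literally means "fit the next tree to what the current ensemble still gets wrong." A minimal sketch with an assumed toy data set and a hard-coded depth-1 "tree" (the split point is an assumption, not a fitted tree):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 300)
y = np.where(x < 0.5, 1.0, 3.0) + rng.normal(0, 0.1, 300)

def stump(x, target):
    # toy regression "tree": one fixed split, each leaf predicts the mean target
    left = x < 0.5
    return np.where(left, target[left].mean(), target[~left].mean())

F = np.full_like(y, y.mean())    # initial model: a constant
nu = 0.3                         # shrinkage / learning rate
for m in range(20):
    neg_grad = y - F             # -dL/dF for L = 0.5*(y - F)^2: the residual
    F += nu * stump(x, neg_grad) # each tree moves F downhill in function space

final_mse = np.mean((y - F) ** 2)   # shrinks toward the noise floor
```

This is gradient descent in function space: each tree is one (approximate) descent step on the training loss. The generalization behaviour comes from the separate regularization knobs (shrinkage, tree depth, number of trees), not from the negative gradient itself.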
Category: Data Science

Vanishing gradient problem even with the ReLU function?

Let's say I have a deep neural network with 50 hidden layers, and at each neuron of every hidden layer the ReLU activation function is used. My question is: is it possible for the vanishing gradient problem to occur during backpropagation for the weight updates even with ReLU? Or can we say that the vanishing gradient problem will never occur when all the activation functions are ReLU?
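ReLU removes the saturation of sigmoid/tanh, but gradients can still vanish: the backward pass multiplies by the weights at every layer, and a "dead" ReLU (pre-activation ≤ 0) zeroes the gradient entirely. A scalar 50-layer sketch (a deliberately tiny stand-in for a real network; the weight value is an assumption):

```python
relu = lambda z: max(z, 0.0)

def forward_backward(w, depth=50, x=1.0):
    # forward through `depth` layers of y = relu(w * x), caching pre-activations
    pre = []
    for _ in range(depth):
        z = w * x
        pre.append(z)
        x = relu(z)
    # backward: d(out)/d(in) = product over layers of w * 1[z > 0]
    grad = 1.0
    for z in reversed(pre):
        grad *= w * (1.0 if z > 0 else 0.0)
    return grad

forward_backward(w=0.5)    # ~0.5**50: vanishes even though every ReLU stays active
forward_backward(w=-0.5)   # first layer outputs 0 -> dead ReLU, gradient exactly 0
```

So ReLU mitigates but does not rule out vanishing gradients; with small weights the product of 50 factors below 1 still shrinks to nearly nothing, and dead units kill the gradient outright.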
Category: Data Science

calculating gradient descent

When using mini-batch gradient descent, we perform backpropagation after each batch, i.e. we calculate the gradient after each batch. We also capture y-hat after each sample in the batch and finally calculate the loss function over the whole batch, and we use this latter to calculate the gradient, correct? Now, as the chain rule states, we calculate the gradient this way for the below neural network. The question is: if we calculate the gradient after passing …
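That reading is correct, and the sequence "per-sample ŷ → one batch loss → one gradient → one update" can be sketched with a plain linear model (the data and shapes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0, 0.01, 100)

w, lr, batch = np.zeros(3), 0.1, 20
for epoch in range(200):
    for i in range(0, len(X), batch):
        Xb, yb = X[i:i+batch], y[i:i+batch]
        y_hat = Xb @ w                            # y-hat for every sample in the batch
        loss = np.mean((y_hat - yb) ** 2)         # ONE loss value over the whole batch
        grad = 2 * Xb.T @ (y_hat - yb) / len(yb)  # gradient of that batch loss
        w -= lr * grad                            # one update ("backprop") per batch
```

Because the batch loss is a mean of per-sample losses, its gradient is the mean of the per-sample gradients, so "loss over the batch, then one gradient" and "average of per-sample gradients" are the same computation.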
Category: Data Science

Is it beneficial to use a batch size > 1 even when all computing power can be used?

In regards to training a neural network, it is often said that increasing the batch size decreases the network's ability to generalize, as alluded to here. This is due to the fact that training on large batches causes the network to converge to sharp minima, as opposed to wide ones, as explained here. This raises the question: in situations where all available computing power can be used by training on a batch size of one, is there a benefit to …
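One quantitative piece of the sharp-vs-wide story is gradient noise: the mini-batch gradient is an unbiased estimate of the full-batch gradient whose variance shrinks roughly like 1/B, and that noise is what is argued to steer SGD away from sharp minima. A sketch measuring this directly at a fixed parameter point (the data and model are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = X @ np.ones(5) + rng.normal(0, 1.0, 10_000)
w = np.zeros(5)                       # measure gradient noise at this fixed point

def batch_grad(idx):
    Xb, yb = X[idx], y[idx]
    return 2 * Xb.T @ (Xb @ w - yb) / len(idx)

full = batch_grad(np.arange(len(X)))  # the full-batch ("true") gradient
def noise(B, reps=300):
    est = [batch_grad(rng.integers(0, len(X), B)) for _ in range(reps)]
    return float(np.mean([np.sum((g - full) ** 2) for g in est]))

for B in (1, 16, 256):
    print(B, noise(B))                # squared deviation shrinks roughly like 1/B
```

So batch size 1 maximizes gradient noise per step; whether that extra noise helps generalization in a given problem is the empirical question the cited papers debate.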
Category: Data Science

how to calculate the loss function?

I hope you are doing well. I want to ask a question regarding the loss function in a neural network. I know that the loss function is calculated for each data point in the training set, and then backpropagation is done depending on whether we are using batch gradient descent (backpropagation is done after all the data points are passed), mini-batch gradient descent (backpropagation is done after each batch), or stochastic gradient descent (backpropagation is done after each data point). …
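The three schedules differ only in how many data points feed each update, which shows up directly as the number of updates per epoch. A small sketch with an assumed noise-free linear problem (60 training points, 50 epochs):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = X @ np.array([2.0, -1.0])

def train(batch_size, lr=0.05, epochs=50):
    w, updates = np.zeros(2), 0
    for _ in range(epochs):
        for i in range(0, len(X), batch_size):
            Xb, yb = X[i:i+batch_size], y[i:i+batch_size]
            # per-point losses are averaged over the batch, then ONE update
            w -= lr * 2 * Xb.T @ (Xb @ w - yb) / len(yb)
            updates += 1
    return w, updates

for bs, name in [(60, "batch GD"), (10, "mini-batch"), (1, "SGD")]:
    w, n = train(bs)
    print(name, n, w)   # 50, 300 and 3000 updates per 50 epochs, respectively
```

Batch GD does 1 update per epoch, mini-batch does ⌈60/10⌉ = 6, and SGD does 60; the per-point loss definition never changes, only how many per-point gradients are averaged before each step.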
Category: Data Science

'Solvers' in Machine Learning

What role do 'solvers' play in optimization problems? Surprisingly, I could not find any definition for 'solvers' online. All the sources I've referred to just explain the types of solvers & the conditions under which each one is supposed to be used. Examples of solvers: ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']
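Those names (from scikit-learn's LogisticRegression) are simply the numerical algorithms used to minimize the training loss; the model and loss stay the same, only the minimization routine changes. A sketch contrasting two solver families on an L2-regularized logistic loss, in plain numpy (the data, regularization strength, and step counts are assumptions): a first-order solver (plain gradient descent, like sag/saga in spirit) versus a second-order one (Newton's method, like newton-cg/lbfgs in spirit).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -1.0]) + rng.normal(0, 0.5, 200) > 0).astype(float)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lam = 0.1                                       # L2 penalty keeps the optimum finite

def grad_descent(steps=3000, lr=0.2):           # first-order: gradient only
    w = np.zeros(2)
    for _ in range(steps):
        g = X.T @ (sigmoid(X @ w) - y) / len(y) + lam * w
        w -= lr * g
    return w

def newton(steps=15):                           # second-order: uses the Hessian
    w = np.zeros(2)
    for _ in range(steps):
        p = sigmoid(X @ w)
        g = X.T @ (p - y) / len(y) + lam * w
        H = X.T @ (X * (p * (1 - p))[:, None]) / len(y) + lam * np.eye(2)
        w -= np.linalg.solve(H, g)
    return w
```

Both reach the same minimizer; they differ in per-iteration cost and iteration count, which is exactly the trade-off behind the "which solver when" advice.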
Category: Data Science

Understanding SGD for Binary Cross-Entropy loss

I'm trying to describe mathematically how stochastic gradient descent can be used to minimize the binary cross-entropy loss. The typical description of SGD that I can find online is: $\theta = \theta - \eta \,\nabla_{\theta}J(\theta,x^{(i)},y^{(i)})$ where $\theta$ is the parameter to optimize the objective function $J$ over, and $x$ and $y$ come from the training set. Specifically, the $(i)$ indicates that it is the $i$-th observation from the training set. For binary cross-entropy loss, I am using …
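For a logistic model $p = \sigma(\theta^\top x^{(i)})$, the per-observation BCE is $J_i = -(y^{(i)}\log p + (1-y^{(i)})\log(1-p))$ and its gradient simplifies to $\nabla_\theta J_i = (p - y^{(i)})\,x^{(i)}$, which is what the SGD rule above plugs in. A numpy sketch (the synthetic data and the decaying step size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-X @ theta_true))).astype(float)

sigmoid = lambda z: 1 / (1 + np.exp(-z))

theta = np.zeros(3)
for epoch in range(40):
    eta = 0.5 / (1 + epoch)                    # decaying learning rate
    for i in rng.permutation(len(X)):          # one (x^(i), y^(i)) at a time
        p = sigmoid(X[i] @ theta)
        # BCE gradient for one observation: (p - y_i) * x^(i)
        theta -= eta * (p - y[i]) * X[i]
```

Each update uses exactly one observation's gradient, which is the defining feature of SGD as written in the formula above.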
Category: Data Science

Verifying my understanding of MLE & Gradient Descent in Logistic Regression

Here is my understanding of the relation between MLE & gradient descent in logistic regression. Please correct me if I'm wrong: 1) MLE estimates the optimal parameters by taking the partial derivative of the log-likelihood function w.r.t. each parameter & equating it to 0. Gradient descent, just like MLE, gives us the optimal parameters by taking the partial derivative of the loss function w.r.t. each parameter. GD also uses hyperparameters like the learning rate & step size in the process of obtaining …
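The distinction in point 1 can be made concrete on a case where the "set the derivative to 0" step has a closed form: the MLE of a Gaussian mean. Solving the first-order condition analytically and running gradient descent on the same negative log-likelihood land on the same number (the data here are an assumption for illustration; in logistic regression no closed form exists, which is why GD is used there):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, 1000)

# MLE: d/dmu [ sum (y_i - mu)^2 ] = 0  =>  mu = mean(y), solved in closed form
mu_mle = y.mean()

# Gradient descent: iterate downhill on the same (scaled) negative log-likelihood
mu, lr = 0.0, 0.1
for _ in range(200):
    grad = -2 * np.mean(y - mu)
    mu -= lr * grad
```

So MLE defines *what* to optimize (the likelihood); setting derivatives to zero and gradient descent are two *ways* to find that optimum, one analytic and one iterative.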
Category: Data Science

Why do we only care about convex functions when doing Gradient Descent/SGD?

I mean, I know why we specifically care about convex functions: their local minima are also global, so you just have to "follow a path which goes down" to find the minima of the function. However, there are also functions which are not convex but for which local minima are also global minima, for example a function which looks like this: Isn't there a way to characterize every function which "works well" with gradient descent? Something like …
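Such characterizations exist (quasi-convexity, or the Polyak–Łojasiewicz condition), but they come with caveats that convexity avoids. A sketch with the non-convex function f(x) = 1 − exp(−x²) (an assumed example of "non-convex but every local minimum is global"): gradient descent finds the unique minimum from a moderate start, yet the nearly-flat tails starve it of gradient signal.

```python
import math

def gd(x0, lr=0.5, steps=200):
    # f(x) = 1 - exp(-x^2): non-convex, single global minimum at x = 0
    x = x0
    for _ in range(steps):
        grad = 2 * x * math.exp(-x * x)
        x -= lr * grad
    return x

gd(0.5)   # converges to the global minimum at 0
gd(4.0)   # gradient ~ 8e-16 out here: the iterate barely moves
```

This is why "local minima are global" alone is not the whole story: convexity also rules out the flat plateaus and bad conditioning that can stall gradient descent in practice.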
Category: Data Science

How do I deal with non-IID data in gradient boosted random forest (for stock market)?

I am working on a stock market decision system. I have currently settled on gradient boosting as the likely best machine-learning solution for the problem. However, I have two fundamental issues with my data that stem from it coming from the stock market and therefore not being IID. First, because of the duration of the averages some indicators use, some data points are highly correlated. For example, the 2-year trailing return of a stock is not very different …
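One standard mitigation for overlapping-window correlation is walk-forward validation with an embargo gap, so a trailing-return window computed in the training rows can never overlap the test rows. A sketch of such a splitter (the window sizes are assumptions; this is hand-rolled, not a library API):

```python
def walk_forward_splits(n, train_size, test_size, embargo):
    """Yield (train_idx, test_idx) windows that respect time order.

    `embargo` rows between train and test stop overlapping-window
    indicators (e.g. 2-year trailing returns) leaking across the split.
    """
    start = 0
    while start + train_size + embargo + test_size <= n:
        train = list(range(start, start + train_size))
        test_start = start + train_size + embargo
        test = list(range(test_start, test_start + test_size))
        yield train, test
        start += test_size

for tr, te in walk_forward_splits(n=20, train_size=8, test_size=4, embargo=2):
    print(tr[0], tr[-1], "->", te[0], te[-1])
```

This does not make the data IID, but it keeps the evaluation honest: the model is always scored on strictly later, non-overlapping observations.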
Category: Data Science

How do you find the eigenvalues of the matrix for the following momentum gradient descent?

The following question is based purely on material available on MIT's OpenCourseWare YouTube channel (https://www.youtube.com/watch?v=wrEcHhoJxjM). In it, Professor Gilbert Strang explains the general formulation of the momentum gradient descent problem and ultimately arrives at optimal values (40:05 in the video) for the variables $s$ and $\beta$. $\textbf{Background}$ Let's begin with standard gradient descent, not covered in this video. The equation for this is: $x_{k+1}=x_{k}-s \nabla f(x_k)$ where $s$ is the step size, $f(x_k)$ is the value …
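For a quadratic whose Hessian has eigenvalue $\lambda$, heavy-ball momentum $x_{k+1} = x_k - s\lambda x_k + \beta(x_k - x_{k-1})$ makes the state $(x_{k+1}, x_k)$ evolve by a fixed $2\times 2$ matrix, and the eigenvalues Strang analyzes are the eigenvalues of that matrix. A numpy sketch checking them numerically (the condition-number values $m=1$, $L=100$ are assumptions for illustration):

```python
import numpy as np

def momentum_matrix(lam, s, beta):
    # state update: (x_{k+1}, x_k) = M @ (x_k, x_{k-1})
    return np.array([[1 + beta - s * lam, -beta],
                     [1.0,                0.0]])

m, L = 1.0, 100.0                                  # extreme Hessian eigenvalues
s = 4 / (np.sqrt(L) + np.sqrt(m)) ** 2             # the optimal step size
beta = ((np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))) ** 2

for lam in (m, 10.0, L):
    eig = np.linalg.eigvals(momentum_matrix(lam, s, beta))
    print(lam, np.abs(eig))   # modulus sqrt(beta) = (sqrt(L)-sqrt(m))/(sqrt(L)+sqrt(m))
```

With the optimal $s$ and $\beta$, every $\lambda \in [m, L]$ gives eigenvalues of modulus exactly $\sqrt{\beta}$, which is the convergence rate derived at 40:05 in the lecture.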
Category: Data Science

Learning parameters when loss is a piecewise function

I have a network to generate a single number $T$. I know in advance: a property of the loss function is that, when $T \in [a_1, a_2]$, the loss has the same value $L_1$; when $T \in [a_2, a_3]$, the loss has another value $L_2$; etc. The loss function resembles a piecewise function. A concrete, simplified example of this problem is perhaps something like object classification. I have a set of objects, and their distances to a category $C$ that …
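The core difficulty with such a loss is that a piecewise-constant function has zero gradient almost everywhere, so backpropagation gives the network no learning signal; the usual remedy is a smooth surrogate. A sketch demonstrating the zero-gradient problem (the interval edges $a_i$ and loss levels $L_i$ here are hypothetical values, not the asker's):

```python
import numpy as np

edges = np.array([0.0, 1.0, 2.0, 3.0])    # assumed interval boundaries a_1..a_4
levels = np.array([3.0, 2.0, 1.0])        # assumed loss value L_i on each interval

def loss(T):
    # piecewise-constant loss: same value everywhere inside an interval
    return levels[np.searchsorted(edges, T) - 1]

def num_grad(T, eps=1e-5):
    return (loss(T + eps) - loss(T - eps)) / (2 * eps)

print(num_grad(0.5), num_grad(1.5))   # 0.0 inside every interval: no descent signal
```

Since the gradient is zero inside each interval (and undefined at the edges), gradient-based training needs a differentiable approximation of this loss, e.g. blending the levels with smooth transition functions.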
Category: Data Science

ResNet: Derive the gradient matrices w.r.t. W1 and W2 and backprop equation in a Residual Network

How would I go about deriving, step by step, the stochastic gradient matrices w.r.t. $W_1$ and $W_2$ and the backpropagation equation in a residual block that is part of a larger ResNet network, with forward propagation expressed as: $$ F(x) = W_{2}\, g_{1}(W_{1}x) $$ $$ y = g_{2}(F(x) + x) $$ where $g_{1}, g_{2}$ are component-wise non-linear activation functions.
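A sketch of the derivation under the stated forward pass, writing $z_1 = W_1 x$, $a_1 = g_1(z_1)$, $z_2 = W_2 a_1 + x$ (so $y = g_2(z_2)$), with $\mathcal{L}$ the loss and $\odot$ the element-wise product:

$$\frac{\partial \mathcal{L}}{\partial z_2} = \frac{\partial \mathcal{L}}{\partial y} \odot g_2'(z_2)$$

$$\frac{\partial \mathcal{L}}{\partial W_2} = \frac{\partial \mathcal{L}}{\partial z_2}\, a_1^{\top}, \qquad \frac{\partial \mathcal{L}}{\partial W_1} = \left(W_2^{\top}\frac{\partial \mathcal{L}}{\partial z_2} \odot g_1'(z_1)\right) x^{\top}$$

$$\frac{\partial \mathcal{L}}{\partial x} = W_1^{\top}\!\left(W_2^{\top}\frac{\partial \mathcal{L}}{\partial z_2} \odot g_1'(z_1)\right) + \frac{\partial \mathcal{L}}{\partial z_2}$$

The trailing $+\,\partial \mathcal{L}/\partial z_2$ in the last line is the skip connection's contribution: because $\partial z_2/\partial x = W_2\,\mathrm{diag}(g_1'(z_1))\,W_1 + I$, the identity term lets the gradient flow back unattenuated, which is the usual explanation of why ResNets train well at depth. The "stochastic" part changes nothing in these matrices; it only means $\partial \mathcal{L}/\partial y$ is evaluated on a single sampled example (or mini-batch).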
Category: Data Science

From what function do come the gradients that I use to adjust weights?

I have a question about the loss function and the gradient. I'm following the fastai course (https://github.com/fastai/fastbook) and at the end of the 4th chapter I got to wondering: from what function do the gradients that I use to adjust the weights come? I understand that the loss function is being differentiated. But which one? Can I see it? Or is it under the hood of PyTorch? Code of the step function:

def step(self):
    self.w.data -= self.w.grad.data * self.lr
    self.b.data -= self.b.grad.data …
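The gradients in `.grad` come from whatever scalar you last called `.backward()` on; in that chapter it is a mean-squared-error loss defined in the notebook itself, and PyTorch's autograd differentiates it for you. A numpy sketch that writes out by hand the same derivatives autograd would compute for an MSE loss on a linear model (the data here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x + 2.0

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * x + b
    # loss = mean((pred - y)**2); its partial derivatives are:
    grad_w = 2 * np.mean((pred - y) * x)   # what autograd stores in w.grad
    grad_b = 2 * np.mean(pred - y)         # what autograd stores in b.grad
    w -= lr * grad_w   # same role as: self.w.data -= self.w.grad.data * self.lr
    b -= lr * grad_b
```

So the function being differentiated is visible: it is the loss you wrote; only the differentiation itself is "under the hood" of PyTorch.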
Category: Data Science

Difference between OLS and Gradient Descent in Linear Regression

I understand what Ordinary Least Squares and Gradient Descent do, but I am confused about the difference between them. The only differences I can think of are: Gradient Descent is iterative, while OLS isn't. Gradient Descent uses a learning rate to reach the point of minima, while OLS just finds the minima of the equation using partial differentiation. Both methods are very useful in Linear Regression, and they both give us the same results: the best possible values …
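The "same minimum, different route" point can be verified directly: OLS solves the normal equations in one linear-algebra step, while gradient descent iterates toward the same minimizer of the squared loss. A numpy sketch with assumed synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])   # intercept + 1 feature
y = X @ np.array([1.0, 2.5]) + rng.normal(0, 0.1, 200)

# OLS: one shot -- solve the normal equations (X'X) beta = X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: many small steps down the same squared-loss surface
beta_gd, lr = np.zeros(2), 0.1
for _ in range(2000):
    beta_gd -= lr * 2 * X.T @ (X @ beta_gd - y) / len(y)
```

Both arrive at the same coefficients; gradient descent earns its keep when the closed-form solve is infeasible (huge feature counts, streaming data, or losses with no closed form).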
Category: Data Science

Need help to understand the formula of gradient descent with multiple features

I am trying to implement gradient descent with multiple features after listening to Andrew Ng's Coursera lecture on gradient descent for multiple features. So, for example, when calculating for theta 1, part of the formula requires you to subtract the real y value from the predicted y value (to calculate the error), and at the end of the formula you multiply it by the value of feature 1 in the i-th training example, as denoted by the x superscript (i) subscript …
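The per-parameter formula from the lecture, $\theta_j := \theta_j - \alpha \frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$, vectorizes so that all the $\theta_j$ update simultaneously in one line. A numpy sketch with assumed data (two features plus the $x_0 = 1$ intercept term):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])   # x_0=1, x_1, x_2
y = X @ np.array([4.0, 1.0, -3.0])

theta, alpha = np.zeros(3), 0.1
for _ in range(1000):
    error = X @ theta - y        # (h_theta(x^(i)) - y^(i)) for every example i
    # for each theta_j: multiply each error by feature j of example i,
    # average over the m examples -- exactly the summation in the lecture
    theta -= alpha * (X.T @ error) / m
```

`X.T @ error` computes, for every $j$ at once, the sum over $i$ of error times $x_j^{(i)}$, which is why all parameters can be updated simultaneously as the lecture requires.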
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.