Hi all, I am turning to you to figure out where the gradient and loss updates for the q, k, v weights happen in Vision Transformers. I suspect it is the MLP/feed-forward part of the architecture, but I am not entirely sure. I attach some code from lucidrains:

import torch
from torch import nn
from einops import rearrange, repeat
from einops.layers.torch import Rearrange

# helpers

def pair(t):
    return t if isinstance(t, tuple) else (t, t)

# classes

class PreNorm(nn.Module):
    def __init__(self, dim, …
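For what it's worth, here is a minimal sketch (a toy single-head attention layer I made up, not the lucidrains model) showing that the packed q/k/v projection weights receive their own gradient directly from loss.backward(), independently of the MLP/feed-forward block:

import torch
from torch import nn

class ToyAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)  # packed q, k, v projections

    def forward(self, x):
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)
        return attn @ v

attn = ToyAttention(dim=8)
x = torch.randn(1, 4, 8)              # (batch, tokens, dim)
loss = attn(x).sum()                  # stand-in for a real loss
loss.backward()
print(attn.to_qkv.weight.grad.shape)  # the q/k/v weights get their own gradient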
During the backward pass, which gradients are kept and which are discarded? Why are some gradients discarded? I know that the forward pass computes the output of the network given the inputs and then computes the loss, and that the backward pass computes the gradient of the loss with respect to each weight.
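A small sketch of PyTorch's default behaviour, assuming that is the framework in question: gradients are kept on leaf tensors (the parameters the optimizer needs), while gradients of intermediate activations are discarded after backward() unless you explicitly retain them:

import torch

w = torch.randn(3, 3, requires_grad=True)   # leaf tensor (a "weight")
x = torch.randn(3)
h = w @ x                                   # intermediate activation
h.retain_grad()                             # ask PyTorch to keep its gradient too
loss = h.sum()
loss.backward()

print(w.grad is not None)   # True: parameter gradients are kept for the optimizer step
print(h.grad is not None)   # True only because retain_grad() was called above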
While approximating gradients numerically, using the actual machine epsilon to shift the weights results in wildly large gradient approximations, as the "width" of the approximation triangle is disproportionately small. In Andrew Ng's course he uses 0.01, but I suppose that is for example purposes only. This makes me wonder: is there a method to choose an appropriate epsilon value for gradient approximation based on, e.g., the current error value of the network?
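As a rough illustration (a made-up quadratic loss, nothing from the course), a central-difference check against the analytic gradient 2*w shows why a moderate epsilon such as 1e-4 behaves far better than the machine epsilon:

import numpy as np

def loss(w):
    return np.sum(w ** 2)

def central_difference(w, i, eps):
    # shift only coordinate i by +/- eps and take the symmetric difference
    e = np.zeros_like(w)
    e[i] = eps
    return (loss(w + e) - loss(w - e)) / (2 * eps)

w = np.array([0.3, -1.2, 2.0])
for eps in (np.finfo(np.float64).eps, 1e-8, 1e-4, 1e-2):
    approx = central_difference(w, 0, eps)
    print(f"eps={eps:.1e}  approx={approx:.6f}  analytic={2 * w[0]:.6f}")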
I have an array of arrival times and I want to convert it to count data using PyTorch in a differentiable way. Example arrival times: arrival_times = [2.1, 2.9, 5.1], and let's say the total range is 6 seconds. What I want to get is: counts = [0, 0, 2, 2, 2, 3]. For this task, a non-differentiable way works perfectly:

x = [1, 2, 3, 4, 5, 6]
counts = torch.sum(torch.Tensor(arrival_times)[:, None] < torch.Tensor(x), dim=0)

It turns out the < …
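One common workaround, sketched below under the assumption that a smooth approximation is acceptable, is to replace the hard < comparison with a steep sigmoid so the counts become differentiable:

import torch

arrival_times = torch.tensor([2.1, 2.9, 5.1], requires_grad=True)
x = torch.arange(1.0, 7.0)            # bin edges 1..6
temperature = 0.05                    # smaller -> closer to the hard comparison

soft_counts = torch.sigmoid((x - arrival_times[:, None]) / temperature).sum(dim=0)
print(soft_counts)                    # approximately [0, 0, 2, 2, 2, 3]

soft_counts.sum().backward()          # gradients now flow back to arrival_times
print(arrival_times.grad)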
I have a CNN architecture with two cross-entropy losses $\mathcal{L}_1$ and $\mathcal{L}_2$ summed in the total loss $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2$. The task I want to solve is Unsupervised Domain Adaptation. I have observed the following behavior: the gradients coming from $\mathcal{L}_1$ have a different magnitude than those coming from $\mathcal{L}_2$, such that the supervision coming from the first loss is negligible. $\mathcal{L}_1$ has a positive constant value and does not decrease during training, while $\mathcal{L}_2$ does …
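If it helps, one way to confirm this is to measure the gradient norm each loss contributes to the shared parameters; the sketch below uses a stand-in linear model and artificially scaled losses purely for illustration:

import torch
from torch import nn

model = nn.Linear(10, 2)                      # stand-in for the shared backbone
x = torch.randn(4, 10)
y = torch.randint(0, 2, (4,))

def grad_norm(loss):
    # gradient of this loss alone, without touching .grad buffers
    grads = torch.autograd.grad(loss, model.parameters(), retain_graph=True)
    return torch.sqrt(sum(g.pow(2).sum() for g in grads))

logits = model(x)
loss1 = nn.functional.cross_entropy(logits, y)          # plays the role of L1
loss2 = 10.0 * nn.functional.cross_entropy(logits, y)   # plays the role of L2
print(grad_norm(loss1), grad_norm(loss2))               # compare contributions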
On page 58 of the second edition of Deep Learning with Python, Chollet illustrates an example of a forward and backward pass through a computation graph. The computation graph is given by: $$ x\to w\cdot x := x_1 \to b + x_1 := x_2 \to \text{loss}:=|y_\text{true}-x_2|. $$ We are given that $x=2$, $w=3$, $b=1$, $y_{\text{true}}=4$. When running the backward pass, he calculates $$ grad(\text{loss},x_2) = grad(|4-x_2|,x_2) = 1. $$ Why is the following not true: $$ grad(\text{loss},x_2) = \begin{cases} …
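A quick autograd check of the numbers in the example (x2 = 7, so the loss sits on the branch where the derivative of |y_true - x2| with respect to x2 is +1):

import torch

x, w, b, y_true = 2.0, 3.0, 1.0, 4.0
x2 = torch.tensor(w * x + b, requires_grad=True)   # x2 = 7
loss = (y_true - x2).abs()                         # |4 - 7| = 3
loss.backward()
print(x2.grad)                                     # tensor(1.)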
I am trying to understand how integrated gradients work in the NLP case. Let $F: \mathbb{R}^{n} \rightarrow[0,1]$ be a function representing a neural network, $x \in \mathbb{R}^{n}$ an input and $x' \in \mathbb{R}^{n}$ a reference. We consider the segment connecting $x'$ to $x$, and we compute the gradient at every point of this segment. The IG method simply accumulates these gradients. Thus, $IG$ in the $i$-th dimension is given by the following formula: $$ I G_{i}(x)=\left(x_{i}-x'_{i}\right) \frac{\int_{\alpha=0}^{1} d …
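For concreteness, here is a hedged sketch of the usual Riemann-sum approximation of that path integral, with a toy model standing in for F and a step count chosen arbitrarily:

import torch

def F(x):
    return torch.sigmoid(x.sum())            # toy network output in [0, 1]

def integrated_gradients(x, x_ref, steps=50):
    alphas = torch.linspace(0.0, 1.0, steps)
    total = torch.zeros_like(x)
    for alpha in alphas:
        # point on the segment from the reference x' to the input x
        point = (x_ref + alpha * (x - x_ref)).detach().requires_grad_(True)
        F(point).backward()
        total += point.grad
    return (x - x_ref) * total / steps        # (x_i - x'_i) * average gradient

x = torch.tensor([1.0, -2.0, 0.5])
x_ref = torch.zeros(3)
print(integrated_gradients(x, x_ref))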
Will there be differences between applying autograd to the loss function (using a Python library) and applying an explicit gradient (the gradient from the paper, or the update rule)? For example: numerical, runtime, mathematical, or stability differences.
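As a small illustration of the "mathematically equal, numerically near-identical" case, the sketch below compares autograd with the hand-derived MSE gradient for plain linear regression (the example is mine, not from any particular paper):

import torch

torch.manual_seed(0)
X = torch.randn(32, 5)
y = torch.randn(32)
w = torch.zeros(5, requires_grad=True)

# autograd gradient of the mean squared error
loss = ((X @ w - y) ** 2).mean()
loss.backward()
autograd_grad = w.grad.clone()

# explicit gradient from the update rule: 2/m * X^T (Xw - y)
manual_grad = 2.0 / X.shape[0] * X.T @ (X @ w.detach() - y)

print(torch.allclose(autograd_grad, manual_grad, atol=1e-6))   # True (up to rounding)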
I am trying to implement the REINFORCE policy gradient algorithm in C++, and in my case there is no "autograd" facility like in PyTorch, so I have to calculate the gradient manually. Let's imagine a scenario where the state space size is 4 and the action space size is 2 (CartPole). I also collected the following data for 3 steps: action probabilities (softmax): [0.21, 0.34, 0.45], [0.91, 0.01, 0.08], [0.50, 0.30, 0.20]; sampled action (one-hot encoded): …
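The key identity for hand-coding this is that the gradient of log pi(a|s) with respect to the softmax logits is one_hot(a) - probs; the sketch below (in Python/NumPy for brevity, even though the target is C++) checks that identity numerically:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([0.2, -0.5, 1.0])
probs = softmax(logits)
action = 2                                          # sampled action index
one_hot = np.eye(len(logits))[action]

grad_log_pi = one_hot - probs                       # d log pi(a|s) / d logits

# numerical check of the identity with central differences
eps = 1e-6
numeric = np.zeros_like(logits)
for i in range(len(logits)):
    e = np.zeros_like(logits); e[i] = eps
    numeric[i] = (np.log(softmax(logits + e)[action]) -
                  np.log(softmax(logits - e)[action])) / (2 * eps)
print(np.allclose(grad_log_pi, numeric, atol=1e-5))  # True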
I do not understand how complex networks with many parameters/dimensions can be represented in 3D space and form a standard cost surface, just like a simple network with, say, 2 parameters. For example, a network with 2 parameters that correspond to the X and Y axes, respectively, and a cost function that corresponds to the Z axis makes sense... but how can we have a network with 1000 dimensions represented in a 3D space, on a planar cost surface (not …
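For what it's worth, such 3D pictures are usually made by evaluating the loss on a 2D slice of the huge parameter space, spanned by two chosen (here random) direction vectors; a rough sketch with a made-up 101-parameter model:

import torch
from torch import nn

model = nn.Linear(100, 1)                         # 101 parameters, not just 2
x, y = torch.randn(64, 100), torch.randn(64, 1)
theta0 = torch.cat([p.detach().flatten() for p in model.parameters()])
d1, d2 = torch.randn_like(theta0), torch.randn_like(theta0)   # two slice directions

def loss_at(alpha, beta):
    theta = theta0 + alpha * d1 + beta * d2
    # unpack the flat vector back into the model's weight and bias
    w, b = theta[:100].view(1, 100), theta[100:]
    return nn.functional.mse_loss(x @ w.T + b, y).item()

grid = [[loss_at(a, b) for a in torch.linspace(-1, 1, 5)]
        for b in torch.linspace(-1, 1, 5)]        # a 5x5 patch of the slice
print(grid)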
I have a trained neural network (NN) with independent inputs $x_1, x_2, \dots, x_n$ and a scalar output $y$. Input $x_1$ is a scalar, and tf.gradients(y, x1) returns a negative value. However, calculating the approximate gradient via $\frac{NN(x_1 + \Delta) - NN(x_1-\Delta)}{2\Delta}$ with $\Delta > 0$ yields a positive value. The following is a visualization of my problem. In blue are the values $y = NN(\text{inputs})$ for all inputs seen as training data, plotted against $x_1$. Judging by these points, it is reasonable to …
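A sanity check along these lines (sketched with TF2's GradientTape and a toy model, rather than the TF1 tf.gradients call in the question) is to compare the analytic gradient with a central difference at the same point; for a smooth model they should agree, so a sign mismatch usually points at too large a Delta, a non-smooth region, or differentiating the wrong tensor:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(8, activation="tanh"),
                             tf.keras.layers.Dense(1)])
x = tf.constant([[0.3, -1.0, 2.0]])

with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x)
analytic = tape.gradient(y, x)[0, 0]              # d y / d x1 at this point

# central finite difference on the first input only
step = tf.constant([[1e-3, 0.0, 0.0]])
finite = (model(x + step) - model(x - step)) / 2e-3
print(float(analytic), float(finite[0, 0]))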
I need to quantize the inputs, but the method I need for this (bucketize) is non-differentiable. I can of course detach the tensor, but then I lose the flow of gradients to earlier weights. I guess the question is quite simple: how do you keep gradients flowing when necessary? For example, using the following code ...

x = self.linear1(x)
min, max = int(x.min()), int(x.max())
bins = torch.linspace(min, max + 1, 16)
x = torch.bucketize(x.detach(), bins)  # forced to detach …
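One common workaround (a sketch, not necessarily the right fix here) is a straight-through estimator: the forward pass uses the hard bucketize result, while the backward pass pretends the quantization was the identity:

import torch

def straight_through_bucketize(x, bins):
    hard = torch.bucketize(x.detach(), bins).to(x.dtype)
    # forward value == hard; gradient flows as if the op were the identity on x
    return x + (hard - x).detach()

x = torch.randn(4, 8, requires_grad=True)
bins = torch.linspace(-3.0, 3.0, 16)
q = straight_through_bucketize(x, bins)
q.sum().backward()
print(x.grad)          # gradients reach x even though bucketize itself has none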
I am using a loss which requires sampling from probability distributions to do Monte Carlo integration. Sometimes legitimate training data can throw -inf/NaN. This is intended behaviour, since a data point may be far enough from the model that the probability is too small for float32. Needless to say, switching to float64 etc. is not a solution. The problem is that -inf turns into NaN when calculating the gradient when using logsumexp, sinh, and MultivariateNormal.logpdf, which then propagates all the …
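One possible mitigation, sketched below with a made-up example, is to detect the samples whose log-probability underflows to -inf and rebuild the Monte Carlo estimate only from the finite subset, so no -inf ever enters the graph that backward() sees (clamping the log-probabilities is a common alternative):

import torch
from torch.distributions import MultivariateNormal

mean = torch.zeros(2, requires_grad=True)
dist = MultivariateNormal(mean, torch.eye(2))
samples = torch.tensor([[0.5, -0.2], [1e20, 1e20], [0.1, 0.3]])  # middle one underflows

with torch.no_grad():
    finite = torch.isfinite(dist.log_prob(samples))   # detect offending samples

loss = -dist.log_prob(samples[finite]).mean()         # graph built on finite samples only
loss.backward()
print(mean.grad)                                      # finite gradient, no NaN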
#AM is the autograd function for argmax with backpropagation
x, preds = model(id, mask)
print(preds.retain_grad())
print(AM.apply(preds))

# compute the loss between actual and predicted values
loss = torch.mean(1 / (sample_score(x, AM.apply(preds)) + 1))
print(loss)
# loss.requires_grad = True

# add on to the total loss
x.retain_grad()
preds.retain_grad()
print(x.requires_grad)
print(preds.requires_grad)
loss.backward()
print(model.fc2.weight.grad)
total_loss = total_loss + loss.item()

Here the gradients are becoming None. How can I solve this?
I'm asked to compute a central finite difference scheme (f(i+1) - f(i-1)) on an image. My attempt is something like:

def gradient_x_diff(img):
    img = img.astype(float)
    return np.fabs(np.roll(img, 1, axis=0) - np.roll(img, -1, axis=0))

However, it's hinted that the solution is straightforward. It should be something like this:

def gradient_x_diff(img):
    img = img.astype(float)
    return img[*] - img[**]

What should I put instead of the * and **?
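For comparison, here is a hedged sketch of the central difference written both with plain slicing (dropping the two edge rows) and with np.roll (which wraps around at the edges); whether either matches the expected [*]/[**] indexing depends on the assignment's boundary convention:

import numpy as np

def gradient_x_slice(img):
    img = img.astype(float)
    return img[2:, :] - img[:-2, :]          # f(i+1) - f(i-1) for the interior rows

def gradient_x_roll(img):
    img = img.astype(float)
    return np.roll(img, -1, axis=0) - np.roll(img, 1, axis=0)   # wraps around at edges

img = np.arange(25).reshape(5, 5)
print(gradient_x_slice(img))
print(gradient_x_roll(img)[1:-1, :])          # interior rows agree with the slice version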
I need to implement a custom loss function. The function is relatively simple: $$-\sum \limits_{i=1}^m [O_{1,i} \cdot y_i-1] \ \cdot \ \operatorname{ReLU}(O_{1,i} \cdot \hat{y}_i - 1)$$ with $O$ being some external attribute specific to each case. I was initially working with LightGBM, but I only found tutorials that included calculating the Hessian and the gradient. If there is a way to add the function without this, please correct me. Otherwise I am open to using other libraries. PyTorch-Fastai, Tensorflow-keras, catboost, …
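Assuming a PyTorch route is acceptable, a sketch of the loss as written (tensor names are my own) would look like this; since everything is built from differentiable ops, autograd supplies the gradient without any hand-coded Hessian:

import torch

def custom_loss(o1, y_true, y_pred):
    # -sum_i [o1_i * y_i - 1] * ReLU(o1_i * y_hat_i - 1)
    return -torch.sum((o1 * y_true - 1) * torch.relu(o1 * y_pred - 1))

o1 = torch.tensor([1.5, 0.8, 2.0])                            # external attribute O
y_true = torch.tensor([1.0, 0.0, 1.0])
y_pred = torch.tensor([0.9, 0.2, 0.7], requires_grad=True)    # model output
loss = custom_loss(o1, y_true, y_pred)
loss.backward()
print(loss.item(), y_pred.grad)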
According to this article: https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484 the vanishing gradient problem occurs when using the sigmoid activation function, because sigmoid maps a large input space into a small output space, so the gradient for large values is close to zero. The article suggests using a batch normalization layer. I can't understand how that works. When using normalization, big values still become big values on another scale (instead of [-inf, inf] they get [0..1] or [-1..1]), so in the same cases the values …
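As a small numeric illustration of the saturation argument: sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)) is vanishingly small for large |x|, but after a batch-norm-style standardization the same batch lands where the derivative is far from zero (toy numbers, not from the article):

import torch

x = torch.tensor([-30.0, -10.0, 0.5, 10.0, 30.0])

def sigmoid_grad(v):
    s = torch.sigmoid(v)
    return s * (1 - s)

print(sigmoid_grad(x))                      # ~0 for the large-magnitude entries
x_norm = (x - x.mean()) / x.std()           # standardize the batch to roughly unit scale
print(sigmoid_grad(x_norm))                 # all entries comfortably away from 0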
ReLU is used as an activation function that serves two purposes: (1) breaking linearity in a DNN, and (2) helping to handle the vanishing gradient problem. For the exploding gradient problem, we use the gradient clipping approach, where we set a maximum threshold on the gradient, similarly to ReLU, which (as I understand it) sets the minimum gradient limit to 0. From what I have read so far, ReLU is considered an activation function. In a similar fashion, can we use gradient clipping also as an activation function? If yes, any pros …
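A short sketch of the mechanical difference, using standard PyTorch utilities: ReLU is applied to activations in the forward pass, whereas gradient clipping is applied to the already-computed gradients just before the optimizer step, so it operates on a different object entirely:

import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))  # ReLU acts on activations
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)    # clipping acts on gradients
opt.step()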