I read this and am left with an ambiguity. I am trying to understand how to calculate the derivative of the loss w.r.t. the bias. In this question, we have this definition: `np.sum(dz2, axis=0, keepdims=True)`. Then in Casper's comment, he said that the derivative of $L$ (loss) w.r.t. $b$ is the sum of the rows $$ \frac{\partial L}{\partial Z} \times \mathbf{1} = \begin{bmatrix} . &. &. \\ . &. &. \end{bmatrix} \begin{bmatrix} 1\\ 1\\ 1\\ \end{bmatrix} $$ But actually, using axis=0, is it not …
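To make the axis question concrete, here is a small NumPy sketch (with a made-up 2×3 `dZ`) contrasting the two summation directions:

```python
import numpy as np

# A 2x3 stand-in for dL/dZ to compare the two summation conventions.
dZ = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])

# np.sum(..., axis=0) sums DOWN the rows (over the batch axis),
# which is the same as multiplying by a ROW of ones on the left.
col_sums = np.sum(dZ, axis=0, keepdims=True)   # shape (1, 3)
same_as = np.ones((1, 2)) @ dZ                 # shape (1, 3)

# Multiplying by a COLUMN of ones on the right instead sums
# ACROSS each row (axis=1), which is a different quantity.
row_sums = dZ @ np.ones((3, 1))                # shape (2, 1)
```

So whether `axis=0` matches the "sum of the rows" picture depends on which side the ones vector sits.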
I have some problems computing the derivative of the sum-of-squares error in a backprop neural network. For example, we have a neural network as in the picture. For drawing simplicity, I've dropped the sample indexes. Conventions: x - the input data; W - a weight matrix; v - the product vector W*x; F - the activation function vector; y - the vector of activated data; D - the vector of answers; e - the error signal; a lower index is a variable, (NxN) - dimensionality, a higher [index] …
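Under these conventions, a minimal NumPy sketch of the gradient (assuming $E = \frac{1}{2}\|y - D\|^2$ and a tanh activation, both of which are my assumptions, not from the picture) with a finite-difference check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # input vector
W = rng.normal(size=(2, 3))      # weight matrix
D = rng.normal(size=2)           # vector of answers (targets)

v = W @ x                        # pre-activation: v = W x
y = np.tanh(v)                   # activated output, F = tanh here
e = y - D                        # error signal
E = 0.5 * np.sum(e ** 2)         # sum-of-squares error

# Chain rule: dE/dW = (F'(v) * e) outer x
grad_W = np.outer((1 - np.tanh(v) ** 2) * e, x)

# Finite-difference check of one entry
h = 1e-6
W2 = W.copy()
W2[0, 1] += h
E2 = 0.5 * np.sum((np.tanh(W2 @ x) - D) ** 2)
```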
I wrote a blog post where I calculated the Taylor Series of an autoregressive function. It is not strictly the Taylor Series, but some variant (I guess). I'm mostly concerned about whether the derivatives look okay. I noticed I made a mistake and fixed the issue. It seemed simple enough, but after finding an error, I started to doubt myself. $$f(t+1) = w_{t+1} \cdot f(t) $$ $$y^{*}_{t+1} = f(t+1)-{\frac {f'(t+1)}{1!}}(-t-1+t)$$ $$y^{*}_{t+1} = w_{t+1} f(t) + \dfrac{d}{df(t)}w_{t+1}f(t) + \dfrac{d}{dw_{t+1}}w_{t+1}f(t)$$ $$y'_{t+1} = w_{t+1} …
I am reading through the paper Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review by Sergey Levine. I am having difficulty understanding this part of the derivation on Maximum Entropy Policy Gradients (Section 4.1). Note that in the above derivation, the term $\mathcal{H}(q_\theta(a_t|s_t))$ should have been $\log q_\theta(a_t|s_t)$, where $\log$ refers to log base e (i.e. the natural logarithm). In the first line of the gradient, it should have been $r(s_t,a_t) - \log q_\theta(a_t|s_t)$. In particular, I …
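For context, the swap between the entropy term and the log term presumably rests on the standard identity

$$
\mathcal{H}\big(q_\theta(\cdot \mid s_t)\big)
= -\,\mathbb{E}_{a_t \sim q_\theta(\cdot \mid s_t)}\big[\log q_\theta(a_t \mid s_t)\big],
$$

so once everything sits inside an expectation over $a_t \sim q_\theta(\cdot \mid s_t)$, the entropy can be replaced by $-\log q_\theta(a_t \mid s_t)$ as a per-sample term.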
I want to implement a custom Keras loss function that consists of plain binary cross-entropy plus a penalty that increases the loss for false negatives from one class (each observation can belong to one of two classes, privileged and unprivileged) and decreases the loss for true positives from that same class. My implementation so far can be seen below. Unfortunately, it does not work yet, because as you can see, I simply add the penalty to the binary cross-entropy, and …
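One common alternative to adding a constant penalty is to reweight the per-sample cross-entropy multiplicatively, so the penalty scales with the loss itself. A NumPy sketch of that weighting idea (the names `fn_penalty`, `tp_reward`, and the 0.5 decision threshold are illustrative assumptions, not from the original code):

```python
import numpy as np

def weighted_bce(y_true, y_pred, privileged, fn_penalty=2.0, tp_reward=0.5, eps=1e-7):
    # Plain per-sample binary cross-entropy
    y_pred = np.clip(y_pred, eps, 1 - eps)
    bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    # Multiplicative weights: scale UP (soft) false negatives from the
    # privileged class and scale DOWN its (soft) true positives.
    w = np.ones_like(bce)
    pos_priv = (y_true == 1) & (privileged == 1)
    w[pos_priv & (y_pred < 0.5)] = fn_penalty
    w[pos_priv & (y_pred >= 0.5)] = tp_reward
    return float(np.mean(w * bce))
```

In Keras the same logic can be expressed with per-sample weights built from backend tensor ops; the NumPy version just isolates the weighting scheme.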
I cannot understand this step of SGD for binary classification. For example, we have $y$ - the true labels $\in \{0,1\}$ - and $p=f_\theta(x)$ - the predicted labels $\in [0,1]$. Then the SGD update step is the following: $\Theta' \leftarrow \Theta - \nu \frac{\partial L(y,f_\theta(x))}{\partial \Theta}$, where $L$ is the loss function. Then follows the replacement that I cannot understand: $\Theta' \leftarrow \Theta - \nu \left.\frac{\partial L(y,p)}{\partial p}\right|_{p=f_\theta(x)} \frac{\partial f_\theta(x)}{\partial \Theta}$. Why do we need to take the derivative with respect to $p$? Why …
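The replacement is just the chain rule: $L$ depends on $\Theta$ only through $p$, so $\frac{\partial L}{\partial \Theta} = \frac{\partial L}{\partial p}\frac{\partial p}{\partial \Theta}$. A scalar numeric check with a logistic model and log loss (the concrete numbers are made up):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(y, p):
    # Binary cross-entropy (log loss)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

theta, x, y = 0.7, 1.3, 1

# Chain rule: dL/dtheta = dL/dp (evaluated at p = f_theta(x)) * dp/dtheta
p = sigmoid(theta * x)
dL_dp = -(y / p) + (1 - y) / (1 - p)
dp_dtheta = p * (1 - p) * x
chain = dL_dp * dp_dtheta

# Finite-difference check of the same derivative taken directly w.r.t. theta
h = 1e-6
numeric = (loss(y, sigmoid((theta + h) * x)) - loss(y, sigmoid((theta - h) * x))) / (2 * h)
```

Both routes give the same number, which is why the factored form is legitimate.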
I am trying to optimize some parameters that are used to transform 2D points from one place to another (you may think of them as rotation & translation parameters for simplicity). The parameters are considered optimal if the transformed points lie inside a pre-defined convex polygon. Otherwise, the parameters should be adjusted until all points lie inside that polygon. I do not care how the points are arranged inside the polygon; my only concern is that they are inside. How can I …
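One standard way to make "inside a convex polygon" differentiable is to write the polygon as half-planes $Ax \le b$ and penalize squared hinge violations. A NumPy sketch (unit-square polygon, translation-only parameters, learning rate and iteration count all illustrative assumptions):

```python
import numpy as np

# Convex polygon as half-planes A @ x <= b (here: the unit square).
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])          # 0 <= x <= 1, 0 <= y <= 1

pts = np.array([[1.5, 0.5], [2.0, 1.2]])    # points that start outside
t = np.zeros(2)                             # translation to optimize

for _ in range(500):
    moved = pts + t
    # Hinge violations, one per (constraint, point) pair; zero when inside.
    viol = np.maximum(A @ moved.T - b[:, None], 0.0)
    # Gradient of sum(viol**2) w.r.t. the translation t
    grad = 2.0 * (A.T @ viol).sum(axis=1)
    t -= 0.05 * grad

max_violation = float(np.max(A @ (pts + t).T - b[:, None]))
```

The loss is zero exactly when every point satisfies all the half-plane constraints, and the same penalty drops into any gradient-based optimizer for full rotation+translation parameters.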
I have recently studied the batch normalization layer and its backpropagation process, using as my main sources the original paper and this website showing part of the derivation process. But there is a step that isn't covered there and that I don't really understand; namely, using the notation of the website, it occurs when computing: $$ \frac{\partial \widehat{x}_i}{\partial x_i} = \frac{\partial}{\partial x_i} \frac{x_i - \mu}{\sqrt{\sigma^2+\epsilon}} = \frac{1}{\sqrt{\sigma^2+\epsilon}} $$ Applying the quotient rule I expected the following (since $\mu$ and …
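A quick numeric sketch of that step, under the convention that $\partial \widehat{x}_i / \partial x_i$ is the partial derivative with $\mu$ and $\sigma^2$ treated as separate inputs (their own dependence on $x_i$ then enters through other branches of the chain rule):

```python
import numpy as np

eps = 1e-5
x = np.array([1.0, 2.0, 4.0])
mu, var = x.mean(), x.var()

def xhat(xi, mu, var):
    # Normalized activation with mu and var treated as independent inputs
    return (xi - mu) / np.sqrt(var + eps)

# Partial derivative w.r.t. x_i with mu and var HELD FIXED
h = 1e-6
i = 0
partial = (xhat(x[i] + h, mu, var) - xhat(x[i] - h, mu, var)) / (2 * h)
claimed = 1.0 / np.sqrt(var + eps)
```

With $\mu$ and $\sigma^2$ frozen, $\widehat{x}_i$ is linear in $x_i$, so this branch really is just $1/\sqrt{\sigma^2+\epsilon}$.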
I am studying the math behind SVM. The following question is about a small but important detail in the SVM derivation. The question: why can the distance between the hyperplane $w \cdot x + b = 0$ and a data point $p$ (in vector form), $d = \frac{w \cdot p + b}{\|w\|}$, be simplified to $d = \frac{1}{\|w\|}$? My argument: since the data point $p$ is not on the hyperplane, we have $w \cdot p + b = k$, $k \ne 0$. Then $d=\frac{k}{\|w\|}$, but $k$ is not a constant as it depends …
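The step usually relies on the rescaling freedom of the hyperplane parameters:

$$
w \cdot x + b = 0 \iff (c\,w) \cdot x + (c\,b) = 0 \quad \text{for any } c \neq 0,
$$

so $(w, b)$ can be rescaled without changing the hyperplane. The standard derivation fixes this scale by requiring the closest data points (the support vectors) to satisfy $|w \cdot p + b| = 1$; for those points $k = \pm 1$, and the margin becomes $d = 1/\|w\|$.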
What is the difference between the slope of a line and the slope of a curve? Is it valid to use numpy.gradient to find the slope of a line and the slope of a curve at any point? # slope of a line at any point: tanθ = (y2 - y1) / (x2 - x1) # slope of a curve at any point: tanθ = dy/dx Is it valid to use NumPy's np.gradient() to get both the slope of a curve and the slope of a line, or is it meant only to find the slope of a line? Reference slope …
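np.gradient approximates dy/dx with finite differences from sampled points, so it handles both cases; for a line the estimate is the constant slope everywhere, while for a curve it varies with x. A quick check (sample data is illustrative):

```python
import numpy as np

x = np.linspace(0.0, 4.0, 41)       # uniform spacing h = 0.1

line = 2.0 * x + 1.0                # slope is the constant 2 everywhere
curve = x ** 2                      # slope dy/dx = 2x varies with x

# Central differences at interior points, one-sided at the ends
slope_line = np.gradient(line, x)
slope_curve = np.gradient(curve, x)
```

Central differences are exact for linear and quadratic data at interior points, so both estimates match the analytic slopes there.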
In this paper they have this equation, where they use the score-function estimator to estimate the gradient of an expectation. How did they derive this?
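Score-function (REINFORCE-style) estimators are usually derived with the log-derivative trick, $\nabla_\theta q_\theta(x) = q_\theta(x)\,\nabla_\theta \log q_\theta(x)$:

$$
\nabla_\theta \,\mathbb{E}_{x \sim q_\theta}\big[f(x)\big]
= \nabla_\theta \int q_\theta(x)\, f(x)\, dx
= \int q_\theta(x)\, \nabla_\theta \log q_\theta(x)\, f(x)\, dx
= \mathbb{E}_{x \sim q_\theta}\big[f(x)\, \nabla_\theta \log q_\theta(x)\big],
$$

assuming the gradient and integral can be exchanged. The last expectation is then estimated by Monte Carlo sampling from $q_\theta$.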
We first have the weights, a D-dimensional vector $w$, and a D-dimensional predictor vector $x$, both indexed by $j$. There are $N$ observations, all D-dimensional. $t$ is our targets, i.e., the ground-truth values. We then derive the cost function as follows: We then compute the partial derivative of $\varepsilon$ with respect to $w_j$: I'm confused as to where the $j'$ is coming from, and what it would represent. We then write it as: Then, …
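Assuming the usual least-squares cost (the displayed equations are not reproduced here), the primed index typically appears because the inner sum over features needs a dummy index distinct from the $j$ being differentiated:

$$
\varepsilon = \frac{1}{2}\sum_{n=1}^{N}\Big(\sum_{j'=1}^{D} w_{j'} x_{nj'} - t_n\Big)^2,
\qquad
\frac{\partial \varepsilon}{\partial w_j}
= \sum_{n=1}^{N}\Big(\sum_{j'=1}^{D} w_{j'} x_{nj'} - t_n\Big)\, x_{nj},
$$

since $\partial w_{j'} / \partial w_j = 1$ only when $j' = j$ and $0$ otherwise, which collapses the differentiated factor to $x_{nj}$ while the residual keeps its full sum over $j'$.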
I was reading the book 'Make Your Own Neural Network' by Tariq Rashid. In his book, he said (note - he's talking about normal feed-forward neural networks): $t_k$ is the target value at node $k$, $O_k$ is the predicted output at node $k$, $W_{jk}$ is the weight connecting node $j$ to node $k$, and $E$ is the error at node $k$. Then he says that we can remove the 2 because we only care about the …
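The 2 presumably comes from differentiating the squared error. With $E_k = (t_k - O_k)^2$,

$$
\frac{\partial E_k}{\partial O_k} = -2\,(t_k - O_k),
$$

and since the weight update multiplies this gradient by a freely chosen learning rate, the constant factor 2 can be absorbed into the learning rate without changing the direction of descent.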
I am reading about CCG on page 23 of Speech and Language Processing. There is a derivation as follows: (VP/PP)/NP , VP\((VP/PP)/NP) => VP? Can anyone explain this please? This would make sense if VP\((VP/PP)/NP) were equivalent to (VP\(VP/PP))/NP and (VP/PP)/NP were equivalent to VP/(PP/NP). But those equivalences seem at least non-trivial from the text! Any help would be greatly appreciated. CS
I read several posts about BPTT for RNNs, but I am actually a bit confused about one step in the derivation. Given $$h_t=f(b+Wh_{t-1}+Ux_t)$$ when we compute $\frac{\partial h_t}{\partial W}$, does anyone know why it is simply $$\frac{\partial h_t}{\partial W}=\frac{\partial h_{t}}{\partial h_{t-1}}\frac{\partial h_{t-1}}{\partial W}$$ and not $$\frac{\partial h_t}{\partial W}=\frac{\partial h_{t}}{\partial h_{t-1}}\frac{\partial h_{t-1}}{\partial W}+\frac{\partial h_t}{\partial f}h_{t-1}$$ ? What I mean is: since $h_t$ depends on $W$ both directly and through $h_{t-1}$, why is the second term in the expression above missing? Thank you!
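For what it's worth, a scalar finite-difference check of the *total* derivative (with $f=\tanh$ and made-up numbers, both my assumptions) does pick up both terms:

```python
import math

b, W, U = 0.1, 0.8, 0.5
x1, x2, h0 = 0.3, -0.2, 0.4

def h2_of(W):
    # Two-step scalar RNN: h_t = tanh(b + W*h_{t-1} + U*x_t)
    h1 = math.tanh(b + W * h0 + U * x1)
    return math.tanh(b + W * h1 + U * x2)

# Analytic total derivative: dh2/dW = f'(a2) * (h1 + W * dh1/dW),
# i.e. the direct term (h1) plus the term flowing through h1.
a1 = b + W * h0 + U * x1
h1 = math.tanh(a1)
dh1_dW = (1 - h1 ** 2) * h0
a2 = b + W * h1 + U * x2
dh2_dW = (1 - math.tanh(a2) ** 2) * (h1 + W * dh1_dW)

# Numeric check via central differences
h = 1e-6
numeric = (h2_of(W + h) - h2_of(W - h)) / (2 * h)
```

Dropping either term breaks the match, so a one-term formula only makes sense as a notational convention for one branch of the chain rule, not as the total derivative.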
I was going through the derivation of the backpropagation algorithm provided in this document (added just for reference). I have a doubt about one specific point in this derivation. The derivation goes as follows: Notation: the subscript $k$ denotes the output layer, the subscript $j$ denotes the hidden layer, the subscript $i$ denotes the input layer, $w_{kj}$ denotes a weight from the hidden to the output layer, $w_{ji}$ denotes a weight from the input to the hidden layer, and $a$ denotes an activation …