In very simple language, this is L2 regularization: $$Loss_R = Loss_N + \sum_i w_i^2$$ where $Loss_N$ is the loss without regularization and $Loss_R$ is the loss with regularization. When implementing [Ref], we simply add the derivative of the new penalty to the current weight delta: $$dw = dw_N + \text{constant} \cdot w$$ where $dw_N$ is the weight delta without regularization. What I think: L2 regularization is achieved with the last step only, i.e. the weight is penalized. My question is: why do we then add …
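A minimal numpy sketch of that update rule, assuming a single weight matrix `w`, an unregularized gradient `dw_N`, and a hypothetical regularization strength `lam` standing in for the constant above:

```python
import numpy as np

def l2_step(w, dw_N, lr=0.01, lam=0.001):
    """One gradient-descent step with L2 regularization.

    w    -- weight matrix
    dw_N -- gradient of the unregularized loss w.r.t. w
    lam  -- regularization strength (hypothetical name for the 'constant')
    """
    dw = dw_N + lam * w   # add the derivative of the penalty term
    return w - lr * dw    # usual descent step, now shrinking w toward zero
```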
When using mini-batch gradient descent, we perform backpropagation after each batch, i.e. we calculate the gradient after each batch. We also capture $\hat{y}$ for each sample in the batch and finally calculate the loss function over the whole batch, and we use the latter to calculate the gradient. Correct? Now, as the chain rule states, we calculate the gradient this way for the below neural network: the question is, if we calculate the gradient after passing …
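A minimal sketch of that batching logic for a linear model with MSE loss (all names illustrative): per-sample predictions are collected, one loss is formed over the whole batch, and the gradient of that batch loss drives the update.

```python
import numpy as np

def minibatch_step(w, X_batch, y_batch, lr=0.1):
    y_hat = X_batch @ w                              # y-hat for every sample in the batch
    residual = y_hat - y_batch
    loss = np.mean(residual ** 2)                    # one loss over the whole batch
    grad = 2 * X_batch.T @ residual / len(y_batch)   # gradient of that batch loss
    return w - lr * grad, loss                       # one update per batch
```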
Is the Cross-Entropy Loss (CEL) important at all, given that at Backpropagation (BP) only the Softmax (SM) probability and the one-hot vector are relevant? When applying BP, the derivative of CEL is the difference between the output probability (SM) and the one-hot encoded vector. To me the CEL output, which is very sophisticated, does not play any role in learning. I'm expecting a fallacy in my reasoning, so could somebody please help me out?
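A small numpy check of that claim (illustrative code, not from the question): the "SM minus one-hot" form *is* the derivative of cross-entropy composed with softmax, so the CEL is exactly what makes the gradient that simple.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y_onehot):
    return -np.sum(y_onehot * np.log(softmax(z)))

z = np.array([1.0, 2.0, 0.5])           # logits
y = np.array([0.0, 1.0, 0.0])           # one-hot target

analytic = softmax(z) - y               # the 'SM minus one-hot' gradient

eps = 1e-6                              # central-difference check of dCEL/dz
numeric = np.array([(cross_entropy(z + eps * np.eye(3)[i], y) -
                     cross_entropy(z - eps * np.eye(3)[i], y)) / (2 * eps)
                    for i in range(3)])
print(np.allclose(analytic, numeric))   # True: p - y is CEL's own derivative
```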
I'm starting to learn how convolutional neural networks work, and I have a question regarding the filters. Apparently, these are randomly initialized when the model is created, and then as the data is fed in, they are corrected accordingly, as with the weights in backpropagation. However, how does this work for filters? To my understanding, backpropagation works by calculating how much each weight contributed to the total error after an output has been predicted, and then correcting it accordingly. I've …
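A minimal sketch of how a filter picks up its gradient, assuming a single-channel 'valid' convolution (illustrative names): each filter entry is reused at every output position, so its gradient is the sum of the upstream gradient times the input patch it saw.

```python
import numpy as np

def filter_grad(x, upstream, kh, kw):
    """Gradient of the loss w.r.t. a (kh, kw) filter.

    x        -- input image, shape (H, W)
    upstream -- dLoss/dOutput from the layer above, shape (H-kh+1, W-kw+1)
    """
    dk = np.zeros((kh, kw))
    for i in range(upstream.shape[0]):
        for j in range(upstream.shape[1]):
            dk += upstream[i, j] * x[i:i+kh, j:j+kw]  # reuse-and-sum
    return dk
```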
I have been training a WGAN for a while now, with my generator training once every five epochs. I have tried several model architectures (number of filters) and also tried varying their relationship to each other. No matter what happens, my output is essentially noise. On further reading, it seems to be a classic case of convergence failure. Over time, my generator loss gets more and more negative while my discriminator loss remains around -0.4. My guess is that since …
In the online book on neural networks by Michael Nielsen, in chapter 3, he introduces a new cost function called the log-likelihood cost, defined as $$ C = -\ln(a_y^L) $$ Suppose we have 10 output neurons; when backpropagating the error, only the gradient w.r.t. the $y^{\text{th}}$ output neuron is non-zero and all others are zero. Is that right? If so, how is the below equation (81) true? $$\frac{\partial C}{\partial b_j^L} = a_j^L - y_j $$ I'm getting the expression …
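A short derivation sketch, assuming a softmax output layer as in Nielsen's chapter, showing why the gradient is non-zero for every $j$ even though only $a_y^L$ appears in the cost: the softmax normalization couples all the $z_j^L$.

$$ a_j^L = \frac{e^{z_j^L}}{\sum_k e^{z_k^L}}, \qquad C = -\ln a_y^L = -z_y^L + \ln \sum_k e^{z_k^L} $$

$$ \frac{\partial C}{\partial z_j^L} = -\delta_{jy} + \frac{e^{z_j^L}}{\sum_k e^{z_k^L}} = a_j^L - y_j, \qquad \frac{\partial C}{\partial b_j^L} = \frac{\partial C}{\partial z_j^L} \cdot \frac{\partial z_j^L}{\partial b_j^L} = a_j^L - y_j, $$

since $\partial z_j^L / \partial b_j^L = 1$.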
During the backward pass, which gradients are kept and which are discarded? Why are some gradients discarded? I know that the forward pass computes the output of the network given the inputs and computes the loss. The backward pass computes the gradient of the loss with respect to each weight.
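A small PyTorch illustration of the default policy: gradients are kept for leaf tensors (the weights the optimizer needs) and discarded for intermediate activations to save memory, unless you opt in with `retain_grad()`.

```python
import torch

w = torch.randn(3, requires_grad=True)  # leaf tensor: gradient is kept
x = torch.ones(3)
h = w * x                               # intermediate: gradient normally discarded
h.retain_grad()                         # opt in to keeping it
loss = h.sum()
loss.backward()

print(w.grad)  # kept: the optimizer needs it to update w
print(h.grad)  # only present because of retain_grad(); usually freed
```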
I have a custom neural network that has been written from scratch in Python, and also a dataset where negative target/response values are impossible; however, my model sometimes produces negative forecasts/fits, which I'd like to completely avoid. Rather than transform the input data or clip the final forecasts, I'd like to force my neural network to only produce positive values (or values above a given threshold) during forward and back propagation. I believe I understand what needs to be done …
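One way to build the constraint into the network itself, as a sketch: use a strictly positive output activation such as softplus, whose derivative (the sigmoid) backpropagates cleanly. The `threshold` shift is an assumption matching the question.

```python
import numpy as np

def softplus(z):
    """log(1 + e^z) > 0, written in a numerically stable form."""
    return np.maximum(z, 0) + np.log1p(np.exp(-np.abs(z)))

def softplus_grad(z):
    """Derivative of softplus is the sigmoid, so backprop is straightforward."""
    return 1.0 / (1.0 + np.exp(-z))

def positive_output(z, threshold=0.0):
    """Final-layer activation guaranteeing outputs above `threshold`."""
    return threshold + softplus(z)
```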
My question is really simple. I know the theory behind gradient descent and parameter updates; what I haven't found clarity on is whether the loss value (e.g., the MSE value) is used, i.e., multiplied in at the start when we do backpropagation for gradient descent (e.g., multiplying the MSE loss value by 1 and then doing backprop, since at the start of backprop we begin with the value 1, i.e., the derivative of x w.r.t. x is 1). If the loss value isn't used …
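A tiny worked example for MSE on one sample, sketching the point in question: backprop is seeded with $\partial L / \partial L = 1$, and only *derivatives* of the loss flow backward; the loss value itself is never multiplied in.

```python
# Forward: y_hat = w * x, L = (y_hat - y)**2
x, w, y = 2.0, 3.0, 4.0
y_hat = w * x                        # 6.0
L = (y_hat - y) ** 2                 # 4.0  <- loss *value*; unused below

# Backward: seed with dL/dL = 1, then chain rule
dL_dL = 1.0
dL_dyhat = dL_dL * 2 * (y_hat - y)   # 4.0, uses the derivative, not L
dL_dw = dL_dyhat * x                 # 8.0 drives the weight update

print(L, dL_dw)                      # L is only for monitoring training
```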
I am trying to build a neural network (3 layers, 1 hidden) in Python on the classic Titanic dataset. I want to include a bias term following Siraj's examples and the 3Blue1Brown tutorials, updating the bias by backpropagation, but I know my dimensionality is wrong. (I feel I am updating the biases incorrectly, which is causing the incorrect dimensionality.) The while loop in the code below works for a training dataset, where the node products and biases have the …
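A minimal numpy sketch of the convention that keeps the dimensions consistent (illustrative shapes, not the asker's code): the bias is a single row broadcast across the batch in the forward pass, so its gradient must be summed over the batch axis in the backward pass.

```python
import numpy as np

batch, n_in, n_out = 8, 4, 3
X = np.random.randn(batch, n_in)
W = np.random.randn(n_in, n_out)
b = np.zeros((1, n_out))                  # one bias row, broadcast over batch

Z = X @ W + b                             # (batch, n_out)
dZ = np.random.randn(*Z.shape)            # pretend upstream gradient

dW = X.T @ dZ                             # (n_in, n_out) matches W
db = dZ.sum(axis=0, keepdims=True)        # (1, n_out) matches b: sum over batch
b -= 0.1 * db                             # update with consistent shapes
```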
Please help me solve this problem without code (PS: this is a written problem): Given the following loss function, please plot the computational graph and derive the update procedure of the parameters using the backpropagation algorithm, where $W = \{W_1, W_2, W_3, W_4\}$ and $b = \{b_1, b_2, b_3, b_4\}$ denote the parameters, $x \in \mathbb{R}^d$ indicates the input features, and $y \in \mathbb{R}$ is the ground-truth label.
I am working on a project with facial image translation and GANs and still have some conceptual misunderstandings. In the definition of my model, I extract a deep embedding of my generated image and the input image using a state-of-the-art CNN, which I mark as untrainable, calculate the distance between these embeddings, and use this distance itself as a loss in my model definition. If the model the embeddings come from is untrainable, will the …
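A minimal PyTorch sketch of this setup (stand-in modules, illustrative names): freezing the embedding network's *parameters* does not block gradients from flowing *through* it back to the generator.

```python
import torch
import torch.nn as nn

embedder = nn.Linear(16, 8)          # stand-in for the frozen embedding CNN
for p in embedder.parameters():
    p.requires_grad = False          # untrainable: its weights get no gradient

generator = nn.Linear(16, 16)        # stand-in generator

x = torch.randn(2, 16)               # input images, flattened for the sketch
fake = generator(x)
loss = (embedder(fake) - embedder(x)).pow(2).mean()  # embedding-distance loss
loss.backward()

print(embedder.weight.grad)               # None: frozen weights get no gradient
print(generator.weight.grad is not None)  # True: gradient flowed through embedder
```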
My questions follow the below page 4 excerpt from Hochreiter's LSTM paper: If $f_{l_{m}}$ is the logistic sigmoid function, then the maximal value of $f^\prime_{l_{m}}$ is 0.25. If $y^{l_{m-1}}$ is constant and not equal to zero, then $|f^\prime_{l_{m}}(net_{l_{m}})w_{l_{m}l_{m-1}}|$ takes on maximal values where $w_{l_{m}l_{m-1}} = {1 \over y^{l_{m-1}}} \coth \left( {1 \over 2}net_{l_{m}} \right)$, goes to zero for $|w_{l_{m}l_{m-1}}| \rightarrow \infty$, and is less than 1.0 for $|w_{l_{m}l_{m-1}}| < 4.0$. The derivative of the sigmoid $f_{l_{m}} = \sigma$ is $f^\prime_{l_{m}} = \sigma(1-\sigma)$, …
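For the 0.25 figure quoted above, a one-line check:

$$ f^\prime_{l_m}(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \max_{p \in [0,1]} p(1-p) = \tfrac{1}{4}, $$

with the maximum attained at $\sigma(x) = \tfrac{1}{2}$, i.e. at $x = 0$. This bound is what makes the backpropagated error shrink through each sigmoid unit unless the weights compensate.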
On page 58 of the second edition of Deep Learning with Python, Chollet is illustrating an example of a forward and backward pass of a computation graph. The computation graph is given by: $$ x\to w\cdot x := x_1 \to b + x_1 := x_2 \to \text{loss}:=|y_\text{true}-x_2|. $$ We are given that $x=2$, $w=3$, $b=1$, $y_{\text{true}}=4$. When running the backward pass, he calculates $$ grad(\text{loss},x_2) = grad(|4-x_2|,x_2) = 1. $$ Why is the following not true: $$ grad(\text{loss},x_2) = \begin{cases} …
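A worked evaluation under the given values, which suggests where the 1 comes from: the derivative of the absolute value is taken at the point the forward pass actually reached.

$$ x_1 = w \cdot x = 6, \qquad x_2 = b + x_1 = 7, \qquad \text{loss} = |4 - 7| = 3, $$

$$ \frac{\partial}{\partial x_2} |4 - x_2| = \begin{cases} -1 & x_2 < 4 \\ \phantom{-}1 & x_2 > 4 \end{cases} \qquad \Rightarrow \qquad grad(\text{loss}, x_2)\big|_{x_2 = 7} = 1. $$

The piecewise form is correct in general; the constant 1 is its value on the branch $x_2 > 4$ that this forward pass lands on.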
I am using reinforcement learning to teach an AI an Austrian card game with imperfect information called Schnapsen. For different states of the game, I have different neural networks (which use different features) that calculate the value/policy. I would like to try using RNNs, as past actions may be important for navigating future decisions. However, as I use multiple neural networks, I somehow need to constantly transfer the hidden state from one RNN to another. I am not quite …
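A minimal PyTorch sketch of one way such a handoff could look, assuming the networks share a hidden size (otherwise a projection layer would be needed; all module names are illustrative):

```python
import torch
import torch.nn as nn

hidden = 32
rnn_phase_a = nn.GRU(input_size=10, hidden_size=hidden, batch_first=True)
rnn_phase_b = nn.GRU(input_size=14, hidden_size=hidden, batch_first=True)

obs_a = torch.randn(1, 5, 10)      # 5 steps of phase-A features
_, h = rnn_phase_a(obs_a)          # h: (1, 1, hidden) summary of the past

obs_b = torch.randn(1, 3, 14)      # 3 steps of phase-B features
out_b, h = rnn_phase_b(obs_b, h)   # reuse h so phase B sees phase A's history
```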
Sorry, I just started in deep learning, so I am trying my best not to assume anything unless I am absolutely sure. Going through comments here, someone recommended the excellent paper on backpropagation, Efficient BackProp by Yann LeCun. While reading, I got stuck at '4.5 Choosing Target Values'. I can't copy-paste the text as the PDF does not allow it, so I am posting a screenshot here. Most of the paper was clear to me, but I couldn't understand exactly what the author …
Will there be differences between applying autograd to the loss function (using a Python library) and applying the explicit gradient (the gradient from the paper, or the update rule)? For example: numerical, runtime, mathematical, or stability differences.
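On the mathematical side, a small PyTorch check (illustrative loss): for a loss with a known closed-form gradient, autograd and the explicit formula agree to floating-point rounding; the remaining differences are runtime and memory (autograd stores the forward graph) rather than the values themselves.

```python
import torch

w = torch.randn(5, requires_grad=True)
x = torch.randn(5)

loss = 0.5 * ((w * x).sum() - 1.0) ** 2        # L = 0.5 * (w.x - 1)^2
loss.backward()                                # autograd gradient into w.grad

explicit = ((w * x).sum() - 1.0).detach() * x  # hand-derived dL/dw
print(torch.allclose(w.grad, explicit))        # True up to float rounding
```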
Imagine the following structure (for simplicity there is no bias and no activation function such as sigmoid or ReLU, just weights). The input has two neurons, the two hidden layers have 3 neurons each, and the output layer has two neurons, so the cost ($\sum C$) has two "subcosts" ($C^1$, $C^2$). (I'm new to machine learning and super confused by the different notations, formatting, and indexes, so to clarify: in the case of activations, the upper index will show the index of it in the …
I need help understanding the gradient flow through a concatenation operation. I'm implementing a network (mostly a CNN) that has a concatenation operation (in PyTorch). The network is defined such that the responses from passing two different images through a CNN are concatenated and passed through another CNN, and the training is done end to end. Since the first CNN is shared between both of the inputs to the concatenation, I was wondering how the gradients should be distributed …
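A minimal PyTorch sketch of what happens (stand-in linear layers for the CNNs): the backward of `torch.cat` simply slices the upstream gradient back to each input, and because the first network is shared, both slices reach it and are summed into its `.grad`.

```python
import torch
import torch.nn as nn

shared = nn.Linear(8, 4)                 # stand-in for the shared first CNN
head = nn.Linear(8, 1)                   # stand-in for the second CNN

img_a, img_b = torch.randn(1, 8), torch.randn(1, 8)
feat_a, feat_b = shared(img_a), shared(img_b)

merged = torch.cat([feat_a, feat_b], dim=1)  # (1, 8)
loss = head(merged).sum()
loss.backward()                          # cat splits the gradient back to
                                         # feat_a and feat_b; shared gets both
print(shared.weight.grad.shape)          # torch.Size([4, 8]), summed contributions
```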
I am trying to implement a neural network for binary classification using Python and numpy only. My network structure is as follows: input features: 2 ([1×2] matrix); hidden layer 1: 5 neurons ([2×5] matrix); hidden layer 2: 5 neurons ([5×5] matrix); output layer: 1 neuron ([5×1] matrix). I have used the sigmoid activation function in all the layers. Now let's say I use binary cross-entropy as my loss function. How do I do the backpropagation on these matrices to update the weights? …
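A compact numpy sketch of the backward pass for exactly this architecture (sigmoid everywhere, binary cross-entropy), with the shape of every gradient matching its weight matrix; variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(1, 2)); y = np.array([[1.0]])
W1, W2, W3 = (rng.normal(size=s) for s in [(2, 5), (5, 5), (5, 1)])

# Forward
A1 = sigmoid(X @ W1)                  # (1, 5)
A2 = sigmoid(A1 @ W2)                 # (1, 5)
A3 = sigmoid(A2 @ W3)                 # (1, 1) prediction

# Backward: sigmoid + binary cross-entropy gives dL/dZ3 = A3 - y
dZ3 = A3 - y                          # (1, 1)
dW3 = A2.T @ dZ3                      # (5, 1) matches W3
dZ2 = (dZ3 @ W3.T) * A2 * (1 - A2)    # (1, 5) chain rule through sigmoid
dW2 = A1.T @ dZ2                      # (5, 5) matches W2
dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)    # (1, 5)
dW1 = X.T @ dZ1                       # (2, 5) matches W1

lr = 0.1
W1 -= lr * dW1; W2 -= lr * dW2; W3 -= lr * dW3
```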