As an exercise, I created a very simple transformer model that just sees the same simple batch of dummy data repeatedly and should (one would assume) quickly learn to fit it perfectly. And indeed, training quickly reaches a loss of zero. However, I noticed that the loss does not stay at zero, or even close to it: there are occasional large jumps in the loss. The script below counts every time that the loss jumps by 10 or more between …
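For concreteness, here is a minimal sketch of the kind of loop I mean (the tiny stand-in model, the optimizer, and the threshold of 10 are placeholders, not the actual script):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    x = torch.randint(0, 100, (8, 16))   # one fixed batch of dummy token ids
    y = torch.randint(0, 100, (8, 16))   # fixed dummy targets

    model = nn.Sequential(               # small stand-in for the transformer
        nn.Embedding(100, 32),
        nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
        nn.Linear(32, 100),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    prev_loss, jumps = None, 0
    for step in range(5000):
        opt.zero_grad()
        loss = loss_fn(model(x).reshape(-1, 100), y.reshape(-1))
        loss.backward()
        opt.step()
        # count steps where the loss jumps by 10 or more vs. the previous step
        if prev_loss is not None and loss.item() - prev_loss >= 10:
            jumps += 1
        prev_loss = loss.item()
    print("loss jumps of 10+:", jumps)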
It is easy to adapt the idea of tree-based linear regression to perform logistic regression: the decision boundaries of the tree divide the space of independent variables into hyper-cubes, and each hyper-cube is assigned a value that serves as the output of the model. Instead of choosing the decision boundaries and values to minimize the sum of squared residuals, they should be chosen to minimize the total binary cross-entropy loss (equivalently, to maximize the likelihood). Taking this a step further, …
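As a minimal sketch of that first step (assuming a recent scikit-learn, where criterion="log_loss" is available and is equivalent to "entropy"; the data set here is made up):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import log_loss

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(1000, 2))
    p = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1])))   # true Bernoulli probability
    y = rng.binomial(1, p)

    # Splits are chosen by log-loss (entropy) improvement, and each leaf
    # (hyper-cube) outputs the empirical class frequency, which is exactly
    # the value that minimizes binary cross entropy within that leaf.
    tree = DecisionTreeClassifier(criterion="log_loss", max_depth=4).fit(X, y)
    proba = tree.predict_proba(X)[:, 1]
    print("binary cross entropy:", log_loss(y, proba))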
I am building a binary classification model where the class I want to predict is present in fewer than 2% of cases. I am using PyTorch. The last layer could be LogSoftmax or Softmax: self.softmax = nn.Softmax(dim=1) or self.softmax = nn.LogSoftmax(dim=1). My questions: should I use Softmax, since it provides outputs that sum to 1 and lets me check performance at various probability thresholds? Is that understanding correct? And if I use Softmax, can I then use the cross_entropy loss? This seems to …
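For reference, a minimal sketch of the pattern I'm asking about (the shapes and the threshold are made up): the model returns raw logits, CrossEntropyLoss applies log-softmax internally, and Softmax is only applied at evaluation time to get probabilities to threshold:

    import torch
    import torch.nn as nn

    logits = torch.randn(4, 2, requires_grad=True)   # model output: raw logits
    targets = torch.tensor([0, 1, 0, 0])

    loss_fn = nn.CrossEntropyLoss()                  # expects logits, not softmax outputs
    loss = loss_fn(logits, targets)                  # = LogSoftmax + NLLLoss internally

    with torch.no_grad():
        probs = torch.softmax(logits, dim=1)[:, 1]   # probability of the rare class
        preds = (probs > 0.1).long()                 # evaluate at whatever threshold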
My network involves two losses: one is a binary cross-entropy loss, and the other is a multi-label cross-entropy loss. The yellow graphs are the ones with the double logarithm, i.e. log(sum(ce_loss)). The red-pink graphs are the ones with just sum(ce_loss). The dashed lines represent the validation steps and the solid lines the training steps. The top yellow and top red-pink figures both show the count of 1s; both are supposed to converge to 30. It is clear that the top …
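For clarity, a minimal sketch of the two aggregations being compared (the shapes and the use of BCE-with-logits for the multi-label part are assumptions on my side):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(8, 30)                      # multi-label scores, made-up shape
    targets = torch.randint(0, 2, (8, 30)).float()   # multi-hot labels

    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    loss_sum = ce.sum()                              # red-pink curves: sum(ce_loss)
    loss_log = torch.log(ce.sum())                   # yellow curves: log(sum(ce_loss))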
For a multitask learning model, I've seen that approaches usually mask the outputs that don't have a label with zeros. As an example, have a look here: How to Multi-task learning with missing labels in Keras. I have another idea: instead of masking the missing output with zeros, why don't we simply exclude it from the loss function? The CrossEntropyLoss implementation in PyTorch allows specifying a value to be ignored: CrossEntropyLoss. Is this going to be OK?
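A minimal sketch of what I have in mind (the shapes are made up; -100 is PyTorch's default ignore_index):

    import torch
    import torch.nn as nn

    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

    logits = torch.randn(4, 5, requires_grad=True)   # 4 samples, 5 classes
    labels = torch.tensor([2, -100, 0, -100])        # samples without a label get -100

    # The ignored positions contribute neither to the loss nor to the gradient;
    # the mean is taken over the two labelled samples only.
    loss = loss_fn(logits, labels)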
I'm a data science student currently writing my master's thesis, which revolves around the Cross Entropy (CE) loss function for neural networks. From my understanding, CE is based on entropy, which in turn is based on the Shannon information content (SIC). However, I struggle to interpret and explain it in such a way that my fellow students can understand it without using concepts from information theory (which is itself a completely different and complicated area). In the …
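To fix notation, this is the chain I am trying to explain, written out with the standard definitions (p is the data/target distribution, q the model's predicted distribution):

$$h(x) = -\log p(x)$$ (Shannon information content of a single outcome)

$$H(p) = \mathbb{E}_{x \sim p}[-\log p(x)] = -\sum_x p(x)\log p(x)$$ (entropy: the expected information content under p)

$$H(p, q) = \mathbb{E}_{x \sim p}[-\log q(x)] = -\sum_x p(x)\log q(x)$$ (cross entropy: outcomes drawn from p, but scored with q)

$$H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)$$ (so minimizing CE over q is the same as minimizing the KL divergence to p)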
When we use logistic regression, we use cross entropy as the loss function. However, based on my understanding and https://machinelearningmastery.com/cross-entropy-for-machine-learning/, cross entropy measures how similar two (or more) distributions are to each other, and the distributions are assumed to be Bernoulli or Multinoulli. So my question is: why can we always use cross entropy, i.e., the Bernoulli assumption, in these regression problems? Do the real values and the predicted values always follow such a distribution?
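To make the connection concrete: for a single observation with label $y \in \{0, 1\}$ and predicted probability $\hat{p}$, the Bernoulli likelihood is

$$p(y \mid \hat{p}) = \hat{p}^{\,y}(1 - \hat{p})^{\,1 - y},$$

and taking the negative log gives exactly the per-sample cross-entropy term

$$-\log p(y \mid \hat{p}) = -\bigl[\, y \log \hat{p} + (1 - y)\log(1 - \hat{p}) \,\bigr],$$

so minimizing the cross-entropy loss over the data set is the same as maximizing the Bernoulli likelihood of the observed labels.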