Large jumps in loss in simple transformer model?

As an exercise, I created a very simple transformer model that just sees the same simple batch of dummy data repeatedly and (one would assume) should quickly learn to fit it perfectly. And indeed, training reaches a loss of zero quickly. However, I noticed that the loss does not stay at zero, or even close to it: there are occasional large jumps in the loss. The script below counts every time the loss jumps by 10 or more between …
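The script itself is cut off above, but here is a minimal sketch of that kind of setup, with made-up sizes and hyperparameters: a tiny transformer encoder fit to one fixed random batch, plus a counter for loss jumps of 10 or more between consecutive steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, seq_len, batch, d_model = 32, 16, 8, 64  # assumed sizes, not from the question

class TinyTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, dropout=0.0, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, x):
        return self.head(self.encoder(self.embed(x)))

model = TinyTransformer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randint(0, vocab, (batch, seq_len))  # the same dummy batch every step
y = torch.randint(0, vocab, (batch, seq_len))

prev, jumps = None, 0
for step in range(2000):
    loss = nn.functional.cross_entropy(model(x).reshape(-1, vocab), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # count sudden spikes: the loss rising by 10 or more vs. the previous step
    if prev is not None and loss.item() - prev >= 10:
        jumps += 1
    prev = loss.item()

print(f"loss jumps of 10 or more: {jumps}")
```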
Category: Data Science

Is there a random forest environment (scikit-learn, TFDF, R, etc.) that has an implementation for multi-output regression?

It is easy to adapt the idea of tree-based regression to perform logistic regression: the decision boundaries of the tree divide the space of independent variables into hyper-rectangles, and each hyper-rectangle is assigned a value that serves as the output of the model. Instead of choosing the decision boundaries and values to minimize the sum of squared residuals, they should be chosen to minimize the total binary cross-entropy loss (equivalent to maximizing the likelihood). Taking this a step further, …
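As a partial answer to the title question, scikit-learn's RandomForestRegressor already accepts a 2-D target and fits a single forest whose leaves hold a vector of outputs. A minimal sketch on synthetic data (shapes and sizes are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic problem with 3 regression targets, just to illustrate the API.
X, y = make_regression(n_samples=500, n_features=10, n_targets=3, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)                  # y has shape (500, 3); no wrapper needed

pred = forest.predict(X[:5])
print(pred.shape)                 # (5, 3): one prediction per target
```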
Category: Data Science

Neural network binary classification: Softmax, LogSoftmax, and loss function

I am building a binary classifier where the class I want to predict is present only <2% of the time. I am using PyTorch. The last layer could be LogSoftmax or Softmax: self.softmax = nn.Softmax(dim=1) or self.softmax = nn.LogSoftmax(dim=1). My questions: I should use Softmax, as it will provide outputs that sum to 1, and I can then check performance at various probability thresholds. Is that understanding correct? If I use Softmax, can I then use cross_entropy loss? This seems to …
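For context, PyTorch's nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, so it should not be fed softmax outputs; the equivalent explicit pipeline is nn.LogSoftmax followed by nn.NLLLoss. A small sketch with made-up tensors:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 2)            # raw scores for 4 samples, 2 classes
targets = torch.tensor([0, 1, 1, 0])

# Option A: feed raw logits to CrossEntropyLoss (log-softmax happens inside).
loss_a = nn.CrossEntropyLoss()(logits, targets)

# Option B: LogSoftmax in the model, then NLLLoss. Mathematically identical.
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_b = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(loss_a, loss_b))  # True

# Probabilities for threshold tuning can be recovered either way:
probs = log_probs.exp()                # rows sum to 1
```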
Category: Data Science

The effects of Double Logarithms (Log Cross Entropy Loss) + Overfitting

My network involves two losses: one is a binary cross entropy, and the other is a multi-label cross entropy. The yellow graphs are the ones with the double logarithm, meaning that we take log(sum(ce_loss)). The red-pink graphs are the ones with just sum(ce_loss). The dashed lines represent the validation step; the solid lines represent the training step. The top yellow and top red-pink figures both represent the count of 1s. Both are supposed to converge to 30. It is clear that the top …
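A sketch of the two variants being compared, with an assumed vector of per-label cross-entropy terms (the real shapes aren't shown in the question):

```python
import torch

ce_loss = torch.rand(30, requires_grad=True)  # hypothetical per-label CE terms

loss_plain = ce_loss.sum()                    # sum(ce_loss), the "red-pink" runs
loss_double_log = ce_loss.sum().log()         # log(sum(ce_loss)), the "yellow" runs

# Since d/ds log(s) = 1/s, the double-log variant rescales every gradient by
# 1/sum(ce_loss): the effective step size shrinks as the summed loss grows
# and blows up as it approaches zero.
```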
Category: Data Science

ignoring instances or masking by zero in a multitask learning model

For a multitask learning model, I've seen that approaches usually mask the output that doesn't have a label with zeros. As an example, have a look here: How to Multi-task learning with missing labels in Keras. I have another idea: instead of masking the missing output with zeros, why don't we ignore it in the loss function? The CrossEntropyLoss implementation in PyTorch allows specifying a value to be ignored: CrossEntropyLoss. Is this going to be OK? A sketch of the idea follows.
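A minimal sketch of that idea, assuming the missing labels are encoded with PyTorch's default sentinel -100: positions carrying the sentinel contribute neither to the loss nor to the gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)                 # 4 samples, 3 classes for one task
labels = torch.tensor([2, -100, 0, -100])  # two samples have no label for this task

# ignore_index skips the sentinel entirely; the mean is taken over the
# labelled samples only, so unlabelled ones neither pull the loss down
# nor push gradients through the network.
criterion = nn.CrossEntropyLoss(ignore_index=-100)
loss = criterion(logits, labels)
print(loss.item())
```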
Category: Data Science

Shannon Information Content related to Uncertainty?

I'm a data science student currently writing my master's thesis, which revolves around the cross-entropy (CE) loss function for neural networks. From my understanding, the CE is based on the entropy, which in turn is based on the Shannon information content (SIC); however, I struggle to interpret and explain it in such a way that my fellow students can understand it without using concepts from information theory (which itself is already a completely different and complicated area). In the …
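For reference, the chain of definitions in question, in standard notation. The intuition: a rarer outcome carries more information, entropy is the expected information under p, and cross entropy is the expected surprise when outcomes drawn from p are scored by a model q.

```latex
\begin{align*}
h(x)   &= -\log p(x) && \text{Shannon information content of outcome } x \\
H(p)   &= \mathbb{E}_{x \sim p}\,[h(x)] = -\sum_x p(x)\log p(x) && \text{entropy} \\
H(p,q) &= \mathbb{E}_{x \sim p}\,[-\log q(x)] = -\sum_x p(x)\log q(x) && \text{cross entropy}
\end{align*}
```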
Category: Data Science

Why is cross entropy based on Bernoulli or Multinoulli probability distribution?

When we use logistic regression, we use cross entropy as the loss function. However, based on my understanding and https://machinelearningmastery.com/cross-entropy-for-machine-learning/, cross entropy evaluates whether two or more distributions are similar to each other, and the distributions are assumed to be Bernoulli or Multinoulli. So my question is: why can we always use cross entropy, i.e., Bernoulli, in regression problems? Do the real values and the predicted values always follow such a distribution?
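One step worth making explicit: for a label $y \in \{0,1\}$ and a predicted probability $\hat p$, minimizing binary cross entropy is exactly maximizing the Bernoulli likelihood, which is where the Bernoulli (or Multinoulli, in the multi-class case) assumption enters.

```latex
\begin{align*}
P(y \mid \hat p) &= \hat p^{\,y}\,(1-\hat p)^{\,1-y}
  && \text{Bernoulli likelihood of one label} \\
-\log P(y \mid \hat p) &= -\,y\log\hat p \;-\; (1-y)\log(1-\hat p)
  && \text{binary cross entropy}
\end{align*}
```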
Category: Data Science
