Neural network binary classification: softmax, logsoftmax and the loss function

I am building a binary classification model where the class I want to predict is present only 2% of the time. I am using PyTorch.

The last layer could be LogSoftmax or Softmax.

self.softmax = nn.Softmax(dim=1) or self.softmax = nn.LogSoftmax(dim=1)

My questions:

  1. Should I use Softmax, since it will provide outputs that sum up to 1 and I can check performance for various probability thresholds? Is that understanding correct?

  2. If I use Softmax, can I use cross-entropy loss? This seems to suggest that it is okay to use.

  3. If I use LogSoftmax, can I use cross-entropy loss? This seems to suggest that I shouldn't.

  4. If I use Softmax, is there any better option than cross-entropy loss?

        cross_entropy = nn.CrossEntropyLoss(weight=class_wts)
    

##################

My network's last few layers are as below. Could I just change the last layer to Sigmoid? I feel that it could break my network -

        self.batch_norm2 = nn.BatchNorm1d(num_filters)
        self.fc2 = nn.Linear(np.sum(num_filters), fc2_neurons)
        self.batch_norm3 = nn.BatchNorm1d(fc2_neurons)
        self.fc3 = nn.Linear(fc2_neurons, 2)
        self.softmax = nn.Softmax(dim=1)

Question 5) Should I replace the last 2 lines from above with these? Let me know if there are any other choices.

I am using Sigmoid after the linear layer as I will get values between 0 and 1, and then I could use different probability cutoffs if required.

self.fc3 = nn.Linear(fc2_neurons, 1)

self.sigmoid = nn.Sigmoid()

And my current loss is:

cross_entropy = nn.CrossEntropyLoss(weight=class_wts)

Question 6) And should the loss be changed to the one shown below? Let me know if there are any other choices.

# class_wts holds the class weights
BCE_loss = nn.BCELoss(pos_weight=torch.tensor(class_wts[1] / class_wts[0]))

++++++++++++++++++++++++++++++++++++++++ Update 1

Can I use BCEWithLogitsLoss, as it is more stable than Sigmoid + BCELoss? The documentation says: "This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability."
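
If it helps to verify the equivalence the documentation describes, here is a minimal sketch (toy tensors, invented purely for illustration) comparing BCEWithLogitsLoss on raw logits with an explicit Sigmoid followed by BCELoss:

    import torch
    import torch.nn as nn

    # toy logits and float targets, one score per sample
    logits = torch.tensor([[-8.0], [0.5], [6.0]])
    targets = torch.tensor([[0.0], [1.0], [1.0]])

    combined = nn.BCEWithLogitsLoss()(logits, targets)        # works on raw logits
    separate = nn.BCELoss()(torch.sigmoid(logits), targets)   # explicit sigmoid first

    print(combined.item(), separate.item())  # nearly identical values; the combined version is more stable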

What I am thinking is that I will keep my last layer and the loss as below:

self.fc3 = nn.Linear(fc2_neurons, 1)
BCEwithlogits_loss = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(class_wts[1] / class_wts[0]))

Initially I will predict class 1 if the output of my last layer (the logit) is greater than 0, as sigmoid(0) = 0.5. Then, if I want to use different cutoffs, I could either change the cutoff from 0 to some other value, or get the logits from the model, convert them to probabilities using sigmoid, and then make new predictions.

For example, if I want a 0.9 probability cutoff, then for the logits I will use a cutoff of about 2.2, as sigmoid(2.2) ≈ 0.9.
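
A small sketch of that thresholding logic (the logits tensor is made up for illustration):

    import torch

    # hypothetical outputs of the final nn.Linear(fc2_neurons, 1) layer
    logits = torch.tensor([-1.3, 0.4, 2.5, 3.1])

    # default rule: logit > 0 is the same as probability > 0.5
    preds_default = (logits > 0).long()

    # custom cutoff of 0.9: either convert logits to probabilities first...
    probs = torch.sigmoid(logits)
    preds_p90 = (probs > 0.9).long()

    # ...or threshold the logits directly at log(0.9 / 0.1) ≈ 2.197
    logit_cutoff = torch.log(torch.tensor(0.9 / 0.1))
    preds_p90_alt = (logits > logit_cutoff).long()  # same result as preds_p90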



Some elements to answer your questions:

  1. The softmax function is indeed generally used to rescale the output of your network so that the output vector can be interpreted as a probability distribution representing the prediction of your network. In general, if you want the network to make a prediction for the class of the input data, you simply return the class which has the highest "probability" after applying the softmax function. In the case of binary classification, this corresponds to a threshold of 0.5. However, if you want to take into account some degree of certainty, feel free to use higher thresholds.
  2. Absolutely, the cross-entropy loss is used to compare probability distributions. Just be aware that PyTorch's nn.CrossEntropyLoss applies LogSoftmax internally, so it expects raw logits rather than already-softmaxed outputs.
  3. Cross entropy is not adapted to the log-probabilities returned by LogSoftmax. Prefer using NLLLoss after LogSoftmax instead of the cross-entropy function. The sequences raw logits -> CrossEntropyLoss and LogSoftmax -> NLLLoss give pretty much the same final loss (see the sketch after this list).
  4. Since you are doing binary classification, you could also use BCELoss, which stands for binary cross-entropy loss. In this case you do not need softmax, but rather a function mapping your output to the interval [0, 1], such as Sigmoid (see the sketch after this list). Some alternatives exist, but they are useful only in specific cases and for specific types of data, so I would suggest sticking to the losses cited above.
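
To make the pairings in points 2-4 concrete, here is a minimal sketch (toy tensors, not from your network) of the equivalent setups:

    import torch
    import torch.nn as nn

    # toy batch: 3 samples, 2 classes
    logits = torch.tensor([[1.2, -0.8], [0.3, 0.1], [-2.0, 1.5]])
    targets = torch.tensor([0, 1, 1])

    # option A: raw logits -> CrossEntropyLoss (LogSoftmax is applied internally)
    ce = nn.CrossEntropyLoss()(logits, targets)

    # option B: LogSoftmax -> NLLLoss, gives the same value as option A
    log_probs = nn.LogSoftmax(dim=1)(logits)
    nll = nn.NLLLoss()(log_probs, targets)

    # option C (binary only): a single score per sample -> Sigmoid -> BCELoss
    scores = torch.tensor([-0.8, 0.1, 1.5])
    bce = nn.BCELoss()(torch.sigmoid(scores), targets.float())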

Since you have unbalanced data, you can make use of the parameter "weight", which is available in both the CrossEntropyLoss and NLLLoss PyTorch implementations. It can be used to put more weight on the less represented class of your dataset in the computation of the loss.
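
For reference, one common way to build class_wts for a ~2% positive class is the inverse class frequency; the counts below are assumptions for illustration, not taken from your data:

    import torch
    import torch.nn as nn

    # assumed class counts: 9800 negatives, 200 positives (~2% positive class)
    n_neg, n_pos = 9800, 200
    class_wts = torch.tensor([1.0 / n_neg, 1.0 / n_pos])
    class_wts = class_wts / class_wts.sum()  # optional normalisation

    cross_entropy = nn.CrossEntropyLoss(weight=class_wts)  # the same tensor works with nn.NLLLoss(weight=...)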

Edit: The modifications and your understanding in your points 5) and 6) are indeed correct if you decide to replace the cross-entropy loss with BCELoss, except for the weight assignment. The weight parameter for BCELoss works differently from the other previously cited loss functions: you must actually assign a weight to each element of your batch.

One way to do it (assuming your labels are either 0 or 1, and the variable labels contains the labels of the current batch during training): first, instantiate your loss:

criterion = nn.BCELoss()

Then, at each iteration of your training (before computing the loss for your current batch):

# per-element weights: class_wts[1] for positives, class_wts[0] for negatives
criterion.weight = labels * class_wts[1] + (1 - labels) * class_wts[0]
# ...
# loss = criterion(predictions, labels)
# ...

This will have the effect of assigning the weight class_wts[1] to the positive examples and class_wts[0] to the negative examples.
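
Putting it together, a self-contained toy version of that weighted-BCELoss training step could look like this (dummy model, data, and weights, purely for illustration):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())    # stand-in for your network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    class_wts = torch.tensor([0.1, 0.9])                     # made-up class weights
    criterion = nn.BCELoss()

    inputs = torch.randn(8, 4)                               # dummy batch
    labels = torch.randint(0, 2, (8, 1)).float()

    optimizer.zero_grad()
    predictions = model(inputs)                              # sigmoid outputs in [0, 1]
    # class_wts[1] for positive examples, class_wts[0] for negative examples
    criterion.weight = labels * class_wts[1] + (1 - labels) * class_wts[0]
    loss = criterion(predictions, labels)
    loss.backward()
    optimizer.step()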

In any case, your code using cross entropy should work and give pretty similar results.
