Neural network binary classification: softmax, logsoftmax, and loss function
I am building a binary classification model where the class I want to predict is present only 2% of the time. I am using PyTorch.
The last layer could be LogSoftmax or Softmax:

self.softmax = nn.Softmax(dim=1)

or

self.softmax = nn.LogSoftmax(dim=1)
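For reference, here is a minimal sketch (batch size, class count, and values are made up) of how each of these output layers is normally paired with a loss in PyTorch: raw logits go with nn.CrossEntropyLoss, a LogSoftmax output goes with nn.NLLLoss, and Softmax is mainly useful at inference time for reading off probabilities.

import torch
import torch.nn as nn

logits = torch.randn(4, 2)           # hypothetical raw outputs of the last Linear layer
target = torch.tensor([0, 1, 0, 0])  # class indices

# Option A: no activation in the model; CrossEntropyLoss applies log-softmax internally
loss_a = nn.CrossEntropyLoss()(logits, target)

# Option B: LogSoftmax as the last layer; pair it with NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_b = nn.NLLLoss()(log_probs, target)

print(torch.allclose(loss_a, loss_b))  # True -- the two formulations compute the same loss

# Softmax is only needed when you want probabilities for thresholding
probs = nn.Softmax(dim=1)(logits)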
My questions:

question 1) I should use Softmax, as it will provide outputs that sum to 1 and I can then check performance for various probability thresholds. Is that understanding correct?

question 2) If I use Softmax, can I then use cross_entropy loss? This seems to suggest that it is okay to use it.

question 3) If I use LogSoftmax, can I then use cross_entropy loss? This seems to suggest that I shouldn't.

question 4) If I use Softmax, is there any better option than cross_entropy loss?

cross_entropy = nn.CrossEntropyLoss(weight=class_wts)
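For question 4, this is roughly how the weighted cross-entropy above is often set up for a 98/2 split; the counts and the inverse-frequency weighting here are assumptions for illustration, not taken from the original code.

import torch
import torch.nn as nn

# hypothetical counts for a 2% positive class
n_neg, n_pos = 9800, 200
class_wts = torch.tensor([1.0 / n_neg, 1.0 / n_pos])
class_wts = class_wts / class_wts.sum()   # normalize so the weights sum to 1

cross_entropy = nn.CrossEntropyLoss(weight=class_wts)

# the model outputs raw logits of shape (batch, 2); targets are class indices
logits = torch.randn(8, 2)
target = torch.randint(0, 2, (8,))
loss = cross_entropy(logits, target)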
##################
My network's last few layers are as below. Could I just change the last layer to Sigmoid? I feel that doing so could break my network:
self.batch_norm2 = nn.BatchNorm1d(num_filters)
self.fc2 = nn.Linear(np.sum(num_filters), fc2_neurons)
self.batch_norm3 = nn.BatchNorm1d(fc2_neurons)
self.fc3 = nn.Linear(fc2_neurons, 2)
self.softmax = nn.Softmax(dim=1)
question 5) Should I replace the last 2 lines above with the ones below? Let me know if there are any other choices. I am using Sigmoid after the Linear layer, as I will get values between 0 and 1 and then I could use different probability cutoffs if required.
self.fc3 = nn.Linear(fc2_neurons, 1)
self.sigmoid = nn.Sigmoid()
And currently my loss is:

cross_entropy = nn.CrossEntropyLoss(weight=class_wts)

question 6) Should the loss then be as shown below instead? Let me know if there are any other choices.
# class_wts holds the class weights
BCE_loss = nn.BCELoss(pos_weight=torch.tensor(class_wts[1] / class_wts[0]))
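One thing to be aware of: in PyTorch, pos_weight is an argument of nn.BCEWithLogitsLoss, not nn.BCELoss; BCELoss only takes a per-element weight. Below is a rough sketch of the Sigmoid-head setup from questions 5 and 6 with the class imbalance handled through per-sample weights; the layer size and class ratio are made-up numbers.

import torch
import torch.nn as nn
import torch.nn.functional as F

fc3 = nn.Linear(64, 1)                    # 64 stands in for fc2_neurons
sigmoid = nn.Sigmoid()

x = torch.randn(8, 64)                    # hypothetical output of the preceding layer
target = torch.randint(0, 2, (8,)).float()

probs = sigmoid(fc3(x)).squeeze(1)        # P(class = 1), shape (batch,)

# weight positive examples more heavily; BCELoss itself has no pos_weight argument
pos_weight_value = 9800 / 200             # neg_count / pos_count, illustrative
sample_wts = 1.0 + (pos_weight_value - 1.0) * target   # 1 for negatives, 49 for positives
loss = F.binary_cross_entropy(probs, target, weight=sample_wts)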
++++++++++++++++++++ Update 1 ++++++++++++++++++++
Can I use BCEWithLogitsLoss, as it is more stable than Sigmoid + BCE loss? The documentation says: "This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability."
What I am thinking is that I would keep my last layer and loss as below:

self.fc3 = nn.Linear(fc2_neurons, 1)

BCEwithlogits_loss = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(class_wts[1] / class_wts[0]))
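Putting the update together, a minimal end-to-end sketch of the single-logit head trained with BCEWithLogitsLoss (layer size and class counts are hypothetical):

import torch
import torch.nn as nn

fc2_neurons = 64                          # hypothetical size
n_neg, n_pos = 9800, 200                  # hypothetical counts for a 2% positive class

fc3 = nn.Linear(fc2_neurons, 1)           # raw logit output; no sigmoid inside the model

# pos_weight is typically the ratio of negative to positive examples
bce_with_logits = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(n_neg / n_pos))

x = torch.randn(8, fc2_neurons)
target = torch.randint(0, 2, (8,)).float()

logits = fc3(x).squeeze(1)                # shape (batch,)
loss = bce_with_logits(logits, target)

# probabilities are only needed for evaluation / choosing a threshold
probs = torch.sigmoid(logits)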
Initially I will predict class 1 if the output of my last layer is greater than 0, since sigmoid(0) = 0.5. Then, if I want to use different cutoffs, I could either change the cutoff of 0 to some other value, or get the logits from the model, convert them to probabilities using sigmoid, and then make new predictions. For example, if I want a 0.9 probability cutoff, then for the logits I would use a cutoff of 2.2, as sigmoid(2.2) is approximately 0.9.
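The logit cutoff for any probability threshold p is log(p / (1 - p)); for p = 0.9 that is about 2.197, which matches the sigmoid(2.2) ≈ 0.9 figure above. A quick check:

import torch

p = 0.9
logit_cutoff = torch.logit(torch.tensor(p))   # log(p / (1 - p)) ≈ 2.1972
print(torch.sigmoid(logit_cutoff))            # tensor(0.9000)

# predict class 1 at a 0.9 probability cutoff, working directly on the logits
logits = torch.randn(8)
preds = (logits > logit_cutoff).long()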