Difference in performance: Sigmoid vs. Softmax
For the same binary image classification task, if in the final layer I use 1 node with a sigmoid activation function and the binary_crossentropy loss function, then training goes through pretty smoothly (92% accuracy on validation data after 3 epochs).
However, if I change the final layer to 2 nodes and use a softmax activation function with the sparse_categorical_crossentropy loss function, then the model doesn't seem to learn at all and is stuck at 55% accuracy (the proportion of the negative class).
Is this difference in performance normal? I thought that for a binary classification task, sigmoid with binary crossentropy and softmax with sparse categorical crossentropy should produce similar, if not identical, results. Or did I do something wrong?
Note: I use the Adam optimizer, and there is a single label column containing 0s and 1s.
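For reference, here is the equivalence I had in mind, as a small standalone sketch (NumPy/TensorFlow only, separate from my training code below): a 2-logit softmax with one logit fixed at zero reduces to a sigmoid, and the two losses agree on 0/1 integer labels.

import numpy as np
import tensorflow as tf

# softmax([0, z]) == [1 - sigmoid(z), sigmoid(z)], so a 2-node softmax head
# can represent exactly what a 1-node sigmoid head represents.
z = np.array([[0.3], [-1.2], [2.5]], dtype=np.float32)      # toy 1-node pre-activations
two_logits = np.concatenate([np.zeros_like(z), z], axis=1)  # pad a fixed zero logit

p_sigmoid = tf.sigmoid(z)                      # shape (3, 1)
p_softmax = tf.nn.softmax(two_logits, axis=1)  # shape (3, 2)
print(np.allclose(p_sigmoid.numpy().ravel(), p_softmax.numpy()[:, 1]))  # True

# With integer labels 0/1, the per-sample losses agree as well.
y = np.array([1, 0, 1])
bce = tf.keras.losses.binary_crossentropy(
    y.reshape(-1, 1).astype(np.float32), p_sigmoid)
scce = tf.keras.losses.sparse_categorical_crossentropy(y, p_softmax)
print(np.allclose(bce.numpy(), scce.numpy()))  # True

So in principle the 2-node softmax model has at least as much capacity as the 1-node sigmoid model, which is why the stall surprises me.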
Edit: Code for the 2 cases
Case 1: Sigmoid with binary_crossentropy
# Assumed imports (not shown in the original post):
from tensorflow.keras import callbacks, layers, optimizers
from tensorflow.keras.models import Model

def addTopModelMobilNetV1(bottom_model, num_classes):
    # Classification head on top of the MobileNet feature extractor,
    # ending in a 1-node sigmoid output.
    top_model = bottom_model.output
    top_model = layers.GlobalAveragePooling2D()(top_model)
    top_model = layers.Dense(1024, activation='relu')(top_model)
    top_model = layers.Dense(1024, activation='relu')(top_model)
    top_model = layers.Dense(512, activation='relu')(top_model)
    top_model = layers.Dense(1, activation='sigmoid')(top_model)
    return top_model

fc_head = addTopModelMobilNetV1(mobilnet_model, num_classes)
model = Model(inputs=mobilnet_model.input, outputs=fc_head)
# print(model.summary())

earlystopping_cb = callbacks.EarlyStopping(patience=3, restore_best_weights=True)
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.Adam(),
              metrics=['accuracy'])

# Note: fit_generator is deprecated in TF 2.x; model.fit accepts generators directly.
history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=train_df.shape[0] // TRAIN_BATCH_SIZE,
                              validation_data=val_generator,
                              epochs=10,
                              callbacks=[earlystopping_cb])
Case 2: Softmax with sparse_categorical_crossentropy
def addTopModelMobilNetV1(bottom_model, num_classes):
    # Same head as Case 1, except the output is a 2-node softmax.
    top_model = bottom_model.output
    top_model = layers.GlobalAveragePooling2D()(top_model)
    top_model = layers.Dense(1024, activation='relu')(top_model)
    top_model = layers.Dense(1024, activation='relu')(top_model)
    top_model = layers.Dense(512, activation='relu')(top_model)
    top_model = layers.Dense(2, activation='softmax')(top_model)
    return top_model

fc_head = addTopModelMobilNetV1(mobilnet_model, num_classes)
model = Model(inputs=mobilnet_model.input, outputs=fc_head)

earlystopping_cb = callbacks.EarlyStopping(patience=3, restore_best_weights=True)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=optimizers.Adam(),
              metrics=['accuracy'])

history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=train_df.shape[0] // TRAIN_BATCH_SIZE,
                              validation_data=val_generator,
                              epochs=10,
                              callbacks=[earlystopping_cb])
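One thing I haven't ruled out is a label-format mismatch, so here is the sanity check I'm running (a minimal sketch, assuming train_generator behaves like a standard Keras generator): sparse_categorical_crossentropy expects integer class indices of shape (batch,) or (batch, 1), not one-hot rows.

# Sanity check: inspect one batch from the generator.
# sparse_categorical_crossentropy expects integer class indices
# (shape (batch,) or (batch, 1)), not one-hot rows of shape (batch, 2).
x_batch, y_batch = next(iter(train_generator))
print(y_batch.shape, y_batch.dtype, y_batch[:5])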