Siamese network for face comparison won't learn: accuracy stuck at 0.5, and loss stuck too
I'm trying to train a Siamese network that contains a CNN body with an embedding layer at the end, so that two images of the same person yield two similar (close) vectors. I'm using the LFW_Cropped dataset and some custom-made generators. The generators are tested and return batches with a 50%/50% split of same and different image pairs, with the correct labels.
The labels for the same and different outcomes are:
SAME = 1 (named 'yes' in my code)
DIFFERENT = 0 (named 'no' in my code)
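For reference, here is a minimal sketch of what the generators do (the `images_by_person` mapping and `img_loader` helper are hypothetical placeholders, not my actual code):

```python
import random
import numpy as np

def pair_generator(images_by_person, img_loader, batch_size=16):
    """Yield batches of image pairs with a 50/50 same/different split.

    images_by_person: dict mapping person id -> list of image paths
                      (each person needs at least 2 images for SAME pairs).
    img_loader: callable that loads a path into a (224, 224, 3) float array.
    """
    people = list(images_by_person.keys())
    while True:
        pairs_a, pairs_b, labels = [], [], []
        for i in range(batch_size):
            if i % 2 == 0:  # SAME pair -> label 1 ('yes')
                person = random.choice(people)
                path_a, path_b = random.sample(images_by_person[person], 2)
                labels.append(1.0)
            else:           # DIFFERENT pair -> label 0 ('no')
                person_a, person_b = random.sample(people, 2)
                path_a = random.choice(images_by_person[person_a])
                path_b = random.choice(images_by_person[person_b])
                labels.append(0.0)
            pairs_a.append(img_loader(path_a))
            pairs_b.append(img_loader(path_b))
        yield [np.array(pairs_a), np.array(pairs_b)], np.array(labels)
```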
My model's body architecture is shown below:

```
Model: model
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
separable_conv2d (SeparableC (None, 224, 224, 64) 331
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
separable_conv2d_1 (Separabl (None, 112, 112, 64) 4736
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 56, 56, 64) 0
_________________________________________________________________
batch_normalization (BatchNo (None, 56, 56, 64) 256
_________________________________________________________________
separable_conv2d_2 (Separabl (None, 56, 56, 256) 17216
_________________________________________________________________
batch_normalization_1 (Batch (None, 56, 56, 256) 1024
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 28, 28, 256) 0
_________________________________________________________________
dropout (Dropout) (None, 28, 28, 256) 0
_________________________________________________________________
separable_conv2d_3 (Separabl (None, 28, 28, 512) 133888
_________________________________________________________________
separable_conv2d_4 (Separabl (None, 28, 28, 512) 267264
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 512) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 14, 14, 512) 0
_________________________________________________________________
separable_conv2d_5 (Separabl (None, 14, 14, 512) 267264
_________________________________________________________________
batch_normalization_2 (Batch (None, 14, 14, 512) 2048
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 512) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
EmbeddingsVect (Dense) (None, 1024) 25691136
=================================================================
Total params: 26,385,163
Trainable params: 26,383,499
Non-trainable params: 1,664
_________________________________________________________________
```
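For context, the body can be reconstructed from the summary roughly like this (a sketch: the kernel sizes are inferred from the parameter counts, 5x5 for the first separable conv and 3x3 for the rest; the activations and dropout rates are guesses):

```python
from tensorflow.keras import layers, Model

def build_body(embedding_dim=1024):
    """CNN body reconstructed from the summary above; rates/activations assumed."""
    inp = layers.Input(shape=(224, 224, 3))
    x = layers.SeparableConv2D(64, 5, padding='same', activation='relu')(inp)  # 331 params
    x = layers.MaxPooling2D()(x)
    x = layers.SeparableConv2D(64, 3, padding='same', activation='relu')(x)    # 4,736 params
    x = layers.MaxPooling2D()(x)
    x = layers.BatchNormalization()(x)
    x = layers.SeparableConv2D(256, 3, padding='same', activation='relu')(x)   # 17,216 params
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.25)(x)
    x = layers.SeparableConv2D(512, 3, padding='same', activation='relu')(x)   # 133,888 params
    x = layers.SeparableConv2D(512, 3, padding='same', activation='relu')(x)   # 267,264 params
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.25)(x)
    x = layers.SeparableConv2D(512, 3, padding='same', activation='relu')(x)   # 267,264 params
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Flatten()(x)                                                    # 7*7*512 = 25088
    out = layers.Dense(embedding_dim, name='EmbeddingsVect')(x)                # 25,691,136 params
    return Model(inp, out, name='model')
```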
And the whole Siamese network looks like this:

```
Model: model_1
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________________
input_2 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________________
model (Functional) (None, 1024) 26385163 input_1[0][0]
input_2[0][0]
__________________________________________________________________________________________________
lambda (Lambda) (None, 1024) 0 model[0][0]
model[1][0]
__________________________________________________________________________________________________
dense (Dense) (None, 1) 1025 lambda[0][0]
==================================================================================================
Total params: 26,386,188
Trainable params: 26,384,524
Non-trainable params: 1,664
__________________________________________________________________________________________________
```
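The assembly itself looks roughly like this (a sketch: `distance_calculation` is the distance function shown further down, called with `tech='euclid1'` since that is the only branch whose output matches the `(None, 1024)` Lambda shape above; the sigmoid on the final Dense is an assumption based on the 0.5-centred predictions):

```python
from tensorflow.keras import layers, Model

body = build_body()  # the shared CNN body sketched above

input_a = layers.Input(shape=(224, 224, 3))
input_b = layers.Input(shape=(224, 224, 3))

emb_a = body(input_a)  # the same body (shared weights)
emb_b = body(input_b)  # embeds both images

# element-wise distance between the two embeddings, shape (None, 1024)
distance = layers.Lambda(
    lambda v: distance_calculation(v, tech='euclid1'))([emb_a, emb_b])
output = layers.Dense(1, activation='sigmoid')(distance)

siamese = Model(inputs=[input_a, input_b], outputs=output, name='model_1')
```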
I tried a few different distance metrics: L1 (absolute) difference, Euclidean (L2) distance, and 1 - cosine similarity:
```python
from tensorflow.keras import backend as K

def distance_calculation(vect, tech='cos'):
    """Distance between embedding vectors, with several calculation techniques.

    :param vect: a list with 2 embedding vectors.
    :param tech: 'cos' / 'euclid1' / 'euclid2'
    :return: a distance tensor (SAME pairs should give values near 0).
    """
    (featsA, featsB) = vect
    # 1 - cosine similarity - SAME = 0
    if tech == 'cos':
        feats_a_norm = K.sqrt(K.maximum(K.sum(K.square(featsA), axis=1, keepdims=True), K.epsilon()))
        feats_b_norm = K.sqrt(K.maximum(K.sum(K.square(featsB), axis=1, keepdims=True), K.epsilon()))
        numerator = K.batch_dot(featsA, featsB)
        denominator = feats_a_norm * feats_b_norm
        return 1 - (numerator / denominator)
    # element-wise absolute difference of L1-normalized vectors - SAME = 0
    elif tech == 'euclid1':
        featsA = featsA / K.sum(K.abs(featsA), axis=1, keepdims=True)
        featsB = featsB / K.sum(K.abs(featsB), axis=1, keepdims=True)
        dist = K.abs(featsA - featsB)
        return dist
    # Euclidean distance - SAME = 0
    elif tech == 'euclid2':
        # compute the sum of squared differences between the vectors
        sum_squared = K.sum(K.square(featsA - featsB), axis=1, keepdims=True)
        # return the Euclidean distance between the vectors
        dist = K.sqrt(K.maximum(sum_squared, K.epsilon()))
        return dist
```
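As a quick sanity check on random tensors (a sketch, batch size 4 is arbitrary), the three branches return these shapes; note that only 'euclid1' matches the `(None, 1024)` Lambda output in the summary above:

```python
import numpy as np
from tensorflow.keras import backend as K

a = K.constant(np.random.rand(4, 1024))
b = K.constant(np.random.rand(4, 1024))

print(K.eval(distance_calculation([a, b], tech='cos')).shape)      # (4, 1)
print(K.eval(distance_calculation([a, b], tech='euclid1')).shape)  # (4, 1024)
print(K.eval(distance_calculation([a, b], tech='euclid2')).shape)  # (4, 1)
```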
None of the above worked.
I also tried two different loss functions, 'binary_crossentropy' and 'contrastive_loss'; the code for the contrastive loss was taken from the Keras API docs.
I couldn't figure out which margin to use for the contrastive loss; I tried margin values in the range [0.1, 10], but none of them did any magic:
```python
import tensorflow as tf

def loss(margin=1, loss_func='binary_crossentropy'):
    """Provides 'contrastive_loss' with an enclosing scope holding 'margin'.

    Arguments:
        margin: Integer, defines the baseline distance for which pairs
            should be classified as dissimilar (default is 1).
        loss_func: 'binary_crossentropy' or 'contrastive_loss'.

    Returns:
        The selected loss function ('margin' attached for contrastive loss).
    """
    def bin_ent_loss(y_true, y_pred):
        tf.print(f'\nPREDICTIONS{tf.transpose(y_pred).numpy()}')
        tf.print(tf.transpose(y_true).numpy())
        return tf.keras.losses.binary_crossentropy(y_true, y_pred)

    # Contrastive loss = mean( (1-true_value) * square(prediction) +
    #                          true_value * square( max(margin-prediction, 0)))
    def contrastive_loss(y_true, y_pred):
        """Calculates the contrastive loss.

        Arguments:
            y_true: List of labels, each label is of type float32.
            y_pred: List of predictions of same length as y_true,
                each of type float32.

        Returns:
            A tensor containing the contrastive loss as a floating point value.
        """
        tf.print(f'\nPREDICTIONS{tf.transpose(y_pred).numpy()}')
        tf.print(tf.transpose(y_true).numpy())
        square_pred = tf.math.square(y_pred)
        # tf.print(f'SQUARED PREDS: {tf.transpose(square_pred)}')
        margin_square = tf.math.square(tf.math.maximum(margin - y_pred, 0))
        # tf.print(f'MARGIN_SQUARED: {tf.transpose(margin_square)}')
        return tf.math.reduce_mean((1 - y_true) * square_pred + y_true * margin_square)

    return bin_ent_loss if loss_func == 'binary_crossentropy' else contrastive_loss
```
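I compile and train roughly like this (a sketch: the optimizer, learning rate, and step counts are placeholders, not my exact settings; `run_eagerly=True` is there because the `.numpy()` calls inside the debug prints only work in eager mode):

```python
import tensorflow as tf

siamese.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss=loss(margin=1, loss_func='contrastive_loss'),
    metrics=['accuracy'],
    run_eagerly=True,  # required for y_pred.numpy() inside the losses
)
# train_gen is the pair generator sketched earlier
history = siamese.fit(train_gen, steps_per_epoch=100, epochs=20)
```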
When I added prints to see what I actually get from the network, many of the outputs looked like this:

```
DIST: [[7.6213269e-04 8.1809750e-04 7.1329297e-05 ... 5.5958750e-05
1.2471946e-04 2.9939064e-04]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 ... 0.0000000e+00
0.0000000e+00 0.0000000e+00]
[3.8766908e-04 1.9008480e-04 1.8005213e-04 ... 8.6155068e-04
3.4860382e-04 9.5038908e-04]
...
[0.0000000e+00 1.0397088e-04 2.5689020e-04 ... 1.8644718e-05
9.1908485e-05 4.6648251e-04]
[2.9215228e-04 1.4468818e-04 1.6864692e-04 ... 1.0702915e-03
8.5678510e-04 8.9964014e-05]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 ... 0.0000000e+00
0.0000000e+00 0.0000000e+00]]
PREDICTIONS[[0.50033516 0.50027233 0.50031793 0.500271 0.50022817 0.5003597
0.50027865 0.50018543 0.50036544 0.50030404 0.500344 0.50030637
0.50027204 0.5002804 0.50025326 0.50039023]]
array([[1., 1., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 1., 1., 0., 0.]],
```

The focus should be on the PREDICTIONS vector, which holds values near 0.5. That makes me think my network is confused and can't decide whether the images are similar or not (note that the last vector above is the ground truth). I couldn't figure out how to fix it. I tried adding Dropouts, BatchNorms, and regularization (on the last Dense layer's filters in the CNN part).
I have tried so many combinations, and yet my model won't learn; I can't think of any more ways to make it work.
Any help would be very much appreciated.
The training procedure looks like the graph below (note: there is no validation set yet, although I'm willing to split the LFW set to get some validation data):