Siamese network for face comparison won't learn: accuracy stuck at 0.5, loss stuck too

I'm trying to train a Siamese network that contains a CNN with an embedding layer at the end, so that it yields two similar (close) vectors for two images of the same person. I'm using the LFW_Cropped dataset and some custom-made generators. The generators are tested and return batches consisting of 50% same and 50% different pairs of images, with the correct labels.

The labels for the same and different outcomes are:

SAME = 1 (named 'yes' in my code)
DIFFERENT = 0 (named 'no' in my code)
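
For context, a minimal sketch of what my pair generator does (the structure images_by_person, a dict mapping identity to a list of face arrays, is a placeholder here, not my exact code):

import random
import numpy as np

def pair_generator(images_by_person, batch_size=16):
    # only keep identities with at least 2 images so SAME pairs exist
    people = [p for p, imgs in images_by_person.items() if len(imgs) >= 2]
    while True:
        pairs_a, pairs_b, labels = [], [], []
        for i in range(batch_size):
            if i % 2 == 0:
                # SAME pair -> label 1 ('yes')
                person = random.choice(people)
                img1, img2 = random.sample(images_by_person[person], 2)
                labels.append(1.0)
            else:
                # DIFFERENT pair -> label 0 ('no')
                p1, p2 = random.sample(people, 2)
                img1 = random.choice(images_by_person[p1])
                img2 = random.choice(images_by_person[p2])
                labels.append(0.0)
            pairs_a.append(img1)
            pairs_b.append(img2)
        yield [np.array(pairs_a), np.array(pairs_b)], np.array(labels, dtype='float32')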

My model's body (the shared CNN) architecture is shown below:

Model: model
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
separable_conv2d (SeparableC (None, 224, 224, 64)      331       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 112, 112, 64)      0         
_________________________________________________________________
separable_conv2d_1 (Separabl (None, 112, 112, 64)      4736      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 56, 56, 64)        0         
_________________________________________________________________
batch_normalization (BatchNo (None, 56, 56, 64)        256       
_________________________________________________________________
separable_conv2d_2 (Separabl (None, 56, 56, 256)       17216     
_________________________________________________________________
batch_normalization_1 (Batch (None, 56, 56, 256)       1024      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 28, 28, 256)       0         
_________________________________________________________________
dropout (Dropout)            (None, 28, 28, 256)       0         
_________________________________________________________________
separable_conv2d_3 (Separabl (None, 28, 28, 512)       133888    
_________________________________________________________________
separable_conv2d_4 (Separabl (None, 28, 28, 512)       267264    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 512)       0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 14, 14, 512)       0         
_________________________________________________________________
separable_conv2d_5 (Separabl (None, 14, 14, 512)       267264    
_________________________________________________________________
batch_normalization_2 (Batch (None, 14, 14, 512)       2048      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 512)         0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
EmbeddingsVect (Dense)       (None, 1024)              25691136  
=================================================================
Total params: 26,385,163
Trainable params: 26,383,499
Non-trainable params: 1,664
_________________________________________________________________
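
For reproducibility, here is a sketch that rebuilds the summary above. The kernel sizes are inferred from the parameter counts (5x5 for the first separable conv, 3x3 for the rest); the ReLU activations and the dropout rates are assumptions on my part:

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_embedding_model(embedding_dim=1024):
    inp = layers.Input(shape=(224, 224, 3))
    x = layers.SeparableConv2D(64, 5, padding='same', activation='relu')(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.SeparableConv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D()(x)
    x = layers.BatchNormalization()(x)
    x = layers.SeparableConv2D(256, 3, padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.3)(x)  # dropout rate is an assumption
    x = layers.SeparableConv2D(512, 3, padding='same', activation='relu')(x)
    x = layers.SeparableConv2D(512, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.SeparableConv2D(512, 3, padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Flatten()(x)
    out = layers.Dense(embedding_dim, name='EmbeddingsVect')(x)
    return Model(inp, out)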

And the whole Siamese network looks like the one below:

Model: model_1

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
model (Functional)              (None, 1024)         26385163    input_1[0][0]                    
                                                                 input_2[0][0]                    
__________________________________________________________________________________________________
lambda (Lambda)                 (None, 1024)         0           model[0][0]                      
                                                                 model[1][0]                      
__________________________________________________________________________________________________
dense (Dense)                   (None, 1)            1025        lambda[0][0]                     
==================================================================================================
Total params: 26,386,188
Trainable params: 26,384,524
Non-trainable params: 1,664
__________________________________________________________________________________________________
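
In code, the wrapper looks roughly like this (the sigmoid on the final Dense is an assumption, consistent with the ~0.5 predictions shown later; distance_calculation is the helper defined below):

from tensorflow.keras import layers, Model

def build_siamese(embedding_model):
    img_a = layers.Input(shape=(224, 224, 3))
    img_b = layers.Input(shape=(224, 224, 3))
    # the same embedding model (shared weights) is applied to both inputs
    emb_a = embedding_model(img_a)
    emb_b = embedding_model(img_b)
    # distance between the two embeddings; 'euclid1' yields a (None, 1024)
    # vector, matching the Lambda output shape in the summary above
    dist = layers.Lambda(distance_calculation,
                         arguments={'tech': 'euclid1'})([emb_a, emb_b])
    out = layers.Dense(1, activation='sigmoid')(dist)
    return Model([img_a, img_b], out)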

I tried a few different distance metrics: [Euclidean L1, Euclidean L2, 1 - cosine similarity]

import tensorflow as tf
from tensorflow.keras import backend as K

def distance_calculation(vect, tech='cos'):
    """
    Distance calculation techniques for a pair of embedding vectors.
    :param vect: a list/tuple with 2 embedding vectors.
    :param tech: 'cos' / 'euclid1' / 'euclid2'
    :return: the distance between the two vectors (SAME -> close to 0).
    """
    (featsA, featsB) = vect
    # 1 - cosine similarity - SAME = 0
    if tech == 'cos':
        feats_a_norm = K.sqrt(K.maximum(K.sum(K.square(featsA), axis=1, keepdims=True), K.epsilon()))
        feats_b_norm = K.sqrt(K.maximum(K.sum(K.square(featsB), axis=1, keepdims=True), K.epsilon()))
        numerator = K.batch_dot(featsA, featsB)
        denominator = feats_a_norm * feats_b_norm
        return 1 - (numerator / denominator)
    # element-wise absolute difference of L1-normalized vectors - SAME = 0
    elif tech == 'euclid1':
        featsA = featsA / K.sum(K.abs(featsA), axis=1, keepdims=True)
        featsB = featsB / K.sum(K.abs(featsB), axis=1, keepdims=True)
        dist = K.abs(featsA - featsB)
        return dist
    # Euclidean (L2) distance - SAME = 0
    elif tech == 'euclid2':
        # compute the sum of squared differences between the vectors
        sum_squared = K.sum(K.square(featsA - featsB), axis=1, keepdims=True)
        # return the Euclidean distance between the vectors
        dist = K.sqrt(K.maximum(sum_squared, K.epsilon()))
        return dist

None of the above worked.
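
One thing worth noting is that the three variants do not return the same shape, which changes what the Dense(1) head sees. A quick check with random embeddings:

import numpy as np
import tensorflow as tf

a = tf.constant(np.random.rand(4, 1024), dtype=tf.float32)
b = tf.constant(np.random.rand(4, 1024), dtype=tf.float32)
print(distance_calculation((a, b), tech='cos').shape)      # (4, 1)
print(distance_calculation((a, b), tech='euclid1').shape)  # (4, 1024) - matches the (None, 1024) Lambda output above
print(distance_calculation((a, b), tech='euclid2').shape)  # (4, 1)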

I also tried two different loss functions: 'binary_crossentropy' and 'contrastive_loss'. The code for the contrastive loss was taken from the Keras examples.

I couldn't figure out which margin I should use for the contrastive loss; I tried margin values in the range [0.1, 10], but it didn't do any magic.

import tensorflow as tf

def loss(margin=1, loss_func='binary_crossentropy'):
    """
    Provides 'contrastive_loss' with an enclosing scope holding 'margin'.

    Arguments:
        margin: Integer, defines the baseline distance above which pairs
                should be classified as dissimilar (default is 1).
        loss_func: 'binary_crossentropy' / 'contrastive_loss'

    Returns:
        'bin_ent_loss' or 'contrastive_loss' with 'margin' attached.
    """

    def bin_ent_loss(y_true, y_pred):
        tf.print('\nPREDICTIONS', tf.transpose(y_pred))
        tf.print(tf.transpose(y_true))
        return tf.keras.losses.binary_crossentropy(y_true, y_pred)

    # Contrastive loss = mean( (1 - true_value) * square(prediction) +
    #                          true_value * square(max(margin - prediction, 0)) )
    # NOTE: in this formulation (as in the Keras example) y_true == 1 is
    # treated as the *dissimilar* class, so the labels must match that
    # convention; with SAME = 1 as above, the two terms would need swapping.
    def contrastive_loss(y_true, y_pred):
        """
        Calculates the contrastive loss.

        Arguments:
            y_true: List of labels, each label is of type float32.
            y_pred: List of predictions of same length as y_true,
                    each of type float32.

        Returns:
            A tensor containing the contrastive loss as a floating point value.
        """
        tf.print('\nPREDICTIONS', tf.transpose(y_pred))
        tf.print(tf.transpose(y_true))
        square_pred = tf.math.square(y_pred)
        # tf.print('SQUARED PREDS:', tf.transpose(square_pred))
        margin_square = tf.math.square(tf.math.maximum(margin - y_pred, 0))
        # tf.print('MARGIN_SQUARED:', tf.transpose(margin_square))
        return tf.math.reduce_mean((1 - y_true) * square_pred + y_true * margin_square)

    return bin_ent_loss if loss_func == 'binary_crossentropy' else contrastive_loss
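
For completeness, this is roughly how the factory plugs into compile() (the Adam optimizer and learning rate here are assumptions, not necessarily what I used; build_siamese and build_embedding_model are from the sketches above):

siamese = build_siamese(build_embedding_model())
siamese.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=loss(margin=1, loss_func='contrastive_loss'),
    metrics=['accuracy'],
)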

When I added prints to see what I get from the network, many outputs looked like this:

DIST: [[7.6213269e-04 8.1809750e-04 7.1329297e-05 ... 5.5958750e-05
  1.2471946e-04 2.9939064e-04]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 ... 0.0000000e+00
  0.0000000e+00 0.0000000e+00]
 [3.8766908e-04 1.9008480e-04 1.8005213e-04 ... 8.6155068e-04
  3.4860382e-04 9.5038908e-04]
 ...
 [0.0000000e+00 1.0397088e-04 2.5689020e-04 ... 1.8644718e-05
  9.1908485e-05 4.6648251e-04]
 [2.9215228e-04 1.4468818e-04 1.6864692e-04 ... 1.0702915e-03
  8.5678510e-04 8.9964014e-05]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 ... 0.0000000e+00
  0.0000000e+00 0.0000000e+00]]

PREDICTIONS[[0.50033516 0.50027233 0.50031793 0.500271   0.50022817 0.5003597
  0.50027865 0.50018543 0.50036544 0.50030404 0.500344   0.50030637
  0.50027204 0.5002804  0.50025326 0.50039023]]
array([[1., 1., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 1., 1., 0., 0.]])

The focus should be on the PREDICTIONS vector, which holds values near 0.5; that makes me think my network is confused and can't decide whether the images are similar or not. (This is consistent with the distances above collapsing toward zero: the final sigmoid then sees inputs near 0, and sigmoid(0) = 0.5.) Note that the vector below it is the ground truth. I couldn't figure out how to fix this. I tried adding some Dropouts, BatchNorms, and regularization (on the last Dense layer's weights in the CNN part).

I have tried so many combinations, and yet my model won't learn; I couldn't think of any more ways to make it work.

Any help would be greatly appreciated.

The training procedure looks like the graph below (note: there is no validation set yet, although I'm willing to split the LFW set to get some validation data):

[training loss/accuracy curves not reproduced here]



Update: I got it working in the end. I had many small implementation problems, mostly on the data-generation side. The main changes I remember that made a big difference:

  1. I had set "steps_per_epoch" to some big number and chose "epochs = 1", which is obviously wrong, since the network should iterate over the whole dataset a couple of times before it converges. If the dataset is big enough, "steps_per_epoch" shouldn't be defined at all; it then defaults to the number of batches in the dataset. (See the training-call sketch after this list.)

  2. I had implemented my generator so that it always produces a balanced batch. The important thing to notice is that the whole dataset should be balanced; the individual batches don't really matter (at least when speaking of image-processing problems).

  3. As shown above, my network used "SeparableConv2D" layers, which are known for being quicker and less memory-hungry, but they didn't work for me; changing those layers to the classic "Conv2D" made a big difference.

  4. Only one Dense layer at the head doesn't really work that well, so I added a few more Dense layers in different configurations.
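
As referenced in point 1, the corrected training call looks roughly like this (train_generator, num_train_pairs, and batch_size are placeholder names, not my exact code):

# several full passes over the data instead of one giant "epoch"
history = siamese.fit(
    train_generator,
    steps_per_epoch=num_train_pairs // batch_size,  # one full pass per epoch
    epochs=30,                                      # iterate over the dataset several times
)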
