Deep Learning - Find most similar images - Triplets vs Pairs

I am working with Python, scikit-learn and Keras, and with 450x540 RGB images of front-facing watches (e.g. Watch_1, Watch_2).

My aim is to run an autoencoder or a Siamese neural network to find the most similar watches among them. However, I am not sure whether I will get better results by comparing pairs of images or triplets of images. As defined in this research paper, a triplet consists of one target image, one image which is (more) similar to the target image, and one image which is not (so) similar to it.

Can someone explain to me in simple terms why using triplets of images will (necessarily) yield better results than using pairs of images, as research papers like the previous one claim?

Topic deep-learning neural-network python similarity

Category Data Science


Loss functions work best when there are clear definitions of correct and incorrect. If everything is already correct, there is no signal left for training, and training effectively ends. A loss function needs weighted errors in order to have something to minimize. In the case of image similarity, the weighted error is based on the distance between images in the learned embedding: dissimilar images that sit too close together (and similar images that sit too far apart) count as errors.

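Image triplets are more useful because they contain more signal for training a loss function than image pairs do. A pair only says "these two images are similar" or "these two images are not". A triplet says "the target is closer to this image than to that one", so each training example carries a relative comparison rather than a single absolute label.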
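To make the difference concrete, here is a minimal sketch of the two kinds of loss in plain Python. It assumes the network has already mapped each image to an embedding vector; the function names, the margin of 1.0 and the 2-D embeddings are made up for illustration and are not taken from the paper in the question. The pair loss is written in the usual contrastive form and the triplet loss in the usual hinge form.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors (lists of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, is_similar, margin=1.0):
    """Pair loss: pull similar pairs together, push dissimilar pairs
    at least `margin` apart. Works on absolute distances."""
    d = euclidean(emb_a, emb_b)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: the positive must be closer to the anchor than the
    negative by at least `margin`. Works on relative distances."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical embeddings, chosen only so the distances are easy to read.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]        # distance 5.0 from the anchor
far_negative = [6.0, 8.0]    # distance 10.0 from the anchor
near_negative = [0.0, 5.5]   # distance 5.5 from the anchor

# Pairs are scored one at a time, each against a fixed threshold.
pair_pos = contrastive_loss(anchor, positive, is_similar=True)
pair_far_neg = contrastive_loss(anchor, far_negative, is_similar=False)
pair_near_neg = contrastive_loss(anchor, near_negative, is_similar=False)

# Triplets are scored by comparing the two distances with each other.
trip_far = triplet_loss(anchor, positive, far_negative)
trip_near = triplet_loss(anchor, positive, near_negative)

print(pair_pos, pair_far_neg, pair_near_neg)
print(trip_far, trip_near)
```

With the near negative, the dissimilar pair on its own produces no loss, because 5.5 is already beyond the margin, while the triplet still produces a loss because the negative is barely farther away than the positive. That is one way to read "more signal": the triplet keeps training until the ordering of the three images is right, not just until each pair clears a fixed threshold. Whether this leads to better results on a given dataset still depends on how the triplets are chosen.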
