How "similarity" is measured in image retrieval?

I know what content-based image retrieval is. I have read this and this, and one of them says: given a query image, return a ranked list of images that are most similar to it, based on the content of the query image. But my question is how the similar images are determined. Assume we are working on the Oxford5k dataset, which contains about 5k images of 11 landmark classes. When I feed one of the images as a query, my algorithm returns a ranked list of images, and the images at the top of this list are considered similar to the query. I know that we do the retrieval by extracting features and matching them. But how can I evaluate whether my method finds better matches than the others?

I am pretty sure some would say that an image is similar to another one provided they both belong to the same class. But that means we are doing classification, not image retrieval. If that is not the case, how can I say that method A finds more similar images than method B?

I am aware of precision, recall and mAP. But my question comes before these metrics. For example, precision is TP/(TP+FP). But how do we determine that a retrieved image is a true positive (just based on class labels?)
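To make this concrete, here is how I currently compute precision-based scores from a ranked list, assuming binary relevance labels taken from the benchmark's ground truth (the labels and the toy list below are made up):

    # Toy sketch: average precision for a single query, assuming the ranked list
    # covers the whole database so every relevant image appears somewhere in it.
    def average_precision(ranked_relevance):
        """ranked_relevance[i] is 1 if the image at rank i+1 is relevant, else 0."""
        hits, precisions = 0, []
        for rank, relevant in enumerate(ranked_relevance, start=1):
            if relevant:
                hits += 1
                precisions.append(hits / rank)   # precision at this cut-off
        return sum(precisions) / max(hits, 1)

    # Hypothetical ranked list returned by some method for one query.
    print(average_precision([1, 0, 1, 1, 0]))    # ~0.806

But the whole question is where those 0/1 relevance labels come from in the first place.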

How do we say that an image is similar to another one? Considering two images similar simply because they belong to the same class does not make sense to me.

Any clear explanation would be really appreciated.

UPDATE 1: My question is really about the literature. How do computer scientists working in the field of image retrieval decide that one image is similar to another? How do they measure their methods and claim they are better than the state of the art? Using the precision metric only? Anyone can interpret "similar" however they like to make their own method look better. For example, I can say that an image of a white bird is similar to an image of an airplane, because both have wings and are white. And that is correct, they are similar. Under that assumption, I can claim my method is better than yours. That would mean there is no standard benchmark. Really? After more than two decades of progress in image retrieval, do we have no solid standard for measuring similarity and no technique for comparing two methods? Just a mAP score, which I would argue anyone can game and use to fabricate better results by playing with the ranks?
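For reference, this is how I understand mAP is computed over multiple queries, reusing average_precision() from above; the query names and relevance lists below are hypothetical:

    # Hypothetical per-query relevance lists; in a real benchmark such as Oxford5k
    # they come from the dataset's fixed ground-truth annotations, not from the
    # method being evaluated.
    per_query_relevance = {
        "query_1": [1, 1, 0, 1, 0],
        "query_2": [0, 1, 0, 0, 1],
    }
    mean_ap = sum(average_precision(r) for r in per_query_relevance.values()) / len(per_query_relevance)
    print(f"mAP = {mean_ap:.3f}")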

Tags: computer-vision, information-retrieval, machine-learning

Category: Data Science


There is no single best similarity metric, unless the query and the retrieved images are near-identical.

Similarity is not a universal concept: it is learned. Perhaps it has happened to you that you say person A looks like person B, but your friend does not see the resemblance.

Here is a nice example of image similarity from Flickr Code that illustrates the concept. This article collection helps to better understand similarity. Here is a nice example of a practical solution for finding image similarity in fake photos.

I closely follow this topic on Stack Exchange, only to observe that many people try in vain to find some magic metric. A typical beginner uses SSIM for everything. But the problem is intractable because of the huge number of dimensions of visual information, especially semantic ones. One can only imagine what image similarity means to an insect or an octopus.
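For illustration, here is a minimal SSIM comparison (a sketch assuming scikit-image is installed; the file names are placeholders). SSIM compares local luminance, contrast and structure, so it scores highly only for near-identical images and says nothing about semantic similarity:

    from skimage import io, color, transform
    from skimage.metrics import structural_similarity

    # Load two images as grayscale floats in [0, 1]; file names are placeholders.
    a = color.rgb2gray(io.imread("photo_a.jpg"))
    b = color.rgb2gray(io.imread("photo_b.jpg"))
    b = transform.resize(b, a.shape)              # SSIM requires equally sized images

    score = structural_similarity(a, b, data_range=1.0)
    print(f"SSIM = {score:.3f}")                  # close to 1 only for almost identical pixels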

Deep learning works because a trained network is an approximation and generalization over its training examples. It is like web-search results, which can be good on average thanks to click statistics from many users, but poor in one specific case.

It is a ranking problem, with a focus on particular metrics of interest or on common cases.
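In practice, that ranking is usually done on learned embeddings. Here is a minimal sketch (assuming torchvision >= 0.13; the image paths are placeholders) that ranks a small database by cosine similarity to the query descriptor:

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Pretrained backbone used as a fixed feature extractor; drop the classification head.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = torch.nn.Identity()
    model.eval()

    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def embed(path):
        # L2-normalised 2048-d descriptor, so the dot product equals cosine similarity.
        with torch.no_grad():
            x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            f = model(x).squeeze(0)
        return f / f.norm()

    query = embed("query.jpg")                                   # placeholder paths
    database = {p: embed(p) for p in ["a.jpg", "b.jpg", "c.jpg"]}
    ranked = sorted(database, key=lambda p: float(query @ database[p]), reverse=True)
    print(ranked)                                                # most similar first

Whether the top of that list counts as "similar" is then judged against the benchmark's ground truth, which is exactly the trained, task-specific notion of similarity described above.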
