Why is a general/original softmax loss not preferred in FR (face recognition)?

In some papers I've read that softmax loss is not preferred in FR because it does not give good inter-class and intra-class margins, but I could not understand why. Can someone explain, both mathematically and theoretically, why softmax loss is not preferred in FR?

Topic cnn image-recognition object-recognition deep-learning

Category Data Science


The disadvantages of softmax loss are described in the papers you are referring to, for example "ArcFace" (https://arxiv.org/pdf/1801.07698.pdf) and "Face recognition via centralized coordinate learning" (https://arxiv.org/pdf/1801.05678.pdf). The ArcFace paper states them as follows:

(1) the size of the linear transformation matrix W ∈ R^{d×n} increases linearly with the identities number n;

  • Large training sets can contain millions of identities, and the last layer needs one d-dimensional weight column per identity, so its memory and compute cost grow along with n.
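To get a feeling for point (1), here is a rough back-of-the-envelope sketch. The embedding size d = 512, the float32 storage and the helper name `classifier_weight_stats` are my own assumptions for illustration, not numbers taken from the question.

```python
def classifier_weight_stats(d, n, bytes_per_param=4):
    """Return (parameter count, size in GiB) of a d x n weight matrix W.

    d: embedding dimension, n: number of identities (classes).
    bytes_per_param=4 assumes float32 storage (an assumption).
    """
    params = d * n
    gib = params * bytes_per_param / 2**30
    return params, gib


# Assumed embedding size of 512; the identity counts are illustrative.
for n in (10_000, 1_000_000, 10_000_000):
    params, gib = classifier_weight_stats(512, n)
    print(f"n={n:>10,}  params={params:>14,}  float32 size={gib:.2f} GiB")
```

This counts only the weights of the final layer. During training the gradients and optimizer state for W take additional memory of the same order, so the real footprint is larger.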

(2) the learned features are separable for the closed-set classification problem but not discriminative enough for the open-set face recognition problem.

  • In an open-set problem, classes that were never seen during training may appear at test time. In a closed-set problem, all test classes are already known at training time. Face recognition is an open-set problem.
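Point (2) can be illustrated with a small hand-made example. Softmax loss only asks that the logit of the true class is larger than the other logits; it puts no explicit constraint on how close features of the same identity are, or how far apart features of different identities are. The sketch below is a toy construction of mine (2-D features, a hypothetical classifier with W = identity and no bias), not something taken from the papers.

```python
import math


def softmax_loss(logits, label):
    """Cross-entropy of a softmax over `logits` for the true class `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def cosine(u, v):
    """Cosine similarity, the usual score for comparing face embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def classifier_logits(x):
    """Hypothetical 2-class layer with W = identity and zero bias,
    so the logits are simply the feature coordinates."""
    return list(x)


# Toy features (assumed values): a1 and a2 belong to identity A (class 0),
# b1 belongs to identity B (class 1).
a1 = (100.0, 90.0)
a2 = (100.0, -90.0)
b1 = (90.0, 100.0)

for name, x, y in (("a1", a1, 0), ("a2", a2, 0), ("b1", b1, 1)):
    print(name, "softmax loss =", softmax_loss(classifier_logits(x), y))

print("same identity      cos(a1, a2) =", cosine(a1, a2))
print("different identity cos(a1, b1) =", cosine(a1, b1))
```

All three samples are classified correctly with a loss below 1e-4, so from the point of view of softmax loss training is essentially done. Yet the two features of different identities are almost collinear (cosine about 0.99), while the two features of the same identity are far apart (cosine about 0.10). In an open-set setting the classifier W is useless for unseen identities and faces are compared by embedding similarity, so features like these would fail. That is the gap that margin-based losses such as ArcFace try to close.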
