Why is the plain (original) softmax loss not preferred in face recognition (FR)?
In some papers I've read that softmax loss is not preferred in FR because it does not give good inter-class and intra-class margins, but I could not understand why. Can someone explain, both mathematically and theoretically, why softmax loss is not preferred in FR?
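
For reference, by "softmax loss" I mean the usual cross-entropy over a softmax of the last fully connected layer. As far as I understand it is commonly written as below; the symbols are my own notation and not taken from any particular paper: N is the batch size, C the number of identities, x_i the feature of sample i, y_i its label, and W_j, b_j the weight vector and bias of class j.

```latex
% Softmax (cross-entropy) loss over N samples and C classes (notation assumed)
L_{\text{softmax}} = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{e^{W_{y_i}^{\top} x_i + b_{y_i}}}{\sum_{j=1}^{C} e^{W_{j}^{\top} x_i + b_{j}}}

% Each logit can also be written in terms of norms and the angle between W_j and x_i
W_{j}^{\top} x_i = \lVert W_j \rVert \, \lVert x_i \rVert \cos\theta_{j,i}
```

What I cannot see from this form is where a margin between classes, or compactness within a class, would or would not come from.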
Topic cnn image-recognition object-recognition deep-learning
Category Data Science