Approximating a confidence score from a neural network with a final softmax layer: softmax vs. other normalization methods

Say there is a neural network for classification whose second-to-last layer has 3 nodes, and whose final layer is a softmax layer.

During training the softmax layer is needed, but for inference it is not: softmax is monotonic, so the arg max can simply be taken over the 3 raw node outputs.
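A minimal sketch of that equivalence, using hypothetical raw outputs for the 3 final nodes and a plain-Python softmax:

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical raw outputs of the 3 final nodes
probs = softmax(logits)

# Softmax is monotonic, so the predicted class is the same either way.
assert probs.index(max(probs)) == logits.index(max(logits))
```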

What about getting some sort of approximate confidence score from the network? Using softmax for the normalization makes less sense to me, since it exponentiates before normalizing and so gives a great deal of weight to the largest of the final 3 node outputs. I can see that this is useful for training, but for inference it seems like it would distort the outputs' use as an approximate confidence score.

Would a different normalization method give a better confidence score? Perhaps simply dividing each node's output by the sum of all node outputs?
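To illustrate the difference in sharpness between the two normalizations, here is a minimal sketch with hypothetical logits (note that direct sum normalization is only well defined when all outputs are non-negative and not all zero, which raw pre-softmax outputs need not satisfy):

```python
import math

def softmax(logits):
    # Exponentiate (max-shifted for stability), then normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sum_normalize(logits):
    # Linear scaling: divide each output by the total.
    # Only well defined for non-negative outputs with a nonzero sum.
    total = sum(logits)
    return [x / total for x in logits]

logits = [3.0, 1.0, 1.0]  # hypothetical non-negative raw outputs
print(softmax(logits))        # exponentiation sharpens the top score
print(sum_normalize(logits))  # [0.6, 0.2, 0.2] — flatter, linear scaling
```

On these inputs softmax assigns roughly 0.79 to the top class while sum normalization assigns 0.6, showing how exponentiation concentrates mass on the largest output.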

Tags: confidence, softmax, probability, machine-learning

Category: Data Science
