Approximating confidence scores from a neural network with a final softmax layer: softmax vs. other normalization methods
Say there is a neural network for classification whose second-to-last layer has 3 nodes and whose final layer is a softmax layer.
The softmax layer is needed during training, but not for inference: the arg max can simply be taken over the 3 node outputs.
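A minimal sketch of why that works (assuming the 3 pre-softmax node values are available as a NumPy array; the name `logits` is mine, not from the question). Since softmax is strictly monotonic, it never changes which node is largest:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5])  # hypothetical outputs of the 3 nodes

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# The predicted class is the same with or without the softmax.
assert np.argmax(logits) == np.argmax(softmax(logits))
```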
But what about getting some sort of approximate confidence from the network? Using softmax for normalization seems to make less sense here: because it exponentiates the node values, it concentrates most of the weight on the largest of the 3 outputs. I can see that this sharpening is useful for training, but at inference it seems like it would distort the result's use as an approximate confidence score.
Would a different normalization method give a better confidence score? Perhaps simply dividing each node's output by the sum of all node outputs?
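To make the comparison concrete, here is a sketch contrasting softmax with the proposed sum normalization on the same hypothetical logits. One caveat worth noting: sum normalization is only well defined when all node outputs are non-negative (e.g. after a ReLU); with raw logits it can produce negative or undefined "probabilities".

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5])  # hypothetical outputs of the 3 nodes

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sum_normalize(z):
    # Assumes all entries are non-negative; breaks down for raw logits
    # that can be negative or sum to zero.
    return z / z.sum()

print(softmax(logits))        # [0.629 0.231 0.140] -- exponential sharpening
print(sum_normalize(logits))  # [0.571 0.286 0.143] -- proportional weights
```

Both assign the largest score to the same node, but softmax pushes more mass onto it, which illustrates the distortion described above.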
Topic: confidence, softmax, probability, machine-learning
Category: Data Science