Perplexed by perplexity
I've seen two definitions of the perplexity metric:
$PP = 2^{H(p)}$
and
$PP = 2^{H(p, q)}$
where $p$ is the true data distribution and $q$ is the model's predicted distribution.
If I'm understanding correctly, the first only tells us how confident the model is in its predictions, while the second reflects how accurate/correct the model's predictions are. Am I right?
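To make the difference concrete, here is a toy Python sketch of how I read the two formulas, assuming $p$ is the true next-token distribution and $q$ is the model's estimate (the distributions here are made-up numbers, just for illustration):

```python
import math

def entropy(p):
    """H(p): entropy of the true distribution p, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q): cross-entropy of model distribution q w.r.t. p, in bits."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: true next-token distribution p vs. model estimate q
p = [0.5, 0.25, 0.25]   # assumed "true" distribution
q = [0.4, 0.4, 0.2]     # assumed model predictions

pp_entropy = 2 ** entropy(p)          # 2^H(p): intrinsic perplexity of p itself
pp_cross = 2 ** cross_entropy(p, q)   # 2^H(p, q): the model's perplexity on p

print(pp_entropy)  # ~2.83
print(pp_cross)    # ~2.97, always >= pp_entropy since H(p, q) >= H(p)
```

The first quantity depends only on $p$, while the second also penalizes the model for mismatch between $q$ and $p$ (by Gibbs' inequality, $H(p, q) \ge H(p)$, with equality only when $q = p$).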
Which one do people actually mean when they claim in their papers that their language model achieved a perplexity of X?
Tags: perplexity, nlp