Perplexed by perplexity
I've seen two definitions of the perplexity metric: $PP = 2^{H(p)}$ and $PP = 2^{H(p, q)}$. If I understand correctly, the first one only tells us how confident the model is in its predictions, while the second one reflects how accurate those predictions are. Am I right? Which one do people actually mean when they claim their language model achieved X perplexity in a paper?
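For concreteness, here's a toy sketch of how I'd compute the two quantities (my own example, with $q$ as the model's predicted distribution and $p$ as a one-hot reference for the token that actually occurred):

```python
import numpy as np

def entropy(q):
    """H(q) = -sum q * log2(q), over nonzero probabilities."""
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum p * log2(q); p is the reference, q the model."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return -np.sum(p * np.log2(q + eps))

# Model's distribution over a 4-word vocabulary for the next token.
q = [0.7, 0.1, 0.1, 0.1]
# One-hot "true" distribution: the actual next token was word 0.
p = [1.0, 0.0, 0.0, 0.0]

print(2 ** entropy(q))           # ~2.56: spread of the model's own prediction
print(2 ** cross_entropy(p, q))  # ~1.43: how well the model fits the data
```

The first number is high even if the model happens to be right, as long as its probability mass is spread out; the second depends on the probability assigned to the actual outcome. That difference is exactly what I'm asking about.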
Topic: perplexity, nlp
Category: Data Science