State-of-the-art Python packages that can evaluate language similarity
I am trying to evaluate the likelihood of generating a specific sentence, given a large set of sentences. As a simple starting point, I train a custom n-gram language model on that set and compute the perplexity of each sentence in a list.
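To make the setup concrete, here is a minimal sketch of the kind of computation I mean. It assumes a bigram model with add-one (Laplace) smoothing and lowercase whitespace tokenization; both are simplifications chosen for illustration (KenLM itself uses modified Kneser-Ney smoothing), and the toy corpus and function names are made up.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization with sentence boundary markers.
    return [BOS] + sentence.lower().split() + [EOS]


def train_bigram(sentences):
    """Count bigrams and their left contexts over the training sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = tokenize(s)
        vocab.update(toks)
        contexts.update(toks[:-1])            # every token that has a successor
        bigrams.update(zip(toks, toks[1:]))   # adjacent token pairs
    return contexts, bigrams, vocab


def perplexity(sentence, model):
    """Per-token perplexity of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab = model
    toks = tokenize(sentence)
    v = len(vocab) + 1  # one extra slot reserved for unseen words
    log_prob = 0.0
    for prev, cur in zip(toks, toks[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        log_prob += math.log(p)
    n = len(toks) - 1  # number of predicted tokens, including </s>
    return math.exp(-log_prob / n)


# Hypothetical toy data, only to show the workflow.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]
model = train_bigram(corpus)

candidates = ["the cat sat on the rug", "rug the on sat cat the"]
scores = {s: perplexity(s, model) for s in candidates}

# Lower perplexity means the sentence is more likely under the model.
for s, ppl in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{ppl:8.2f}  {s}")
```

Sentences are then ranked by perplexity, with lower values meaning the sentence is more likely under the model trained on my data.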
I found that the package KenLM (https://www.aclweb.org/anthology/W11-2123/) is often used for this task. However, it is fairly old (published in 2011).
On the other hand, I noticed that the two best-known state-of-the-art NLP systems, BERT and GPT-2, are both built around pre-trained models.
Is there any package newer than KenLM that is suitable for this kind of likelihood evaluation task?
Topic language-model nlp similarity
Category Data Science