What is the training phase in an N-gram model?
Following is my understanding of the N-gram model used in the text prediction case:
Given a sentence, say "I love my ___" (say a bigram model, i.e. N = 2), and say 4 possible candidates (country, family, wife, school), I can estimate the conditional probability of each candidate given the context and take the one with the highest probability as the next word.
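For concreteness, here is a rough sketch (in Python) of the probability part as I understand it. The toy corpus, the candidate list, and the simple count-based estimate are just illustrative assumptions on my part, not taken from any particular article:

```python
from collections import Counter

# Made-up toy corpus, just for illustration
corpus = "i love my family . i love my country . he loves my family . i miss my school".split()

# Count bigrams and unigrams seen in the corpus
bigram_counts = Counter(zip(corpus, corpus[1:]))
unigram_counts = Counter(corpus)

context = "my"
candidates = ["country", "family", "wife", "school"]

# P(candidate | context) ~= count(context, candidate) / count(context)
probs = {w: bigram_counts[(context, w)] / unigram_counts[context] for w in candidates}
prediction = max(probs, key=probs.get)

print(probs)       # {'country': 0.25, 'family': 0.5, 'wife': 0.0, 'school': 0.25}
print(prediction)  # 'family'
```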
Question:
I understand the probability part of the model, but to even get to the probabilities we need the possible candidates (the next words, in this case family, wife, school, country). How does the model choose the candidates?
Most of the articles online talk about the probability part but don't mention anything about the training phase. What exactly happens in the training phase of this model?