Interpolation in NLP - definition of the O term

Reading the definition of interpolation below, how are the O terms defined? Are these values set manually?

Example

  • P( Sam | I am ) = count( I am Sam ) / count( I am ) = 1 / 2
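
To make the counting concrete, here is a minimal Python sketch of that maximum-likelihood estimate. The toy corpus and tokenisation are illustrative assumptions (in the spirit of the "I am Sam" example), not part of the original question:

    from collections import Counter

    # Hypothetical toy corpus; sentence boundary markers are omitted for brevity.
    tokens = "I am Sam Sam I am I do not like green eggs and ham".split()

    bigram_counts = Counter(zip(tokens, tokens[1:]))
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))

    # MLE trigram estimate: P(Sam | I am) = count(I am Sam) / count(I am)
    p_tri = trigram_counts[("I", "am", "Sam")] / bigram_counts[("I", "am")]
    print(p_tri)  # 1/2 with this toy corpus: "I am" occurs twice, "I am Sam" once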

Interpolation using N-grams

We can combine knowledge from each of our n-grams by using interpolation.

E.g. assuming we have calculated unigram, bigram, and trigram probabilities, we can do:

P( Sam | I am ) = Θ₁ × P( Sam ) + Θ₂ × P( Sam | am ) + Θ₃ × P( Sam | I am )
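
As a sketch of how that combination could look in code (the weight values below are illustrative assumptions, chosen only so that they sum to 1; they are not learned):

    def interpolated_prob(p_uni, p_bi, p_tri, weights=(0.1, 0.3, 0.6)):
        """Linear interpolation of unigram, bigram and trigram estimates.

        `weights` plays the role of the Θ (more commonly λ) values; they are
        assumed to be non-negative and to sum to 1.
        """
        t1, t2, t3 = weights
        assert abs(t1 + t2 + t3 - 1.0) < 1e-9, "weights must sum to 1"
        return t1 * p_uni + t2 * p_bi + t3 * p_tri

    # e.g. with P(Sam), P(Sam | am) and P(Sam | I am) from the toy corpus above
    print(interpolated_prob(2 / 14, 1 / 2, 1 / 2))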




Terminology point: those symbols aren't the letter O, but thetas (Θ).

Confusingly, these values labelled theta are normally referred to as lambdas, as in the page you quote. They are weights used in interpolation (as opposed to backoff) that sum to 1, and can be calculated from the corpus itself via a variety of methods:

How are these λ values set? Both the simple interpolation and conditional interpolation λs are learned from a held-out corpus. A held-out corpus is an additional training corpus that we use to set hyperparameters like these λ values, by choosing the λ values that maximize the likelihood of the held-out corpus. That is, we fix the N-gram probabilities and then search for the λ values that when plugged into Eq. 4.24 give us the highest probability of the held-out set. There are various ways to find this optimal set of λs. One way is to use the EM algorithm defined in Chapter 7, which is an iterative learning algorithm that converges on locally optimal λs (Jelinek and Mercer, 1980).
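
As a rough illustration of the EM approach mentioned in that passage, the sketch below re-estimates the λs from per-token probabilities computed on a held-out corpus. The input format (one (p_unigram, p_bigram, p_trigram) tuple per held-out token) is an assumption of this sketch, not something specified in the quoted text:

    def em_interpolation_weights(heldout_probs, iters=50):
        """Jelinek-Mercer-style EM for the interpolation weights.

        heldout_probs: list of (p_uni, p_bi, p_tri) tuples, one per token of
        the held-out corpus, computed with the (fixed) n-gram models.
        """
        lambdas = [1 / 3, 1 / 3, 1 / 3]  # start from uniform weights
        for _ in range(iters):
            expected = [0.0, 0.0, 0.0]
            for probs in heldout_probs:
                mix = sum(l * p for l, p in zip(lambdas, probs))
                if mix == 0:
                    continue
                # E-step: responsibility of each n-gram order for this token
                for k in range(3):
                    expected[k] += lambdas[k] * probs[k] / mix
            total = sum(expected)
            if total == 0:
                break
            # M-step: renormalise so the weights again sum to 1
            lambdas = [e / total for e in expected]
        return lambdas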


You would treat the thetas like probabilities: they must be non-negative and they must sum to 1. You can set them manually, but in practice you would find the optimal values by tuning them on a held-out corpus, for example with the EM algorithm mentioned above, or with a simple search as sketched below.
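
If you prefer not to use EM, an even simpler (if cruder) option is a brute-force grid search over weight triples that sum to 1, keeping whichever triple gives the highest held-out log-likelihood. This is only a sketch and assumes the same heldout_probs format as the EM example above:

    import itertools
    import math

    def grid_search_weights(heldout_probs, step=0.1):
        """Pick the weight triple (summing to 1) maximising held-out log-likelihood."""
        grid = [i * step for i in range(int(round(1 / step)) + 1)]
        best, best_ll = None, float("-inf")
        for t1, t2 in itertools.product(grid, repeat=2):
            t3 = max(1.0 - t1 - t2, 0.0)
            if 1.0 - t1 - t2 < -1e-9:  # skip combinations that overshoot 1
                continue
            ll = 0.0
            for p_uni, p_bi, p_tri in heldout_probs:
                mix = t1 * p_uni + t2 * p_bi + t3 * p_tri
                ll += math.log(mix) if mix > 0 else float("-inf")
            if ll > best_ll:
                best, best_ll = (t1, t2, t3), ll
        return best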
