Interpolation in NLP - definition of the O term

Reading the definition of interpolation below, how are the O terms defined? Are these values set manually?

Example

  • P( Sam | I am ) = count( I am Sam ) / count( I am ) = 1 / 2
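
To make the counting concrete, here is a minimal Python sketch of that maximum-likelihood estimate. The toy corpus and tokenisation are illustrative assumptions (in the spirit of the "I am Sam" example), not part of the original question:

    from collections import Counter

    # Hypothetical toy corpus; sentence boundary markers are omitted for brevity.
    tokens = "I am Sam Sam I am I do not like green eggs and ham".split()

    bigram_counts = Counter(zip(tokens, tokens[1:]))
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))

    # MLE trigram estimate: P(Sam | I am) = count(I am Sam) / count(I am)
    p_tri = trigram_counts[("I", "am", "Sam")] / bigram_counts[("I", "am")]
    print(p_tri)  # 1/2 with this toy corpus: "I am" occurs twice, "I am Sam" once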

Interpolation using N-grams

We can combine knowledge from each of our n-grams by using interpolation.

E.g. assuming we have calculated unigram, bigram, and trigram probabilities, we can do:

P( Sam | I am ) = Θ₁ × P( Sam ) + Θ₂ × P( Sam | am ) + Θ₃ × P( Sam | I am )
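
As a sketch of how that combination could look in code (the weight values below are illustrative assumptions, chosen only so that they sum to 1; they are not learned):

    def interpolated_prob(p_uni, p_bi, p_tri, weights=(0.1, 0.3, 0.6)):
        """Linear interpolation of unigram, bigram and trigram estimates.

        `weights` plays the role of the Θ (more commonly λ) values; they are
        assumed to be non-negative and to sum to 1.
        """
        t1, t2, t3 = weights
        assert abs(t1 + t2 + t3 - 1.0) < 1e-9, "weights must sum to 1"
        return t1 * p_uni + t2 * p_bi + t3 * p_tri

    # e.g. with P(Sam), P(Sam | am) and P(Sam | I am) from the toy corpus above
    print(interpolated_prob(2 / 14, 1 / 2, 1 / 2))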




Terminology point: those symbols aren't the letter O, but thetas (Θ).

Confusingly, these values labelled theta are normally referred to as lambdas, as in the page you quote. They are weights used in interpolation (as opposed to backoff) that sum to 1, and can be calculated from the corpus itself via a variety of methods:

How are these λ values set? Both the simple interpolation and conditional interpolation λs are learned from a held-out corpus. A held-out corpus is an additional training corpus that we use to set hyperparameters like these λ values, by choosing the λ values that maximize the likelihood of the held-out corpus. That is, we fix the N-gram probabilities and then search for the λ values that when plugged into Eq. 4.24 give us the highest probability of the held-out set. There are various ways to find this optimal set of λs. One way is to use the EM algorithm defined in Chapter 7, which is an iterative learning algorithm that converges on locally optimal λs (Jelinek and Mercer, 1980).
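
As a rough illustration of the EM approach mentioned in that passage, the sketch below re-estimates the λs from per-token probabilities computed on a held-out corpus. The input format (one (p_unigram, p_bigram, p_trigram) tuple per held-out token) is an assumption of this sketch, not something specified in the quoted text:

    def em_interpolation_weights(heldout_probs, iters=50):
        """Jelinek-Mercer-style EM for the interpolation weights.

        heldout_probs: list of (p_uni, p_bi, p_tri) tuples, one per token of
        the held-out corpus, computed with the (fixed) n-gram models.
        """
        lambdas = [1 / 3, 1 / 3, 1 / 3]  # start from uniform weights
        for _ in range(iters):
            expected = [0.0, 0.0, 0.0]
            for probs in heldout_probs:
                mix = sum(l * p for l, p in zip(lambdas, probs))
                if mix == 0:
                    continue
                # E-step: responsibility of each n-gram order for this token
                for k in range(3):
                    expected[k] += lambdas[k] * probs[k] / mix
            total = sum(expected)
            if total == 0:
                break
            # M-step: renormalise so the weights again sum to 1
            lambdas = [e / total for e in expected]
        return lambdas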


You would treat the thetas like probabilities: they must be non-negative and they must sum to 1. You can set them manually, but in practice you would find the optimal values by tuning them on a held-out corpus, for example with the EM algorithm mentioned above, or with a simple search as sketched below.
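
If you prefer not to use EM, an even simpler (if cruder) option is a brute-force grid search over weight triples that sum to 1, keeping whichever triple gives the highest held-out log-likelihood. This is only a sketch and assumes the same heldout_probs format as the EM example above:

    import itertools
    import math

    def grid_search_weights(heldout_probs, step=0.1):
        """Pick the weight triple (summing to 1) maximising held-out log-likelihood."""
        grid = [i * step for i in range(int(round(1 / step)) + 1)]
        best, best_ll = None, float("-inf")
        for t1, t2 in itertools.product(grid, repeat=2):
            t3 = max(1.0 - t1 - t2, 0.0)
            if 1.0 - t1 - t2 < -1e-9:  # skip combinations that overshoot 1
                continue
            ll = 0.0
            for p_uni, p_bi, p_tri in heldout_probs:
                mix = t1 * p_uni + t2 * p_bi + t3 * p_tri
                ll += math.log(mix) if mix > 0 else float("-inf")
            if ll > best_ll:
                best, best_ll = (t1, t2, t3), ll
        return best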
