What is the "Interpolated Absolute Discounting" smoothing method?

I'm asked to implement "Interpolated Absolute Discounting" for a bigram language model over a text. First, I don't know exactly what it is. My guess is that it is an interpolation between different n-gram orders (unigram, bigram, ...), whose parameters need to be learned.

Second, which probability distribution in the nltk package implements this technique?

Moreover, I need to learn the parameters from a corpus. How can I do that?



This is most likely related to Kneser–Ney smoothing, which builds on absolute discounting. NLTK exposes a Kneser–Ney probability distribution, nltk.KneserNeyProbDist, which you can use as in the following example (from this post):

import nltk
from nltk.util import ngrams
from nltk.corpus import gutenberg

# nltk.KneserNeyProbDist expects a FreqDist over trigrams
gut_ngrams = (
    ngram
    for sent in gutenberg.sents()
    for ngram in ngrams(sent, 3, pad_left=True, pad_right=True,
                        left_pad_symbol="BOS", right_pad_symbol="EOS")
)
freq_dist = nltk.FreqDist(gut_ngrams)
kneser_ney = nltk.KneserNeyProbDist(freq_dist)
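If you need interpolated absolute discounting itself rather than Kneser–Ney, the bigram probability is P(w | v) = max(c(v, w) - d, 0) / c(v) + lam(v) * P_uni(w), where the leftover mass lam(v) = d * N1+(v) / c(v) (N1+(v) being the number of distinct word types following v) makes the distribution sum to one, and the discount d can be learned from the corpus as n1 / (n1 + 2 * n2), with n1 and n2 the numbers of bigram types occurring exactly once and twice (the Ney–Essen–Kneser estimate). Below is a minimal self-contained sketch in plain Python, assuming a tokenized corpus; the names train and prob are illustrative, not NLTK API. Recent NLTK releases also appear to ship an AbsoluteDiscountingInterpolated model in nltk.lm, which is worth checking before rolling your own.

```python
from collections import Counter

# Sketch of interpolated absolute discounting for a bigram model.
# P(w | v) = max(c(v, w) - d, 0) / c(v) + lam(v) * P_uni(w)
# with lam(v) = d * N1+(v) / c(v), and d estimated as n1 / (n1 + 2 * n2).
# Illustrative helper names, not part of NLTK.

def train(tokens):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # n1, n2: number of bigram types seen exactly once / twice
    n1 = sum(1 for c in bigrams.values() if c == 1)
    n2 = sum(1 for c in bigrams.values() if c == 2)
    d = n1 / (n1 + 2 * n2) if (n1 + n2) > 0 else 0.5
    # N1+(v): number of distinct word types observed after v
    followers = Counter(v for (v, _) in bigrams)
    total = sum(unigrams.values())
    return unigrams, bigrams, d, followers, total

def prob(w, v, model):
    unigrams, bigrams, d, followers, total = model
    p_uni = unigrams[w] / total          # unigram distribution
    c_v = unigrams[v]
    if c_v == 0:
        return p_uni                     # unseen history: back off fully
    lam = d * followers[v] / c_v         # mass freed by discounting
    return max(bigrams[(v, w)] - d, 0) / c_v + lam * p_uni
```

For example, with tokens = "a b a b a c".split(), the corpus has one bigram type seen once and two seen twice, so d = 1 / (1 + 4) = 0.2, and the probabilities prob(w, "b", model) over the vocabulary sum to one.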
