Understanding the Kneser-Ney formula for implementation

I am trying to implement this formula in Python:

$$ P_{KN}(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max(c_{KN}(w_{i-n+1}^{i}) - d, 0)}{c_{KN}(w_{i-n+1}^{i-1})} + \lambda(w_{i-n+1}^{i-1})\, P_{KN}(w_i \mid w_{i-n+2}^{i-1}) $$

where

$$ c_{KN}(\cdot) = \begin{cases} \text{count}(\cdot) & \text{for the highest order,} \\ \text{continuation count}(\cdot) & \text{otherwise.} \end{cases} $$
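For concreteness, this is how I am building the two kinds of counts from a tokenized corpus. This is only a minimal sketch; `build_counts` and `continuation_count` are my own helper names, not something from the link.

from collections import Counter

def build_counts(corpus_tokens, max_order):
    # Plain n-gram counts for every order up to max_order,
    # keyed first by order, then by n-gram tuple.
    counts = {}
    for n in range(1, max_order + 1):
        counts[n] = Counter(
            tuple(corpus_tokens[i:i + n])
            for i in range(len(corpus_tokens) - n + 1)
        )
    return counts

def continuation_count(counts, ngram):
    # Continuation count: the number of distinct words that
    # precede `ngram` at least once in the corpus.
    higher_order = counts[len(ngram) + 1]
    return sum(1 for h in higher_order if h[1:] == ngram)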

Following this link, I was able to understand how to implement the first half of the equation, namely

$$\frac{\max(c_{KN}(w_{i-n+1}^{i}) - d, 0)}{c_{KN}(w_{i-n+1}^{i-1})} $$
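In code, assuming count dictionaries keyed by order and then by n-gram tuple (as in the sketch above), the first term reduces to a single expression; here `ngram` and `d` are placeholders for the query n-gram and the discount:

n = len(ngram)
first_term = max(counts[n][ngram] - d, 0) / counts[n - 1][ngram[:-1]]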

but the second half, specifically the $\lambda(w_{i-n+1}^{i-1})$ term, is confusing me. The author in the link states that

$$\lambda(w_{i-1}) = \frac{d}{c(w_{i-1})}\left|\{w : c(w_{i-1}, w) > 0\}\right|$$

but, according to the link, when we are at the highest order the discounting factor $d$ is zero. The author goes on to say that the denominator in the fraction is the frequency of the semifinal word in the special case of a 2-gram, but that in the recursion scheme we are developing we should consider the whole string preceding the final word (for the 2-gram case, the semifinal word is the whole string). The term to the right of the fraction is the number (not the frequency) of distinct final word types succeeding the string.
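As a sanity check on my reading of that formula, this is how I would compute $\lambda$ for a tiny bigram case (the corpus here is made up, not the author's example):

# corpus: "the cat sat on the mat", context word: "the"
d = 0.75
c_context = 2           # "the" occurs twice
distinct_followers = 2  # the distinct words after "the" are {"cat", "mat"}
lamb = (d / c_context) * distinct_followers  # 0.75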

He proceeds with an example, but it makes no sense to me. I cannot find any good material explaining how to calculate this $\lambda(w_{i-n+1}^{i-1})$ term.

Any suggestions or related material would be helpful. Recall that I am doing this numerically, so I need help decomposing this equation into terms I can code, which means I need to understand what each term is.

Following the link above, this is a preliminary implementation I came up with

def kneser_ney_smoothing(previous_tokens, ngram_dict, discounting_factor=0.75):
    # Start with previous_tokens (user input)
    previous_ngram = tuple(previous_tokens)
    previous_ngram_minus_last_word = previous_ngram[:-1]  # the context w_{i-n+1}^{i-1}
    len_previous_ngram = len(previous_ngram)

    # Pull the count maps for the highest and second-highest orders
    highest_order_ngrams_map = ngram_dict[len_previous_ngram]
    second_highest_order_ngrams_map = ngram_dict[len_previous_ngram - 1]

    # Check if the user input is in the highest-order n-gram map
    if previous_ngram in highest_order_ngrams_map:
        # discounting_factor = 0  # the link sets d to 0 at the highest order; I found this nowhere in the literature
        # First term: max(c(w_{i-n+1}^i) - d, 0) / c(w_{i-n+1}^{i-1})
        first_num = max(highest_order_ngrams_map[previous_ngram] - discounting_factor, 0)
        first_denom = second_highest_order_ngrams_map[previous_ngram_minus_last_word]

        # Lambda term: (d / c(context)) * |{w : c(context, w) > 0}|.
        # The denominator is the count of the whole preceding context,
        # and the set holds the distinct word types that follow it.
        lamb_denom = second_highest_order_ngrams_map[previous_ngram_minus_last_word]
        followers = {ngram[-1] for ngram in highest_order_ngrams_map
                     if ngram[:-1] == previous_ngram_minus_last_word}
        lamb = (discounting_factor / lamb_denom) * len(followers)

        # Continuation probability of the final word: the number of distinct
        # highest-order n-grams it ends, over the total number of such n-grams
        # (this stands in for the recursive lower-order term)
        pcont_num = sum(1 for ngram in highest_order_ngrams_map
                        if ngram[-1] == previous_ngram[-1])
        pcont_denom = len(highest_order_ngrams_map)
        pcont = pcont_num / pcont_denom

        return first_num / first_denom + lamb * pcont

    else:
        # TODO: back off to the lower-order (continuation-count) estimate
        pass

I was able to match his corpus and numerical examples, so I should be on the right track; I am just not sure about the recursion part.
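For completeness, this is how I currently picture the full recursion bottoming out. It is a minimal sketch built on the `build_counts`/`continuation_count` helpers above; none of it is from the link, and `p_kn` is my own name.

def p_kn(w, context, counts, d=0.75, highest=True):
    # Base case: unigram continuation probability of w,
    # i.e. distinct bigram types ending in w over all bigram types
    if not context:
        bigrams = counts[2]
        return sum(1 for b in bigrams if b[-1] == w) / len(bigrams)

    n = len(context) + 1
    ngram = tuple(context) + (w,)
    if highest:
        # Plain counts at the highest order
        c = counts[n].get(ngram, 0)
        c_ctx = counts[n - 1].get(tuple(context), 0)
    else:
        # Continuation counts at every lower order
        c = continuation_count(counts, ngram)
        c_ctx = sum(continuation_count(counts, g)
                    for g in counts[n] if g[:-1] == tuple(context))

    first = max(c - d, 0) / c_ctx if c_ctx else 0.0
    # Number of distinct word types seen after this context
    followers = sum(1 for g in counts[n] if g[:-1] == tuple(context))
    lamb = (d / c_ctx) * followers if c_ctx else 1.0
    # Recurse with the context shortened by one word on the left
    return first + lamb * p_kn(w, tuple(context)[1:], counts, d, highest=False)

For a trigram query this would be called as p_kn("sat", ("the", "cat"), counts), with counts built by build_counts(tokens, 3).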
