N-Gram Linear Interpolation Smoothing
On slide 61 of the NLP text, to smooth the n-gram probabilities by linear interpolation, we need to find the λs that maximize the probability of a held-out set under a model written as M(λ1, λ2, ..., λ_k). What does this notation mean?

The slide also says that "One way is to use the EM algorithm, an iterative learning algorithm that converges on locally optimal λs". Can someone point me to a good worked example? Say the training text is "Sam I am Sam I do not eat." and the held-out data is "I do like Sam I am sam".
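To make the question concrete, here is my own minimal sketch of how I think the EM procedure works on that toy data. This is just my attempt, not the textbook's code: the uniform component and the `<unk>` mapping are my additions, because "like" never occurs in the training text, so a pure unigram/bigram mixture would give the held-out set zero probability.

```python
from collections import Counter

def tokenize(text):
    # naive tokenizer: lowercase and strip the final period
    return text.lower().replace(".", "").split()

train = tokenize("Sam I am Sam I do not eat.")
heldout = tokenize("I do like Sam I am sam")

unigrams = Counter(train)                    # MLE unigram counts
bigrams = Counter(zip(train, train[1:]))     # MLE bigram counts
N = len(train)
vocab = set(train) | {"<unk>"}               # <unk> stands in for unseen held-out words
V = len(vocab)

# the mixture components, each a conditional probability P(w | prev)
def p_uniform(w, prev):
    return 1.0 / V

def p_unigram(w, prev):
    return unigrams[w] / N

def p_bigram(w, prev):
    if prev is None or unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, w)] / unigrams[prev]

components = [p_uniform, p_unigram, p_bigram]
lambdas = [1 / 3, 1 / 3, 1 / 3]              # any positive initialization works

# build (prev, word) pairs from the held-out data, mapping OOV words to <unk>
pairs, prev = [], None
for w in heldout:
    w = w if w in unigrams else "<unk>"
    pairs.append((prev, w))
    prev = w

for _ in range(50):
    # E-step: expected count of each component generating each held-out token
    expected = [0.0] * len(components)
    for prev, w in pairs:
        joint = [lam * p(w, prev) for lam, p in zip(lambdas, components)]
        total = sum(joint)
        for k, j in enumerate(joint):
            expected[k] += j / total
    # M-step: renormalize the expected counts into new interpolation weights
    lambdas = [e / len(pairs) for e in expected]

print("lambdas (uniform, unigram, bigram):", lambdas)
```

With this setup the λs converge after a few iterations. Is this the intended procedure, or am I misreading the slide?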
Also, what exactly is the objective function that EM is maximizing here?
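My current guess, and please correct me if this is wrong, is that M(λ1, ..., λ_k) denotes the interpolated model with those weights, and the objective is the log-likelihood of the held-out data under that model (shown here for an interpolation of k component models conditioned on the previous word):

$$
\hat{\lambda} = \arg\max_{\lambda_1,\dots,\lambda_k} \sum_{i} \log \sum_{j=1}^{k} \lambda_j \, P_j(w_i \mid w_{i-1})
\qquad \text{subject to } \sum_{j=1}^{k} \lambda_j = 1,\ \lambda_j \ge 0.
$$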
Topic: expectation-maximization, nlp
Category: Data Science