How are the weights defined in a (linear-chain) Conditional Random Field?
Edit: I noticed that I mixed up i (in the graph) and t (in the formula); in what follows, i is equivalent to t.
I am trying to understand the theory behind linear-chain Conditional Random Fields. I have now read An Introduction to Conditional Random Fields by Sutton and McCallum; I believe McCallum is one of the inventors of CRFs. In this work you can find the following representation of a CRF as a graph (I added some annotations to better explain my question):
As I understand it, and as I know it from other works, the black squares on the connections are the weights that are learned during training, right? The formula for linear-chain CRFs is defined as:
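(The image with the formula did not survive here; for reference, the standard linear-chain CRF definition from Sutton and McCallum, which I assume is the one shown, is:)

```latex
p(\mathbf{y} \mid \mathbf{x}) \;=\;
\frac{1}{Z(\mathbf{x})}
\prod_{t=1}^{T} \exp\Big\{ \sum_{k=1}^{K} \theta_k \, f_k(y_t, y_{t-1}, \mathbf{x}_t) \Big\}
```

where Z(x) is the normalizing constant summing the same product over all label sequences y.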
Here Theta is the weight of the respective feature function. It is unclear to me how Theta is defined. Is Theta represented in the graph above by one of the black squares, or by a combination of them, for example for i a combination of w_2 and w_3, since the feature function for i is f(X_i, Y_i, Y_{i-1})? Or is Theta not represented in the graph at all?
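To make the role of Theta concrete, here is a minimal sketch (my own illustrative code, not from the paper): in the standard formulation, each feature function f_k has a single scalar weight theta_k that is shared across all positions t of the chain. The feature functions f0, f1 and the weight values below are hypothetical examples.

```python
import math

# Hypothetical binary feature functions f_k(y_t, y_prev, x_t).
def f0(y_t, y_prev, x_t):
    # Fires when the current word is capitalized and tagged NAME.
    return 1.0 if x_t[0].isupper() and y_t == "NAME" else 0.0

def f1(y_t, y_prev, x_t):
    # Fires on the label transition OTHER -> NAME.
    return 1.0 if y_prev == "OTHER" and y_t == "NAME" else 0.0

feature_functions = [f0, f1]
theta = [1.5, 0.8]  # one learned weight per feature function, shared over all t

def unnormalized_score(x, y):
    """exp( sum over positions t and features k of theta_k * f_k(y_t, y_{t-1}, x_t) )."""
    total = 0.0
    for t in range(len(x)):
        y_prev = y[t - 1] if t > 0 else "START"
        for k, f in enumerate(feature_functions):
            total += theta[k] * f(y[t], y_prev, x[t])
    return math.exp(total)

x = ["Alice", "runs"]
print(unnormalized_score(x, ["NAME", "OTHER"]))  # only f0 fires once -> exp(1.5)
```

In this reading, Theta is indexed by the feature function k, not by a position in the graph, so it is not one particular black square; dividing by the sum of this score over all label sequences gives the CRF probability.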
Topic probability rnn naive-bayes-classifier nlp machine-learning
Category Data Science