Deciding Initial Weights In A Linear Classifier For Sentiment Analysis
I would like to build a simple sentiment analysis classifier using logistic regression. I downloaded a list of positive and negative words from cs.uic.edu. There are more than 6000 words, positive and negative combined. A linear classifier has the form (Wikipedia reference):
$$\sum_j w_j x_j$$
where $w_j$ is the weight of the feature $x_j$. For example, if the weight of the feature "awesome" is 3, then in the following sentence:
Food is awesome and music is awesome.
according to the formula, it will become:
$$3 \times 2$$
where 3 is the weight of the feature "awesome" and 2 is the feature value itself (the number of times the word occurs in the sentence).
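To make the arithmetic concrete, here is a minimal Python sketch of that computation as I understand it. The `weights` dictionary and the `linear_score` helper are names I made up for illustration, and the simple regex tokenizer is an assumption; only the weight 3 for "awesome" comes from the example above.

```python
import re
from collections import Counter

# Hypothetical weight table: only the value 3 for "awesome" comes from the example.
weights = {"awesome": 3.0}


def linear_score(sentence, weights):
    """Return sum_j w_j * x_j, where x_j is the count of word j in the sentence."""
    # Lowercase the text and extract word tokens (assumed tokenization).
    tokens = re.findall(r"[a-z']+", sentence.lower())
    counts = Counter(tokens)
    # Words without a weight are skipped; weighted words that are absent count as 0.
    return sum(w * counts[word] for word, w in weights.items())


score = linear_score("Food is awesome and music is awesome.", weights)
print(score)  # 6.0
```

With the full word list, `weights` would have one entry per word, which is exactly where I am unsure how the values should be chosen.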
My question is: how do I decide which coefficients to start with? Is it a manual process? With more than 6000 words, what is the right way to approach this?