Deciding Initial Weights In A Linear Classifier For Sentiment Analysis

I would like to build a simple sentiment analysis classifier using logistic regression. I downloaded a list of positive and negative words from cs.uic.edu. The list contains more than 6000 words, both positive and negative. A linear classifier has the form (Wikipedia reference):

$$\sum_j w_j x_j$$

where $w_j$ is the weight of feature $x_j$. So, for example, if the weight of the word awesome is 3, then in the following sentence:

Food is awesome and music is awesome.

according to the formula, it will become:

$$3 * 2$$

where 3 is the weight of the word awesome and 2 is its feature value (the number of times the word occurs in the sentence).
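The arithmetic above can be written as a short sketch. Only the weight of 3 for awesome comes from the question. The `linear_score` helper and its simple tokenization are assumptions made for illustration, and words without a weight are treated as contributing 0.

```python
from collections import Counter

# Only "awesome" = 3 comes from the question; any other entry would be hypothetical.
weights = {"awesome": 3.0}

def linear_score(sentence, weights):
    """Return sum_j w_j * x_j, where x_j is the count of word j in the sentence."""
    # Naive tokenization (an assumption): split on whitespace, strip basic punctuation.
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    counts = Counter(tokens)
    # Words missing from the weight table contribute 0.
    return sum(weights.get(word, 0.0) * n for word, n in counts.items())

score = linear_score("Food is awesome and music is awesome.", weights)
print(score)  # 6.0, i.e. 3 * 2
```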

My question is: how do I decide which coefficients to start with? Is it a manual process? There are more than 6000 words. What is the best way to approach this?

Topic machine-learning-model logistic-regression sentiment-analysis classification machine-learning

Category Data Science


If you are adding up the occurrences of positive or negative words to predict sentiment, there is no reason to build a machine learning model.
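For comparison, a purely count-based approach needs no training at all. The sketch below is only an illustration: the two tiny word sets are hypothetical stand-ins for the downloaded lists, and the `lexicon_sentiment` function name and tie-breaking rule are assumptions.

```python
# Hypothetical stand-ins for the downloaded positive and negative word lists.
POSITIVE = {"awesome", "good", "great"}
NEGATIVE = {"awful", "bad", "terrible"}

def lexicon_sentiment(sentence):
    """Label a sentence by comparing counts of positive and negative words."""
    # Naive tokenization (an assumption): split on whitespace, strip basic punctuation.
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie or no lexicon words found

print(lexicon_sentiment("Food is awesome and music is awesome."))
```

This is equivalent to fixing every positive word's weight at +1 and every negative word's weight at -1, so nothing is learned from data.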

In order to build a logistic regression model, you need labeled data. It is not clear what labels you are using for your problem.

The initial weights for a model depend on the optimization technique. Logistic regression is often optimized with gradient descent. If gradient descent is used, the initial weights are typically set to small random values (or zeros) rather than chosen by hand. The model then learns how to adjust the weights to minimize errors on the target.
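A minimal sketch of that process, in plain Python, might look like the following. The toy data, the two-word vocabulary, the learning rate, and the number of epochs are all assumptions chosen for illustration; in practice a library implementation would normally be used, and the labels would come from real annotated sentences.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=500, seed=0):
    """Batch gradient descent on the average log loss."""
    rng = random.Random(seed)
    n_features = len(X[0])
    # Small random starting weights; nothing is set by hand.
    w = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Toy labeled data (hypothetical): each row holds counts of ["awesome", "terrible"].
X = [[2, 0], [1, 0], [0, 1], [0, 2]]
y = [1, 1, 0, 0]  # 1 = positive sentence, 0 = negative sentence

w, b = train_logistic_regression(X, y)
print(w, b)  # learned weight for "awesome" is positive, for "terrible" negative
```

The starting values matter little here: the weights end up with the expected signs because the labels, not the initialization, drive the updates.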
