Weighting of words in lexicon-based sentiment analysis

I have a question regarding my current project. I am trying to do a lexicon-based sentiment analysis on my data, where I calculate the sentiment score as follows:

$$ Score = \frac{\sum_{i}{word_i}}{\mid words \mid} $$

Depending on the score, the text is classified as either negative or positive. I have also calculated the salience and frequency of every word in the article, and I would like to know whether it is possible to use them in the sentiment formula above.

 word  | salience | frequency
 sad   | 0.8      | 3
 happy | 0.5      | 2

Topic nltk sentiment-analysis nlp

Category Data Science


Yes, you can. Your formula could then look like:

$$ Score = \sum_{i}{f(salience_i, frequency_i, sentiment_i)}$$

where $f$ is a function that weights each word's sentiment score by its salience and frequency. How you define $f$ is up to you.
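As a rough illustration, here is one possible choice of $f$: a plain product of the three values, with an optional normalization by the total weight so the score stays comparable across texts of different length. This is only a sketch. The product form, the normalization, the sentiment values of -1.0 and +1.0, and the names `f` and `weighted_score` are all assumptions made for the example; only the salience and frequency numbers come from the question.

```python
# Per-word features. Salience and frequency are taken from the question's table;
# the sentiment values (-1.0 / +1.0) are assumed lexicon scores for illustration.
words = {
    "sad":   {"sentiment": -1.0, "salience": 0.8, "frequency": 3},
    "happy": {"sentiment":  1.0, "salience": 0.5, "frequency": 2},
}


def f(salience, frequency, sentiment):
    """One possible weighting (an assumption): a plain product of the three values."""
    return salience * frequency * sentiment


def weighted_score(words, normalize=True):
    """Sum f over all words; optionally divide by the total weight."""
    total = sum(
        f(w["salience"], w["frequency"], w["sentiment"]) for w in words.values()
    )
    if not normalize:
        return total
    # Total weight = sum of salience * frequency, so the result stays in [-1, 1]
    # as long as the lexicon sentiment values are in [-1, 1].
    weight = sum(w["salience"] * w["frequency"] for w in words.values())
    return total / weight if weight else 0.0


score = weighted_score(words)
print(round(score, 2))  # roughly -0.41: "sad" outweighs "happy"
```

With these numbers, "sad" contributes about -2.4 and "happy" contributes 1.0, so the text comes out negative. Other choices, such as taking the logarithm of the frequency, would fit the same template.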

  • What if you don't know which $f$ to use?

Now, bear with me, this isn't something I've tried per se, but this could be an interesting approach. You could use a recurrent neural network and your input could be the salience, frequency, and sentiment score for each word. Not only will your RNN "create" (ideally) the best $f$ for your particular problem, but it will also use the sequential information of the words, which may even improve your results.
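To make the idea a bit more concrete, here is a minimal sketch of how the per-word features could be fed to a recurrent network one word at a time. It is a bare forward pass of a simple (Elman-style) RNN written with the standard library only. The weights are random and untrained, so the output is meaningless until the network is fitted to labelled data, which in practice you would do with a deep learning library. The hidden size, the sigmoid output, and all names (`rnn_forward`, `rand_matrix`, and so on) are assumptions made for the example.

```python
import math
import random


def rnn_forward(sequence, W_in, W_h, b_h, w_out, b_out):
    """Run a simple RNN over a sequence of feature vectors.

    Returns a value in (0, 1), read here as the probability that the text is positive.
    """
    hidden = [0.0] * len(b_h)
    for x in sequence:  # one step per word, in reading order
        new_hidden = []
        for j in range(len(b_h)):
            z = b_h[j]
            z += sum(W_in[j][k] * x[k] for k in range(len(x)))
            z += sum(W_h[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(z))
        hidden = new_hidden
    logit = b_out + sum(w_out[j] * hidden[j] for j in range(len(hidden)))
    return 1.0 / (1.0 + math.exp(-logit))


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_features, n_hidden = 3, 4  # features: salience, frequency, sentiment
W_in = rand_matrix(n_hidden, n_features)
W_h = rand_matrix(n_hidden, n_hidden)
b_h = [0.0] * n_hidden
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0

# One feature vector per word: (salience, frequency, sentiment).
# The sentiment values are assumed; in practice the features would also be scaled.
sequence = [(0.8, 3, -1.0), (0.5, 2, 1.0)]
p_positive = rnn_forward(sequence, W_in, W_h, b_h, w_out, b_out)
```

Because the hidden state is carried from one word to the next, feeding the same words in a different order generally gives a different output, which is the sequential information mentioned above.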
