How to get sentiment score for a word in a given dataset

I have a sentiment analysis dataset that is labeled in three categories: positive, negative, and neutral. I also have a list of words (mostly nouns), for which I want to calculate the sentiment value, to understand how (positively or negatively) these entities were talked about in the dataset. I have read some online resources like blogs and thought about a couple of approaches for calculating the sentiment score for a particular word X.

  1. Among the data instances (sentences) that contain the word X, count how many have positive, negative, and neutral labels. Then calculate the weighted average sentiment for that word.

  2. Take a generic untrained BERT architecture, and then train it using the dataset. Then, pass each word from the list to that trained model to get the sentiment scores for the word.
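To make the first approach concrete, here is a minimal sketch. It assumes the dataset is a list of `(sentence, label)` pairs and that the labels map to +1, 0, and -1; the function name, the tokenizer regex, and the toy sentences are all made up for illustration and are not part of any real dataset.

```python
import re
from collections import Counter, defaultdict

# Assumed numeric value per label; other weightings are equally possible.
LABEL_VALUES = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def word_sentiment_scores(dataset, target_words):
    """Return {word: (average_score_or_None, label_counts)} for each target word."""
    targets = {w.lower() for w in target_words}
    counts = defaultdict(Counter)

    for sentence, label in dataset:
        # Count each sentence at most once per word, even if the word repeats.
        tokens = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        for word in tokens & targets:
            counts[word][label] += 1

    scores = {}
    for word in targets:
        total = sum(counts[word].values())
        if total == 0:
            # The word never occurs in the dataset, so no score can be computed.
            scores[word] = (None, Counter())
            continue
        weighted = sum(LABEL_VALUES[lab] * n for lab, n in counts[word].items())
        scores[word] = (weighted / total, counts[word])
    return scores


# Toy data, invented for this example.
toy_dataset = [
    ("The battery lasts all day", "positive"),
    ("The battery died after an hour", "negative"),
    ("The battery is fine and the screen is gorgeous", "positive"),
    ("The screen arrived cracked", "negative"),
    ("The screen is 6 inches", "neutral"),
]

scores = word_sentiment_scores(toy_dataset, ["battery", "screen", "camera"])
for word, (score, label_counts) in sorted(scores.items()):
    print(word, score, dict(label_counts))
```

A score near +1 means the word mostly appears in positively labeled sentences, near -1 in negatively labeled ones. Keeping the raw counts alongside the average helps spot words that occur too rarely for the score to be meaningful.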

Does either of these approaches make sense? If so, can you suggest some related work that I can look at? If they don't make sense, could you please advise how I can calculate the sentiment score for a word in a given dataset?

Topic bert sentiment-analysis dataset nlp

Category Data Science


One option would be to use a library like TextBlob. It can calculate the sentiment of a sentence and assign it a polarity score and a subjectivity score. You can also pass a single word instead of a sentence, and it will return the polarity and subjectivity of that word. Note that these scores come from TextBlob's built-in lexicon rather than from your labeled dataset.

The two options you suggested might also work (especially the BERT one). Try all the available options and see which gives the best results.

Cheers!


You could use Integrated Gradients to see which words have led to a positive/negative sentiment and then aggregate their scores over your whole dataset. Integrated Gradients is a straightforward attribution method: it assigns each input feature (here, each token) a share of the model's prediction, which makes neural network inference easier to interpret.

I found an article on how to use Integrated Gradients for sentiment analysis.
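To give a feel for the idea, here is a toy sketch of Integrated Gradients on a tiny bag-of-words logistic model, followed by a per-word average over a few sentences. Everything in it is an assumption for illustration: the weights are invented, the baseline is the all-zeros input, and the integral is approximated with a midpoint Riemann sum. With a real trained network you would compute gradients by automatic differentiation, for example with a library such as Captum, rather than by hand.

```python
import math

# Hypothetical model parameters, made up for this example.
WEIGHTS = {"great": 2.0, "terrible": -2.5, "battery": 0.1, "screen": -0.1}
BIAS = 0.0


def encode(sentence):
    """Bag-of-words counts over the toy vocabulary."""
    tokens = sentence.lower().split()
    return {w: float(tokens.count(w)) for w in WEIGHTS}


def predict(x):
    """Probability of the positive class under the toy logistic model."""
    z = BIAS + sum(WEIGHTS[w] * v for w, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x):
    """Analytic gradient of predict(x) with respect to each input feature."""
    p = predict(x)
    return {w: p * (1.0 - p) * WEIGHTS[w] for w in x}


def integrated_gradients(x, steps=200):
    """Approximate IG along the straight path from an all-zeros baseline to x."""
    baseline = {w: 0.0 for w in x}
    grad_sum = {w: 0.0 for w in x}
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = {w: baseline[w] + alpha * (x[w] - baseline[w]) for w in x}
        grad = gradient(point)
        for w in x:
            grad_sum[w] += grad[w]
    return {w: (x[w] - baseline[w]) * grad_sum[w] / steps for w in x}


def aggregate(sentences):
    """Average attribution per word over the sentences that contain it."""
    sums = {w: 0.0 for w in WEIGHTS}
    counts = {w: 0 for w in WEIGHTS}
    for sentence in sentences:
        x = encode(sentence)
        attributions = integrated_gradients(x)
        for w, v in x.items():
            if v > 0:
                sums[w] += attributions[w]
                counts[w] += 1
    return {w: sums[w] / counts[w] for w in WEIGHTS if counts[w]}


x = encode("great battery terrible screen")
attributions = integrated_gradients(x)
word_scores = aggregate(["great battery", "terrible screen", "great screen"])
print(attributions)
print(word_scores)
```

The attributions sum (approximately) to the difference between the prediction for the sentence and the prediction for the baseline, which is a useful sanity check when you move to a real model.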
