How to make use of POS tags as features for a NaiveBayesClassifier for sentiment analysis?

I'm doing sentiment analysis on a Twitter dataset (problem link). I extracted the POS tags from the tweets, created TF-IDF vectors from the POS tags, and used them as features (got an accuracy of 65%). But I think we can achieve a lot more with POS tags, since they help distinguish how a word is being used within the scope of a phrase. The model I'm training is MultinomialNB().

The problem I'm trying to solve is to classify the sentiment of tweets as positive, negative, or neutral.

Structure of dataset:

Created POS tags:

I created TF-IDF vectors from the POS-tag strings and gave them as inputs to my model:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# TF-IDF over the POS-tag strings (unigrams and bigrams of tags)
tfidf_vectorizer1 = TfidfVectorizer(
    max_features=5000, min_df=2, max_df=0.9, ngram_range=(1, 2))
train_pos = tfidf_vectorizer1.fit_transform(train_data['pos'])
test_pos = tfidf_vectorizer1.transform(test_data['pos'])

clf = MultinomialNB(alpha=0.1).fit(train_pos, train_labels)
predicted = clf.predict(test_pos)

With the above code I got 65% accuracy. Rather than creating TF-IDF vectors from the POS tags alone and using them as model inputs, is there any other way to use POS tags to increase the accuracy of the model?
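One option worth trying before anything fancier: instead of feeding the model POS tags alone, stack the word TF-IDF features and the POS TF-IDF features side by side, so the classifier sees both lexical and syntactic signals. A minimal sketch, assuming the tweets and tags are available as parallel lists (the tiny example data below is illustrative, not from the original dataset):

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Illustrative stand-ins for train_data['tweet'] and train_data['pos']
train_tweets = ["i love this phone", "worst service ever", "it is okay i guess"]
train_pos    = ["PRON VERB DET NOUN", "ADJ NOUN ADV", "PRON VERB ADJ PRON VERB"]
train_labels = ["positive", "negative", "neutral"]

# One vectorizer for words, one for POS-tag strings
word_vec = TfidfVectorizer(ngram_range=(1, 2))
pos_vec  = TfidfVectorizer(ngram_range=(1, 2))

X_words = word_vec.fit_transform(train_tweets)
X_pos   = pos_vec.fit_transform(train_pos)

# Stack the two sparse matrices horizontally: each row now carries
# both the word features and the POS features for that tweet.
X_train = hstack([X_words, X_pos])

clf = MultinomialNB(alpha=0.1).fit(X_train, train_labels)
```

At test time the same `hstack` is applied to the transformed test tweets and tags. This usually beats POS-only features, since the tags add information rather than replacing the words.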

Topic naive-bayes-classifier sentiment-analysis nlp machine-learning

Category Data Science


There are many ways you could go about this. For starters, you could use Conditional Random Fields (CRFs); there is a solid Python implementation in which you can set POS features and more. The same source you posted has a page on how to use CRFs for your purpose (I have not read it thoroughly). spaCy is another great resource for quickly getting all the features you need. Nonetheless, for state-of-the-art results you will need a neural-network implementation.
