I can't figure out how to improve accuracy for tweet sentiment analysis

This is my first attempt at tweet sentiment analysis (positive, neutral, negative). So far I have cleaned the data (about 2.5k tweets) and built a bag-of-words (BoW) representation to get a feel for it. I also generated bigrams to try to get a clearer sentiment signal.
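
Roughly what I did for the BoW / bigram exploration (a minimal sketch on toy placeholder data; the real input is the 2.5k cleaned tweets):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy placeholder tweets; in my case this is the ~2.5k cleaned tweets
tweets = ["really love this phone", "absolutely hate the battery", "it is ok i guess"]

vectorizer = CountVectorizer(ngram_range=(1, 2))  # unigrams + bigrams
bow = vectorizer.fit_transform(tweets)

# Most frequent terms, just to get a feel for the vocabulary
counts = bow.sum(axis=0).A1
terms = vectorizer.get_feature_names_out()
print(sorted(zip(terms, counts), key=lambda t: -t[1])[:20])
```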

The classes are severely imbalanced, so I tried both upsampling and downsampling to compare the results.
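
The resampling itself is along these lines (a sketch using `sklearn.utils.resample` on toy placeholder data; the DataFrame layout and column names are assumptions):

```python
import pandas as pd
from sklearn.utils import resample

# Toy placeholder data; assumed layout: a DataFrame with "text" and "sentiment" columns
df = pd.DataFrame({
    "text": ["great phone", "awful battery", "it is ok", "love it", "nice screen"],
    "sentiment": ["positive", "negative", "neutral", "positive", "positive"],
})

# Upsample each class to the size of the largest class (downsampling is the mirror image)
largest = df["sentiment"].value_counts().max()
upsampled = pd.concat([
    resample(group, replace=True, n_samples=largest, random_state=42)
    for _, group in df.groupby("sentiment")
])
print(upsampled["sentiment"].value_counts())
```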

Finally I passed it all through a Random Forest classifier, and I get an accuracy of 0.7 on the upsampled data and 0.3 on the downsampled data.
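
The classification step is the standard scikit-learn flow, roughly like this (a sketch on toy placeholder data; in reality the texts are the cleaned tweets and the labels the three sentiments):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy placeholders standing in for the real tweets and their sentiment labels
texts = [
    "really love this phone",
    "absolutely hate this battery",
    "the phone is just ok",
    "great camera love the screen",
    "terrible battery hate the delays",
    "the screen is fine overall",
] * 10
labels = ["positive", "negative", "neutral"] * 20

X = CountVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```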

I visualized the results in a confusion matrix and can see that the model does a poor job of labeling correctly. I also computed precision, recall, and F1, and the positive and negative classes are the main problem (values around 0.45).
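
This is how I got the confusion matrix and the per-class scores (continuing from the classifier sketch above):

```python
from sklearn.metrics import confusion_matrix, classification_report

# Continuing from the classifier sketch above (clf, X_test, y_test)
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred, labels=["negative", "neutral", "positive"]))
print(classification_report(y_test, y_pred))  # per-class precision, recall, F1
```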

I have tried going back to cleaning the data, but at this point I can't think of anything else to do to it (I've applied stemming, lemmatization, tokenization, and stopword removal, added extra stopwords that were left over, and removed special characters (@, #, etc.) and hyperlinks).
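
The cleaning looks roughly like this (a simplified sketch with NLTK; only lemmatization is shown, and the regexes and extra stopwords are illustrative):

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("omw-1.4")

stop_words = set(stopwords.words("english")) | {"rt", "amp"}  # leftover Twitter "stopwords"
lemmatizer = WordNetLemmatizer()

def clean(tweet):
    tweet = re.sub(r"https?://\S+", " ", tweet)   # hyperlinks
    tweet = re.sub(r"[@#]\w+", " ", tweet)        # mentions and hashtags
    tweet = re.sub(r"[^a-zA-Z\s]", " ", tweet)    # other special characters
    tokens = tweet.lower().split()                # tokenize
    return " ".join(lemmatizer.lemmatize(t) for t in tokens if t not in stop_words)

print(clean("RT @user I LOVE this!! https://t.co/xyz #happy"))
```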

I also tried giving my CountVectorizer different ngram_range values ((1,1), (2,2), (3,3)), but I don't see any big change.
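
The n-gram experiments were along these lines (a sketch reusing the toy texts/labels from the classifier sketch above):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Reusing the toy texts/labels defined in the Random Forest sketch above
for ngrams in [(1, 1), (2, 2), (3, 3)]:
    X = CountVectorizer(ngram_range=ngrams).fit_transform(texts)
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    print(ngrams, cross_val_score(clf, X, labels, cv=5).mean())
```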

This is my first time doing this. Can anybody point me in the right direction here, please?

Topic: beginner, random-forest, sentiment-analysis



It looks like what you've done so far is apply various standard techniques by trial and error. I'd suggest investigating what happens with your model and your data by doing a manual error analysis: take a random sample of instances that are incorrectly predicted and try to understand why each one is misclassified (see the sketch after this list). Errors can happen for various reasons:

  • The gold label is wrong or questionable; sentiment can be subjective. A tweet considered neutral by annotator A could be negative for annotator B. If the same kind of tweet gets different labels, the model cannot find good patterns for predicting the label. If this kind of problem is common, the gold data is low quality and there's not much you can do with it.
  • The model doesn't catch clues that are obvious to a human. Identify what these clues are and why the model fails to catch them. This way you can design features in a more informed way to improve performance (feature engineering).
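
For instance, a minimal way to pull a random sample of misclassified tweets for manual inspection (a sketch; `test_texts`, `y_test` and `y_pred` are assumed to be parallel sequences of raw tweets, gold labels and predictions):

```python
import random

# Assumed: parallel sequences of raw tweet texts, gold labels and model predictions
errors = [(text, gold, pred)
          for text, gold, pred in zip(test_texts, y_test, y_pred)
          if gold != pred]

# Inspect up to 20 random misclassified tweets by hand
for text, gold, pred in random.sample(errors, min(20, len(errors))):
    print(f"gold={gold:10} pred={pred:10} | {text}")
```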

But first, you should probably avoid resampling and check for overfitting: performance on the test set should not be much lower than on the training set. When you use complex features such as all the n-grams, it's likely that you cause the model to overfit.
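
A quick way to check this (a sketch; `clf`, `X_train`, `X_test`, `y_train`, `y_test` are assumed to come from your existing training code):

```python
from sklearn.metrics import accuracy_score

# A large gap between these two numbers is a strong sign of overfitting
train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))
print(f"train accuracy: {train_acc:.2f}   test accuracy: {test_acc:.2f}")
```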
