Training data in sentiment analysis

Question


Dan Jírovec

May 7, 2022, 16:29

I'm doing sentiment analysis of tweets related to the recent acquisition of Twitter by Elon Musk. I have a corpus of 10,000 tweets and I'd like to apply machine learning models such as SVM and Linear Regression. My question is: to train the models correctly, do I have to manually tag a large portion of those 10,000 collected tweets as positive or negative, or can I train on some other, already-tagged dataset of tweets that isn't related to this topic? Thank you for your answers!

Topic linear-regression sentiment-analysis svm

Category Data Science

Gius · Accepted Answer · May 7, 2022, 16:29

When you train a model, the goal is for it to work in a more general setting. For example, when you evaluate the model on a test set (data it has not seen during training), you are estimating what is called the generalization error.

You don't train a model to work only on the data it was trained on, but to work well on unseen data (otherwise it has overfitted, and the model is useless).

So you can train a sentiment analysis model on an existing tweet dataset (you can find plenty of them online, fully labeled, so you can compute metrics on that data to make sure the model works), and then use this model to make predictions on your own data. On your 10k tweets you will be working without ground truth: since you don't have labels, you can't compute metrics there. But if the model was trained properly, it should work.
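As a rough illustration of that workflow, here is a minimal sketch using scikit-learn (my choice of library, not something the question specifies). The tiny `labeled_texts` and `my_tweets` lists are made-up placeholders standing in for a real public labeled tweet dataset and for your own 10k tweets, and `train_and_evaluate` is a hypothetical helper name. The TF-IDF features and `LinearSVC` are just one reasonable setup for an SVM text classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder for a public, already-labeled tweet dataset (assumption:
# in practice you would load thousands of labeled tweets here).
labeled_texts = [
    "I love this, what a great day",
    "Absolutely fantastic news, so happy",
    "This is wonderful and exciting",
    "Great job, really amazing work",
    "I hate this, what a terrible day",
    "Absolutely awful news, so sad",
    "This is horrible and disappointing",
    "Bad job, really terrible work",
]
labels = ["positive"] * 4 + ["negative"] * 4

# Placeholder for your own unlabeled tweets about the acquisition.
my_tweets = [
    "So happy about the acquisition, great news",
    "This takeover is terrible and disappointing",
]


def train_and_evaluate(texts, targets, seed=0):
    """Fit a TF-IDF + linear SVM pipeline and score it on held-out labeled data."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, targets, test_size=0.25, stratify=targets, random_state=seed
    )
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(x_train, y_train)
    # Accuracy on labeled tweets the model did not see during training:
    # this is the estimate of how well it generalizes.
    held_out_accuracy = accuracy_score(y_test, model.predict(x_test))
    return model, held_out_accuracy


model, held_out_accuracy = train_and_evaluate(labeled_texts, labels)

# No labels exist for these tweets, so we can only predict, not score.
predictions = model.predict(my_tweets)

print("held-out accuracy:", held_out_accuracy)
for tweet, sentiment in zip(my_tweets, predictions):
    print(sentiment, "|", tweet)
```

The metrics here come only from the labeled dataset; the predictions on your own tweets cannot be scored without labels.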
