Sentiment Analysis Label Distribution
I am working on a sentiment analysis model. The dataset I have has three labels: positive, negative and neutral.
The problem is that the labels are not equally represented. Out of 100K examples, 75K are neutral, 15K positive and 10K negative.
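To make the imbalance concrete, here is a small plain-Python sketch that turns the counts above (typed in by hand from the question) into label shares. It also shows that a trivial model that always predicts "neutral" would already score 75% accuracy on data with this distribution, which is one reason plain accuracy can look better than the model really is.

```python
# Label counts from the question (100K examples in total)
counts = {"neutral": 75_000, "positive": 15_000, "negative": 10_000}
total = sum(counts.values())

# Fraction of the dataset taken up by each label
proportions = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the most frequent label
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(proportions)
print(majority_label, baseline_accuracy)
```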
I would like to know whether it is necessary to train on an equal distribution of labels, or whether I can go ahead with the imbalanced data, and if so, up to what degree of imbalance. Are there any ways to deal with this problem?
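One common option (not the only one) is to keep all the data and weight each label inversely to its frequency, using n_samples / (n_classes * count). This is the formula scikit-learn documents for `class_weight="balanced"`. The sketch below is plain Python; `balanced_class_weights` is a hypothetical helper name, and the counts are the ones from the question.

```python
def balanced_class_weights(counts):
    """Weight each label inversely to its frequency: n_samples / (n_classes * count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


counts = {"neutral": 75_000, "positive": 15_000, "negative": 10_000}
weights = balanced_class_weights(counts)

# Rare labels get larger weights, so each label contributes the same
# total weight (n_samples / n_classes) to the training loss.
for label, w in weights.items():
    print(f"{label}: {w:.3f}")
```

With these counts the weights come out to roughly 0.444 for neutral, 2.222 for positive and 3.333 for negative. A dict like this can be passed as per-class loss weights, for example through the `class_weight` parameter of scikit-learn classifiers or the `weight` argument of PyTorch's `CrossEntropyLoss` (as a tensor ordered by class index).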