Sentiment data for Emoji

For experimenting, we'd like to use the Emoji embedded in many Tweets as ground truth/training data for simple quantitative sentiment analysis. Tweets are usually too unstructured for NLP to work well.

Anyway, there are 722 Emoji in Unicode 6.0, and probably another 250 will be added in Unicode 7.0.

Is there a database (like e.g. SentiWordNet) that contains sentiment annotations for them?

(Note that SentiWordNet does allow for ambiguous meanings, too. Consider e.g. "funny", which is not just positive: "this tastes funny" is probably not positive. The same will hold for ;-), for example. But I don't think this is harder for Emoji than it is for regular words...)

Also, if you have experience with using them for sentiment analysis, I'd be interested to hear.



I found this GitHub repo useful (a good start): a list of emoji rated for valence with an integer between minus five (negative) and plus five (positive).

See the list of supported Unicode emoji.

Note that some emoji receive arguably confusing polarities, such as stuck_out_tongue_closed_eyes (0), due to being used for both positive and negative emotions.
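As a rough sketch of how a valence list like this could turn emoji into labels, the Python below sums the valences of the rated emoji found in a tweet. The EMOJI_VALENCE table and its values are made up for illustration (they are not taken from the repo), and a tweet with no rated emoji gets no label at all rather than "neutral". A 0-valence emoji such as stuck_out_tongue_closed_eyes contributes nothing to the score.

```python
# Illustrative valences on the -5..+5 scale. These values are assumptions,
# NOT copied from the repo mentioned above.
EMOJI_VALENCE = {
    "\U0001F600": 3,   # grinning face
    "\U0001F602": 3,   # face with tears of joy
    "\U0001F61D": 0,   # stuck_out_tongue_closed_eyes: used both ways, so 0
    "\U0001F620": -4,  # angry face
    "\U0001F622": -2,  # crying face
}


def label(text):
    """Return 'positive', 'negative', 'neutral', or None if no rated emoji occurs.

    Iterates code points, so multi-code-point emoji sequences (ZWJ sequences,
    skin-tone modifiers) are not handled by this simple sketch.
    """
    hits = [EMOJI_VALENCE[ch] for ch in text if ch in EMOJI_VALENCE]
    if not hits:
        return None  # no rated emoji: the tweet provides no ground truth
    score = sum(hits)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Summing rather than averaging is an arbitrary choice here; as the next answer points out, irony and sarcasm will still produce wrong labels whichever aggregation is used.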


A total of 972 emoji is not so many that they can't be labeled manually, but I doubt that they will work well as ground truth. Sources like Twitter are full of irony, sarcasm and other tricky settings where emotional symbols (such as emoji or emoticons) mean something different from their normal interpretation. For example, someone may write "xxx cheated their clients, and now they are cheated themselves! ha ha ha! :D". This is definitely a negative comment, but the author is glad to see company xxx in trouble and thus adds a positive emoticon. These cases are not that frequent, but they make emoji labels unreliable as ground truth.

A much more common approach is to use emoticons as seeds for collecting the actual data set. For example, in this paper the authors use emoticons and emotional hashtags to build a lexicon of words useful for further classification.
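A minimal sketch of the seed idea, assuming whitespace tokenization and hypothetical seed sets POS_SEEDS and NEG_SEEDS (these are not the seeds used in the paper): tweets containing only positive or only negative seeds are labeled, the seeds themselves are removed, and every remaining word gets a smoothed log-ratio score between the two classes. The scoring function is one common choice, not necessarily the paper's method.

```python
import math
from collections import Counter

# Hypothetical seed sets (lowercased, since tweets are lowercased below).
POS_SEEDS = {":)", ":-)", ":d", "#happy"}
NEG_SEEDS = {":(", ":-(", "#sad", "#angry"}
SEEDS = POS_SEEDS | NEG_SEEDS


def seed_label(tokens):
    """Label a tweet by its seeds; skip it if it has none or conflicting ones."""
    pos = any(t in POS_SEEDS for t in tokens)
    neg = any(t in NEG_SEEDS for t in tokens)
    if pos == neg:
        return None
    return "pos" if pos else "neg"


def build_lexicon(tweets, min_count=2, alpha=1.0):
    """Score words by the smoothed log ratio of their frequency in pos vs. neg tweets."""
    counts = {"pos": Counter(), "neg": Counter()}
    for tweet in tweets:
        tokens = tweet.lower().split()
        lab = seed_label(tokens)
        if lab is None:
            continue
        # Drop the seeds so the lexicon is learned from the remaining words only.
        counts[lab].update(t for t in tokens if t not in SEEDS)

    vocab = set(counts["pos"]) | set(counts["neg"])
    total_pos = sum(counts["pos"].values())
    total_neg = sum(counts["neg"].values())
    v = len(vocab)

    lexicon = {}
    for word in vocab:
        cp, cn = counts["pos"][word], counts["neg"][word]
        if cp + cn < min_count:
            continue  # too rare to score reliably
        # Additive (Laplace-style) smoothing; > 0 leans positive, < 0 negative.
        p = (cp + alpha) / (total_pos + alpha * v)
        n = (cn + alpha) / (total_neg + alpha * v)
        lexicon[word] = math.log(p / n)
    return lexicon
```

For example, build_lexicon(["sunny and warm #happy", "love this sunny day :)", "stuck in traffic again :(", "traffic ruined my morning #sad"]) gives "sunny" a positive score and "traffic" a negative one. The irony problem described above does not go away, but it is diluted over many tweets instead of labeling each one directly.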
