How to filter out and discard irrelevant tweets in the simplest way possible

I have a lot of tweets, and I need to filter out and discard the irrelevant ones. The criterion for a tweet to be irrelevant is very simple: all it contains is emojis, a single hashtag, multiple hashtags, and so on. Put simply, if a tweet contains no actual information to extract, it is irrelevant. Are there any pre-built packages available for this?

I don't want to build a classifier, because this is going to be used inside the data pre-processing pipeline of an NLP model. Moreover, labelling the tweets would be additional overhead. So I want to know whether there are any approaches or pre-trained models for this. I would like the solution to be as simple as possible.

Topics: text-classification, twitter, text-mining, nlp, data-cleaning

Category: Data Science


I think simple regex matching is all you need.

Pass each tweet through a series of regular expressions that match emojis and hashtags, strip the matches, and if nothing meaningful remains, discard the tweet.
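A minimal sketch of that approach in Python, using only the standard library `re` module. The function names `is_irrelevant` and `filter_tweets` are hypothetical. The emoji character class is an assumption: it approximates the main Unicode emoji blocks and is not exhaustive. Stripping @mentions and URLs is also an assumption that goes beyond the question's criteria, so drop those two patterns if they count as information in your data.

```python
import re

# Hashtags: '#' followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#\w+")

# Assumption: @mentions and URLs carry no extractable information either.
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")

# Assumption: approximate emoji coverage via common Unicode blocks.
# Not exhaustive; a dedicated emoji library would be more complete.
EMOJI_RE = re.compile(
    "["
    "\U0001F000-\U0001FAFF"  # emoticons, pictographs, transport, flags, skin tones
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\u2B00-\u2BFF"          # miscellaneous symbols and arrows
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "\u200D"                 # zero-width joiner used in emoji sequences
    "]+"
)


def is_irrelevant(tweet: str) -> bool:
    """Return True if nothing but emojis/hashtags/mentions/URLs remains."""
    text = URL_RE.sub(" ", tweet)  # remove URLs first so '#' fragments in them are gone too
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    # \w is Unicode-aware for str patterns, so any remaining letter or digit
    # in any script counts as content; punctuation and whitespace do not.
    return re.search(r"\w", text) is None


def filter_tweets(tweets):
    """Keep only tweets that still contain some text after stripping."""
    return [t for t in tweets if not is_irrelevant(t)]
```

Because this is a pure function with no model or labels, it can be dropped straight into a pre-processing pipeline, for example as a `filter` step before tokenization. If you need fuller emoji coverage, the third-party `emoji` package (recent 2.x versions) provides `emoji.replace_emoji()`, which could replace `EMOJI_RE` in the sketch above.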
