Keyword extraction for business rule text classification

I would like to classify texts without using any ML model. My idea is to build a list of keywords, each assigned to one class. Then, when I need to classify a new text, I compare it with my list and count how many keywords of each class appear in the text; the class with the most matching keywords is my final prediction.

Example of classification for this list of keywords:

green : A
red : B
apple : A
car : C

The sentence "A green apple in a car" is classified as A.
(Points = A : 2, B : 0, C : 1)
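The counting rule described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `score` and `classify`, the lowercase word tokenizer, and the tie-breaking rule (first class in the list wins) are assumptions, not part of the question.

```python
import re

# Keyword -> class mapping taken from the example above.
KEYWORDS = {"green": "A", "red": "B", "apple": "A", "car": "C"}
CLASSES = ["A", "B", "C"]

def score(text, keywords=KEYWORDS, classes=CLASSES):
    """Count, per class, how many keyword occurrences appear in the text."""
    points = {c: 0 for c in classes}
    # Assumption: lowercase the text and split on runs of letters.
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in keywords:
            points[keywords[token]] += 1
    return points

def classify(text):
    """Return the class with the most points (ties go to the first class listed)."""
    points = score(text)
    return max(points, key=points.get)

print(score("A green apple in a car"))     # {'A': 2, 'B': 0, 'C': 1}
print(classify("A green apple in a car"))  # A
```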

The question is: what are good techniques to explore in order to build my keyword list from thousands of different text pieces and ~5 classes? Most keyword algorithms I found (RAKE, ...) focus on extracting keywords from a single text, which is not my goal at all.

This would serve as a good baseline algorithm whose results I can then compare with more advanced ML classification techniques for my study.


You should probably consider a simple use of conditional probabilities, for example a Naive Bayes classifier. Assuming you are using Python, you could look up an example of a "Naive Bayes spam classifier"; in your case you would have 5 classes instead of the 2 (spam / not spam) that spam filters use. The per-class word probabilities the classifier learns also show which words are most indicative of each class, which is one way to derive a keyword list. An example can be found here: https://www.kdnuggets.com/2020/07/spam-filter-python-naive-bayes-scratch.html
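As a rough sketch of that idea (not the code from the linked article), here is a small multinomial Naive Bayes written from scratch with add-one smoothing. The toy documents, the function names, and the `top_keywords` ranking (log-probability of a word in one class minus its best log-probability in any other class) are all assumptions chosen for illustration; with real data you would pass your thousands of labeled texts and ~5 labels instead.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase and split on runs of letters.
    return re.findall(r"[a-z]+", text.lower())

def train(docs):
    """docs: list of (text, label). Returns (class counts, per-class word counts, vocabulary)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def log_likelihood(word, label, word_counts, vocab):
    """log P(word | label) with Laplace (add-one) smoothing."""
    total = sum(word_counts[label].values())
    return math.log((word_counts[label][word] + 1) / (total + len(vocab)))

def predict(text, model):
    """Return the label with the highest log prior + summed log likelihoods."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        s = math.log(class_counts[label] / n_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                s += log_likelihood(tok, label, word_counts, vocab)
        scores[label] = s
    return max(scores, key=scores.get)

def top_keywords(model, label, k=3):
    """Rank words by how much more likely they are under `label` than under any other class."""
    class_counts, word_counts, vocab = model
    others = [c for c in class_counts if c != label]

    def ratio(word):
        best_other = max(log_likelihood(word, c, word_counts, vocab) for c in others)
        return log_likelihood(word, label, word_counts, vocab) - best_other

    return sorted(vocab, key=ratio, reverse=True)[:k]

# Hypothetical toy corpus with 3 classes; real data would have ~5.
docs = [
    ("green apple on the tree", "A"),
    ("a green apple pie", "A"),
    ("red paint on the wall", "B"),
    ("a red red rose", "B"),
    ("the car is fast", "C"),
    ("car in the garage", "C"),
]
model = train(docs)
print(predict("a green apple", model))
print(top_keywords(model, "B", k=1))
```

The words returned by `top_keywords` for each class could then be used as the keyword list for the simple counting baseline, so the two approaches can be compared on the same data.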
