Find optimal feature combinations and ordering for a multi-class classification problem

We have a multi-class classification problem where the training data looks as follows:

name A B C brand
Snickers Ltd company huge sales Snickers
Acme Intl office stationary commercial Acme
Davidoff cigars big Davidoff
Max Car Company car repair small garage MaxAuto

As shown, we have one free-text feature column (name) and several categorical feature columns that may be empty. The target to predict is brand. Each categorical feature has a large number (1000+) of possible values. The table above is only a sample; the real data has several more categorical features.

Our domain experts tell us that brand can be predicted from various feature combinations, e.g.:

(name, A) -- brand or

(name, C) -- brand or

(A,B,C) -- brand
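One way to read each combination is as a lookup rule: the tuple of feature values maps to the brand most often seen with it in training. The sketch below illustrates that reading and scores a rule by coverage (how often it fires) and precision (how often it is right when it fires). The helper names `fit_rule` and `score_rule` and the toy rows are hypothetical, and matching `name` by exact string is an assumption; real free text would likely need normalization first.

```python
from collections import Counter, defaultdict

# Toy rows with hypothetical values, loosely following the sample table.
train = [
    {"name": "Snickers Ltd", "A": "company", "B": "huge", "C": "sales", "brand": "Snickers"},
    {"name": "Snickers Ltd", "A": "company", "B": "", "C": "sales", "brand": "Snickers"},
    {"name": "Acme Intl", "A": "office", "B": "large", "C": "commercial", "brand": "Acme"},
    {"name": "Acme Corp", "A": "company", "B": "", "C": "commercial", "brand": "Acme"},
]

def fit_rule(rows, features):
    """Map each observed value tuple of `features` to its most frequent brand."""
    counts = defaultdict(Counter)
    for row in rows:
        key = tuple(row.get(f) for f in features)
        if all(key):  # skip rows where any of the features is empty
            counts[key][row["brand"]] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def score_rule(rule, rows, features):
    """Return (coverage, precision) of a fitted rule on `rows`."""
    fired = correct = 0
    for row in rows:
        key = tuple(row.get(f) for f in features)
        if key in rule:
            fired += 1
            correct += rule[key] == row["brand"]
    coverage = fired / len(rows)
    precision = correct / fired if fired else 0.0
    return coverage, precision

for features in [("name", "A"), ("A",), ("A", "B", "C")]:
    rule = fit_rule(train, features)
    print(features, score_rule(rule, train, features))
```

Scoring on the training rows is only for illustration; a held-out split would give a more honest estimate of precision.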

We have a well-known list of about 2,500 brands that we are interested in. Our training data is comparatively small, with only 200k records. So far we have had poor results with a Random Forest approach, and we are open to rule-based classification as well.

Is it possible to algorithmically determine the best sequence of rules to predict the target?
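To make the question concrete, here is a minimal sketch of one possible approach, a greedy decision list: fit a lookup rule per candidate combination, then repeatedly pick the combination with the highest precision on the validation rows that no earlier rule has covered. This is an assumption about what "best sequence" could mean, not a known solution; all function names and toy rows are hypothetical, and `name` is matched by exact string.

```python
from collections import Counter, defaultdict

def key_of(row, features):
    """Value tuple for `features`, or None if any value is empty."""
    key = tuple(row.get(f) for f in features)
    return key if all(key) else None

def fit_rule(rows, features):
    """Map each observed value tuple to its most frequent brand."""
    counts = defaultdict(Counter)
    for row in rows:
        key = key_of(row, features)
        if key is not None:
            counts[key][row["brand"]] += 1
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}

def order_rules(train, valid, candidates, min_fired=1):
    """Greedily order candidates by precision on still-uncovered validation rows."""
    rules = {feats: fit_rule(train, feats) for feats in candidates}
    remaining, left, ordered = list(valid), list(candidates), []
    while remaining and left:
        best = None
        for feats in left:
            fired = [r for r in remaining if key_of(r, feats) in rules[feats]]
            if len(fired) < min_fired:
                continue
            correct = sum(rules[feats][key_of(r, feats)] == r["brand"] for r in fired)
            score = (correct / len(fired), len(fired))  # precision, then support
            if best is None or score > best[0]:
                best = (score, feats, fired)
        if best is None:
            break  # no remaining candidate fires often enough
        _, feats, fired = best
        ordered.append(feats)
        left.remove(feats)
        fired_ids = {id(r) for r in fired}
        remaining = [r for r in remaining if id(r) not in fired_ids]
    return ordered, rules

def predict(row, ordered, rules):
    """Return the brand from the first rule that fires, else None."""
    for feats in ordered:
        key = key_of(row, feats)
        if key in rules[feats]:
            return rules[feats][key]
    return None

def make_row(name, a, b, c, brand):
    return {"name": name, "A": a, "B": b, "C": c, "brand": brand}

# Toy data with hypothetical values.
train = [
    make_row("Snickers Ltd", "company", "huge", "sales", "Snickers"),
    make_row("Snickers Ltd", "company", "", "sales", "Snickers"),
    make_row("Acme Intl", "office", "large", "commercial", "Acme"),
    make_row("Acme Corp", "company", "", "commercial", "Acme"),
]
valid = [
    make_row("Snickers Ltd", "company", "huge", "sales", "Snickers"),
    make_row("Acme Corp", "company", "", "commercial", "Acme"),
    make_row("Mars Inc", "company", "", "sales", "Snickers"),
    make_row("Other", "office", "", "retail", "Acme"),
]
candidates = [("name", "A"), ("A", "C"), ("A",)]

ordered, rules = order_rules(train, valid, candidates)
print(ordered)
print(predict(make_row("Unknown", "office", "", "", None), ordered, rules))
```

With 1000+ values per feature and 2,500 brands, `min_fired` would probably need to be much larger than 1 to avoid ranking rules on a handful of rows, and the candidate list could be seeded from the combinations the domain experts suggest rather than enumerated exhaustively.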

Topic multiclass-classification association-rules random-forest

Category Data Science
