What to do when some labels of a categorical feature appear only a few times?
Let's say I am trying to predict whether a car will be auctioned or not (not what I'm actually trying to do, but it represents my problem well) using tabular data. I have the year the car was made, its color, its model, etc. The model is the name of the car (e.g., Sportage, Mazda3), and some of the more popular models such as the Sportage appear many times, whereas some of the less popular ones appear only once or twice. In that case, what would be the ideal way to deal with these rare models?
More info:
In my case, I have about 3000 different car models. The two or three most frequent ones make up about 20% of my data, but the rest appear only once or twice in the entire dataset. I have tried one-hot encoding, and that did improve my score considerably, but it's still not good enough (I know for a fact that it could be better).
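To make the setup concrete, here is a minimal sketch, assuming pandas and a small hypothetical toy table (the column names `model` and `auctioned` and the model names `Rare1`, `Rare2`, `Rare3` are made up for illustration), of how the frequency skew can be measured and how plain one-hot encoding turns every model, however rare, into its own indicator column.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset: a few popular
# models plus a long tail of models that appear only once or twice.
df = pd.DataFrame({
    "model": ["Sportage"] * 5 + ["Mazda3"] * 3 + ["Rare1", "Rare2", "Rare2", "Rare3"],
    "auctioned": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0],
})

# How often each model appears, most frequent first.
counts = df["model"].value_counts()
print(counts)

# Share of rows covered by the two most frequent models.
top_share = counts.head(2).sum() / len(df)
print(f"Top 2 models cover {top_share:.0%} of rows")

# Models seen at most twice (the long tail).
rare_models = counts[counts <= 2].index.tolist()
print(f"{len(rare_models)} of {len(counts)} models appear at most twice")

# Plain one-hot encoding: one indicator column per distinct model.
X = pd.get_dummies(df["model"], prefix="model")
print(X.shape)
```

With about 3000 distinct models, the same encoding would produce about 3000 columns, most of which have only one or two non-zero entries, so the model sees very little evidence per rare column.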
P.S.: I have already looked at the posts about high cardinality, and although I do think they are related to my problem, my issue is still a different one.
Thank you so much!