If a categorical feature only occurs a few times in a data set, should I drop it?
I have a data set of mostly categorical variables. After one-hot encoding them, some of the resulting features are active in fewer than 3% of the samples.
For instance, the Tech-support feature is active in only 928 of the 32561 samples, i.e. about 2.9% of the time.
Is there a general cutoff point below which I should drop these variables? I'm cleaning up this data set for binary logistic regression and an SVM.
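To make the numbers concrete, here is a minimal sketch of the frequency check in plain Python. The column name `occupation`, the helper `category_shares`, the "Other" label, and the 3% threshold are all placeholders for illustration; they are not taken from the real data set, and the threshold is not meant as a recommended cutoff.

```python
from collections import Counter


def category_shares(values):
    """Return {category: fraction of samples} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: n / total for cat, n in counts.items()}


# Toy stand-in for the real column: 928 "Tech-support" rows out of 32561.
# Every other row is lumped into a hypothetical "Other" label.
occupation = ["Tech-support"] * 928 + ["Other"] * (32561 - 928)

shares = category_shares(occupation)

# Hypothetical cutoff, used only to flag rare categories for inspection.
threshold = 0.03
rare = [cat for cat, share in shares.items() if share < threshold]

print(f"{shares['Tech-support']:.1%}")  # 2.9%
print(rare)                             # ['Tech-support']
```

This only flags the rare categories; whether to drop them, merge them into an "other" level, or keep them is exactly what I'm unsure about.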
Thank you!
Topic features one-hot-encoding logistic-regression svm
Category Data Science