Minimum number of features for Naïve Bayes model

I keep on reading that Naive Bayes needs fewer features than many other ML algorithms. But what's the minimum number of features you actually need to get good results (90% accuracy) with a Naive Bayes model? I know there is no objective answer to this -- it depends on your exact features and what in particular you are trying to learn -- but I'm looking for a numerical ballpark answer to this.

I'm asking because I have a dataset with around 280 features and want to understand if this is way too few features to use with Naive Bayes. (I tried running Naive Bayes on my dataset and although I got 86% accuracy, I cannot trust this number as my data is imbalanced and I believe this may be responsible for the high accuracy. I am currently trying to fix this problem.)

In case it's relevant: the exact problem I'm working on is generating time tags for Wikipedia articles. Many times the infobox of a Wikipedia article contains a date. However, many times this date appears in the text of the article but is missing from the infobox. I want to use Naive Bayes to identify which date from all the dates we find in the article's text we should place in the infobox. Every time I find a sentence with a date in it I turn it into a feature vector -- listing what number paragraph I found this in, how many times this particular date appears in the article, etc. I've limited myself to a small subset of Wikipedia articles -- just apple articles -- and as a result, I only have 280 or so features. Any idea if this is enough data?

Thanks!

Topic wikipedia naive-bayes-classifier feature-selection nlp python

Category Data Science


The number of features to get a minimum accuracy is an empirical question, that will depend on the specific problem. There are problems where a single feature will result in "good enough" performance, there are other problems where no number of features is sufficient.

Naive Bayes does provide the most predictive features for each class. So the increase in performance by adding features can be calculated.

For your specific problem, more features (more text signals) and more observations (more of Wikipedia) would increase accuracy.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.