How does the Naive Bayes algorithm function effectively as a classifier, despite its conditional independence and bag-of-words assumptions?

The Naive Bayes algorithm, as used for text classification, relies on two assumptions to keep it computationally fast (the decision rule they lead to is sketched after the list):

  • Bag-of-words assumption: the position of words in the document is not considered

  • Conditional independence: words are assumed to be independent of one another given the class
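
Taken together, these two assumptions reduce classification to the standard Naive Bayes decision rule, in which only the individual words $w_1, \dots, w_n$ of a document matter: their order is ignored and each contributes an independent factor.

$$
\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
$$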

In reality, neither of these assumptions usually holds, yet Naive Bayes is quite effective in practice. Why is that?



The main reason is that, in many cases (though not always), the model obtains enough evidence to make the right decision just from knowing which words do and do not appear in the document (possibly also using their frequencies, though even that is not always needed).

Let's take the textbook example of topic detection from news documents. A 'sports' article is likely to contain at least a few words that are unambiguously related to sports, and the same holds for most other topics as long as the topics are sufficiently distinct.
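
As a rough sketch of this idea (the tiny training corpus below is invented purely for illustration, and scikit-learn is just one convenient implementation), a unigram bag-of-words Naive Bayes classifier can separate such topics from only a handful of telltale words:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, invented training corpus: two 'sports' and two 'politics' documents.
train_docs = [
    "the team won the match with a late goal",
    "the striker scored twice in the final",
    "parliament passed the new budget law",
    "the minister announced an election date",
]
train_labels = ["sports", "sports", "politics", "politics"]

# Unigram counts as features; CountVectorizer(binary=True) would keep only
# word presence/absence, which is often already enough, as noted above.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

print(model.predict(["the coach praised the team after the match"]))
# -> ['sports']; the unambiguous unigrams 'team' and 'match' do most of the work
```

Even when the violated independence assumption distorts the estimated probabilities, classification only needs the highest-scoring class, so the decision often remains the right one.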

In general, tasks that depend on the overall semantics of the text work reasonably well with unigrams (single words, unordered) as features, whether with NB or other methods. It's different for tasks that require taking syntax into account, or a deeper understanding of the semantics.
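
To see why word order can matter, here is a minimal illustration (the two sentences are made up): documents with different meanings can map to exactly the same unigram feature vector, so no bag-of-words model, Naive Bayes included, can tell them apart.

```python
from collections import Counter

# Two sentences whose meanings differ only through word order.
doc_a = "dog bites man"
doc_b = "man bites dog"

# Their unigram bag-of-words representations are identical.
print(Counter(doc_a.split()) == Counter(doc_b.split()))  # True
```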
