Independence of Features assumption in Naive Bayes

How do we know whether the features in a dataset are independent before applying Naive Bayes? Basically, I want to know whether it is possible to get an idea, before training the model, of whether Naive Bayes will give decent results on a given dataset.

Topic naive-bayes-algorithm probability naive-bayes-classifier

Category Data Science


You could try computing mutual information between the features (sklearn can do it; see the sketch after this list).

You could estimate Pearson's or Spearman's correlation coefficients (keeping in mind that these only capture linear or monotonic dependence, respectively).

You could try training simple models to predict one feature from another, and use their predictive accuracy as a measure of dependence.
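
Here is a minimal sketch of the first two checks. It assumes a numeric feature matrix; the column names and the deliberately dependent `x3` column are made up for illustration. Pairwise mutual information is estimated by treating each feature in turn as the "target" of `mutual_info_regression`.

```python
# A minimal sketch, assuming numeric features (column names are illustrative).
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "x1": rng.normal(size=500),
    "x2": rng.normal(size=500),
})
X["x3"] = 2 * X["x1"] + rng.normal(scale=0.1, size=500)  # deliberately dependent on x1

# Pairwise mutual information: treat each feature in turn as the "target".
cols = X.columns
mi = pd.DataFrame(index=cols, columns=cols, dtype=float)
for c in cols:
    mi[c] = mutual_info_regression(X.values, X[c].values, random_state=0)
print(mi.round(2))  # near-zero off-diagonal entries suggest independence

# Spearman correlation matrix (only catches monotonic dependence).
rho, _ = spearmanr(X.values)
print(pd.DataFrame(rho, index=cols, columns=cols).round(2))
```

With this toy data, the `x1`/`x3` entries stand out in both matrices, while `x2` shows near-zero values against the other two, which is exactly the pattern you would look for before trusting the independence assumption.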


Statistical independence is a pretty straightforward thing: $A$ and $B$ are independent if $$p(A\cap B) = p(A)\,p(B),$$ in other words, if the conditional distributions equal the marginals. If you want, you could check this on your data. It is easier to check $$p(A|B) = p(A) \ \text{and} \ p(B|A) = p(B)$$ than to construct a joint distribution.

The latter check is easy: if your features are categorical, you can estimate $p(A)$, $p(B)$, $p(A|B)$, and $p(B|A)$ as sample frequencies. If only one of $A$ and $B$ is categorical, the computations are also simple. If both $A$ and $B$ are numeric, you need to fit a KDE (kernel density estimation) model to estimate the distributions.
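
For the categorical case, the frequency check is a one-liner with pandas. This is a minimal sketch; the two features and their values are made up for illustration.

```python
# A minimal sketch of the frequency check for two categorical features
# (the columns "A" and "B" and their values are illustrative).
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "a", "b", "b", "a", "b", "a", "b"],
    "B": ["x", "y", "x", "y", "x", "x", "y", "y"],
})

p_A = df["A"].value_counts(normalize=True)                        # p(A)
p_A_given_B = pd.crosstab(df["A"], df["B"], normalize="columns")  # p(A|B)

# Under independence, every column of p(A|B) should match p(A),
# up to sampling noise.
print(p_A)
print(p_A_given_B)
```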

In practice, however, it is simpler and faster to just fit a Naive Bayes model and check its performance on a test set, as in the sketch below.
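
Something like this: hold out a test set, fit the model, and look at the score. The dataset and model choice here are illustrative.

```python
# A minimal sketch of the pragmatic check: fit Naive Bayes and score it
# on held-out data (the dataset and model choice are illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = GaussianNB().fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

If the test performance is acceptable, the independence assumption is "violated harmlessly" for your purposes, regardless of what the diagnostics above say.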
