Independence of Features assumption in Naive Bayes

How do we know whether the features in a dataset are independent before applying Naive Bayes? Basically, I want to know whether it is possible to get an idea, before training the model, of whether Naive Bayes will give decent results on a given dataset.

Topic naive-bayes-algorithm probability naive-bayes-classifier

Category Data Science


You could try computing mutual information between the features (sklearn can do it; see the sketch after this list).

You could estimate Pearson's or Spearman's correlation coefficients (keeping in mind that these only capture linear or monotonic dependence, respectively).

You could try training simple models to predict one feature from another, and use their predictive accuracy as a measure of dependence.
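
Here is a minimal sketch of the first two checks. It assumes a numeric feature matrix; the column names and the deliberately dependent `x3` column are made up for illustration. Pairwise mutual information is estimated by treating each feature in turn as the "target" of `mutual_info_regression`.

```python
# A minimal sketch, assuming numeric features (column names are illustrative).
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "x1": rng.normal(size=500),
    "x2": rng.normal(size=500),
})
X["x3"] = 2 * X["x1"] + rng.normal(scale=0.1, size=500)  # deliberately dependent on x1

# Pairwise mutual information: treat each feature in turn as the "target".
cols = X.columns
mi = pd.DataFrame(index=cols, columns=cols, dtype=float)
for c in cols:
    mi[c] = mutual_info_regression(X.values, X[c].values, random_state=0)
print(mi.round(2))  # near-zero off-diagonal entries suggest independence

# Spearman correlation matrix (only catches monotonic dependence).
rho, _ = spearmanr(X.values)
print(pd.DataFrame(rho, index=cols, columns=cols).round(2))
```

With this toy data, the `x1`/`x3` entries stand out in both matrices, while `x2` shows near-zero values against the other two, which is exactly the pattern you would look for before trusting the independence assumption.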


Statistical independence is a pretty straightforward thing: $A$ and $B$ are independent if $$p(A\cap B) = p(A)\,p(B),$$ in other words, if the conditional distributions equal the marginals. If you want, you could check this on your data. It is easier to check $$p(A|B) = p(A) \ \text{and} \ p(B|A) = p(B)$$ than to construct a joint distribution.

The latter check is easy: if your features are categorical, you can estimate $p(A)$, $p(B)$, $p(A|B)$, and $p(B|A)$ as sample frequencies. If only one of $A$ and $B$ is categorical, the computations are also simple. If both $A$ and $B$ are numeric, you need to fit a KDE (kernel density estimation) model to estimate the distributions.
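
For the categorical case, the frequency check is a one-liner with pandas. This is a minimal sketch; the two features and their values are made up for illustration.

```python
# A minimal sketch of the frequency check for two categorical features
# (the columns "A" and "B" and their values are illustrative).
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "a", "b", "b", "a", "b", "a", "b"],
    "B": ["x", "y", "x", "y", "x", "x", "y", "y"],
})

p_A = df["A"].value_counts(normalize=True)                        # p(A)
p_A_given_B = pd.crosstab(df["A"], df["B"], normalize="columns")  # p(A|B)

# Under independence, every column of p(A|B) should match p(A),
# up to sampling noise.
print(p_A)
print(p_A_given_B)
```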

In practice, however, it is simpler and faster to just fit a Naive Bayes model and check its performance on a test set, as in the sketch below.
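
Something like this: hold out a test set, fit the model, and look at the score. The dataset and model choice here are illustrative.

```python
# A minimal sketch of the pragmatic check: fit Naive Bayes and score it
# on held-out data (the dataset and model choice are illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = GaussianNB().fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

If the test performance is acceptable, the independence assumption is "violated harmlessly" for your purposes, regardless of what the diagnostics above say.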
