Confused on Naive Bayes classifier

In the last part of Andrew Ng's lecture on Gaussian Discriminant Analysis and the Naive Bayes classifier, I am confused about how he derived $2^n - 1$ features for the Naive Bayes classifier.

First off, what does he mean by "features" in the context he was describing? I initially thought that the features were the components of our random vector $x$. I know that the total number of possible values of $x$ is $2^n$, but I do not understand how he got $2^n - 1$, and I cannot picture it in my head. I understand in general terms that Naive Bayes is a simpler way of calculating the conditional probability, but I just want to understand a bit more.

For reference: https://www.youtube.com/watch?v=nt63k3bfXS0 (go to 1:09:00).

Topic: mathematics, bayesian, gaussian, naive-bayes-classifier, machine-learning

Category: Data Science


You are looking at a joint distribution, which means the probabilities of all possible outcomes sum to $1$. So if there are $k$ possible outcomes, you only need $k - 1$ of the probabilities to be specified (these are the free, or independent, parameters), because the last one is $1$ minus the sum of the others.
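To make that concrete, here is a minimal worked case of my own (not from the video): with $n = 2$ binary features, $x$ takes $2^2 = 4$ possible values, and

$$P(x = (0,0)) + P(x = (0,1)) + P(x = (1,0)) + P(x = (1,1)) = 1,$$

so fixing any three of the four probabilities determines the fourth, leaving $2^2 - 1 = 3$ free parameters.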

In the context of the video, which is text classification, the model is the conditional probability of each combination of vocabulary words (here $n = 10{,}000$ words, so $2^n$ possible combinations) given that the text is spam or not. Knowing all of these probabilities except one, i.e. $2^n - 1$ of them, is enough, because the remaining one is determined: it is $1$ minus the sum of all the others. So the $2^n - 1$ counts the free parameters of the full joint model, not characteristics of $x$.
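Since you also ask why Naive Bayes is "simpler": here is a minimal sketch of the parameter counting, assuming the usual Naive Bayes independence assumption $p(x \mid y) = \prod_j p(x_j \mid y)$ (the function names are mine, purely for illustration):

```python
# Sketch (my own illustration, not code from the lecture): parameter
# counts for modelling p(x | y) over n binary word-indicator features,
# with n = 10_000 matching the vocabulary size in the video.

def full_joint_params(n: int) -> int:
    # A distribution over all 2**n word combinations: the probabilities
    # sum to 1, so one of them is determined by the rest.
    return 2**n - 1

def naive_bayes_params(n: int) -> int:
    # Under the independence assumption, each word x_j needs only one
    # Bernoulli parameter P(x_j = 1 | y) per class.
    return n

n = 10_000
print(len(str(full_joint_params(n))))  # about 3011 digits: hopeless to estimate
print(naive_bayes_params(n))           # 10000 per class: tractable
```

The sum-to-one constraint is what turns $2^n$ outcomes into $2^n - 1$ free parameters; the independence assumption is what collapses that count to something linear in $n$.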
