Compare cross validation values of Bernoulli NB and Multinomial NB
I'm testing MultinomialNB and BernoulliNB on my dataset, and I'm using the cross-validation score to better understand which of the two algorithms works better. This is the first classifier:
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

clf_multinomial = MultinomialNB()
clf_multinomial.fit(X_train, y_train)
y_predicted = clf_multinomial.predict(X_test)

# Accuracy on the held-out test set
score = clf_multinomial.score(X_test, y_test)

# 5-fold cross-validation accuracy on the training set
scores = cross_val_score(clf_multinomial, X_train, y_train, cv=5)
print(scores)
print(score)
And these are the scores:
[0.75 0.875 0.66666667 0.95833333 0.86956522]
0.8637666498061035
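(For comparison purposes, the fold scores printed above can be summarized by their mean and spread; this is just a sketch that re-enters the printed values by hand.)

```python
import numpy as np

# Fold accuracies printed by cross_val_score above
scores = np.array([0.75, 0.875, 0.66666667, 0.95833333, 0.86956522])

print(scores.mean())  # mean CV accuracy, roughly 0.824
print(scores.std())   # spread across the 5 folds
```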
This is the second classifier:
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

clf_multivariate = BernoulliNB()
clf_multivariate.fit(X_train, y_train)
y_predicted = clf_multivariate.predict(X_test)

# Accuracy on the held-out test set
score = clf_multivariate.score(X_test, y_test)

# 5-fold cross-validation accuracy on the training set
scores = cross_val_score(clf_multivariate, X_train, y_train, cv=5)
print(scores)
print(score)
And these are the scores:
[0.5 0.5 0.54166667 0.54166667 0.52173913]
0.5
From what I understood from the answer posted here, the first classifier works better because my dataset has lots of features (11k) instead of just one. However, it's pretty strange that I got 0.5 with the second classifier, which is a high value considering the number of features. What are the differences between the classifiers?
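For context, one mechanical difference I'm aware of is that BernoulliNB binarizes its input (threshold 0 by default via its `binarize` parameter), so it only sees presence/absence of each feature, whereas MultinomialNB uses the actual counts. A minimal sketch of that preprocessing step, using `sklearn.preprocessing.binarize` to mimic what BernoulliNB does internally:

```python
import numpy as np
from sklearn.preprocessing import binarize

# Toy count matrix: two documents, three features (word counts)
X = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 0.0]])

# BernoulliNB (with its default binarize=0.0) effectively models
# presence/absence rather than counts, which this reproduces:
X_binary = binarize(X, threshold=0.0)
print(X_binary)  # [[1. 0. 1.]
                 #  [0. 1. 0.]]
```

So on a count-based feature matrix, the two classifiers are actually fitting different representations of the same data.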
Topic probability cross-validation classification python
Category Data Science