Very low probability in naive Bayes classifier 1

Question

R. Cox

June 3, 2022 15:04

I have some training data (TRAIN) and some test data (TEST). Each row of each table contains an observed class (X) and several binary feature columns (Y). I'm using a Python script that is intended to predict the probability (Pr) of X given Y for the test data, based on the training data. It uses a Bernoulli naive Bayes classifier. Here is my script:

https://stackoverflow.com/questions/55187516/look-up-bernoullinb-probability-in-dataframe

It works on the dummy data that is included with the script.

On the real data, I know from experience which class some of the Y columns indicate. However, my script gives predicted probabilities such as "1" for classes that I believe are wrong and "6e-77" for classes that are correct.

Any advice on what I can try please?

Edit

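There are two problems. The very low probabilities are caused by the naive assumption that the features are independent of one another given the class. When the features are in fact correlated, the same evidence is counted several times and the predicted probabilities are pushed towards 0 or 1. This is described here: https://scikit-learn.org/stable/auto_examples/calibration/plot_calibration_curve.html#sphx-glr-auto-examples-calibration-plot-calibration-curve-py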
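To illustrate how the independence assumption can produce such extreme values, here is a minimal sketch in plain Python. It is not taken from my script: the function name `naive_posterior`, the two classes A and B, and the likelihoods 0.9 and 0.1 are all made up for the example. It assumes one informative feature that is copied several times, which is the extreme case of correlated features.

```python
import math

def naive_posterior(p_given_a, p_given_b, copies, prior_a=0.5):
    """Posterior of class B when the same feature is observed `copies` times.

    Hypothetical illustration: naive Bayes treats every copy as independent
    evidence, so the per-feature likelihoods are multiplied `copies` times.
    """
    # Work in log space so the product of small numbers does not underflow.
    log_a = math.log(prior_a) + copies * math.log(p_given_a)
    log_b = math.log(1.0 - prior_a) + copies * math.log(p_given_b)
    # Normalise with the log-sum-exp trick.
    m = max(log_a, log_b)
    log_norm = m + math.log(math.exp(log_a - m) + math.exp(log_b - m))
    return math.exp(log_b - log_norm)

one_copy = naive_posterior(0.9, 0.1, copies=1)
many_copies = naive_posterior(0.9, 0.1, copies=80)
print(one_copy, many_copies)
```

With a single copy the posterior for B is 0.1, which is a reasonable value. With 80 copies of the same feature it drops to the order of 1e-77, even though no new information has been added.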

The incorrect answers are caused by my code confusing which class is which, as described in my Stack Overflow post.
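One way to avoid this kind of mix-up is sketched below, assuming the script uses scikit-learn's BernoulliNB. The toy data and the labels "dog" and "cat" are hypothetical. The point is that column j of `predict_proba` belongs to `clf.classes_[j]`, so it is safer to pair each probability with its label explicitly than to rely on column position.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: rows are observations, columns are binary features (Y).
X_train = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]])
y_train = np.array(["dog", "dog", "cat", "cat"])  # observed classes (X)

clf = BernoulliNB()
clf.fit(X_train, y_train)

X_test = np.array([[1, 1, 0]])
proba = clf.predict_proba(X_test)

# Column j of proba corresponds to clf.classes_[j] (sorted label order),
# not to the order in which the labels first appear in the training data.
labelled = [dict(zip(clf.classes_, row)) for row in proba]
print(labelled)
```

Here "dog" appears first in the training data, but `clf.classes_` lists "cat" first, so reading the first column as "dog" would give the wrong answer.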

Topic prediction probability naive-bayes-classifier machine-learning

Category Data Science

R. Cox · Accepted Answer · April 30, 2019 13:41

Each binary column (Y) is a feature. The Bernoulli naive Bayes classifier could identify the class (X) when the number of features (Y) was less than 17, but the real data had more features than that. I found that another method could classify the real data accurately:

Training:

(1) Count which features (Y) are in each class (X) in the training data

Testing:

(2) Give each row a score (Z) with a starting value of 0.5

(3) For each row, go through its features (Y) one at a time:

If the feature (Y) is in the class (X) in the training data, add 1 to the score (Z).
If the feature (Y) is not in the class (X) in the training data, subtract 1 from the score (Z).
If the class (X) is not in the training data, leave the score (Z) unchanged.

The score (Z) was a good classifier for my data.
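The steps above can be sketched roughly as follows. This is an illustration, not my original code, and it rests on some assumptions: only features set to 1 in a row are scored, one score is computed per candidate class, and the class with the highest score is taken as the prediction. The function names `train`, `score` and `predict` are made up for the example.

```python
from collections import defaultdict

def train(rows):
    """Step (1): count how often each feature index is set in each class.

    rows is an iterable of (class_label, list of 0/1 feature values).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for label, features in rows:
        class_counts = counts[label]  # registers the class even if all features are 0
        for idx, value in enumerate(features):
            if value:
                class_counts[idx] += 1
    return counts

def score(features, label, counts, start=0.5):
    """Steps (2) and (3): score one row against one candidate class."""
    if label not in counts:
        return start  # class never seen in training: leave the score unchanged
    z = start
    for idx, value in enumerate(features):
        if not value:
            continue  # assumption: only features present in the row are scored
        if counts[label].get(idx, 0) > 0:
            z += 1  # feature was seen in this class during training
        else:
            z -= 1  # feature was never seen in this class
    return z

def predict(features, counts):
    """Assumed decision rule: pick the class with the highest score."""
    return max(counts, key=lambda label: score(features, label, counts))

train_rows = [("a", [1, 1, 0, 0]), ("a", [1, 0, 0, 0]), ("b", [0, 0, 1, 1])]
counts = train(train_rows)
print(score([1, 1, 0, 0], "a", counts), predict([1, 1, 0, 0], counts))
```

Unlike the naive Bayes probability, the score only adds and subtracts whole numbers, so it cannot collapse towards 0 as the number of features grows.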
