Naive Bayes Predict type = 'raw' returning NA

Question

Teja

May 6, 2022, 20:07

I have built a Naive Bayes model for text classification, and it predicts the classes correctly. However, the prediction results contain NA when I set type = "raw". I have seen suggestions on Stack Overflow to add some noise to the data, but when I do that I get 0 for every category A probability and 1 for every category B probability. How can I get correct probabilities from Naive Bayes?

library(tm)
library(e1071)
library(SparseM)

# Load the data and split it into training and test sets
Sample_data <- read.csv("products.csv")
traindata <- as.data.frame(Sample_data[1:60, c(1, 2)])
testdata <- as.data.frame(Sample_data[61:80, c(1, 2)])

# Build a corpus from the Description column of each set
trainvector <- as.vector(traindata$Description)
testvector <- as.vector(testdata$Description)
trainsource <- VectorSource(trainvector)
testsource <- VectorSource(testvector)
traincorpus <- Corpus(trainsource)
testcorpus <- Corpus(testsource)

# Clean the text; content_transformer() wraps base functions such as tolower for tm_map
traincorpus <- tm_map(traincorpus, stripWhitespace)
traincorpus <- tm_map(traincorpus, content_transformer(tolower))
traincorpus <- tm_map(traincorpus, removeWords, stopwords("english"))
traincorpus <- tm_map(traincorpus, removePunctuation)
testcorpus <- tm_map(testcorpus, stripWhitespace)
testcorpus <- tm_map(testcorpus, content_transformer(tolower))
testcorpus <- tm_map(testcorpus, removeWords, stopwords("english"))
testcorpus <- tm_map(testcorpus, removePunctuation)

# Transpose the term-document matrices so that rows are documents
trainmatrix <- t(TermDocumentMatrix(traincorpus))
testmatrix <- t(TermDocumentMatrix(testcorpus))

# Train the model and predict the class of each test document
model <- naiveBayes(as.matrix(trainmatrix), as.factor(traindata$Group))
results <- predict(model, as.matrix(testmatrix))

# Class probabilities; this is the call that returns NA
results_raw <- predict(model, as.matrix(testmatrix), type = "raw")

Topic naive-bayes-classifier r machine-learning

Category Data Science

gchaks · Accepted Answer · August 2, 2017, 20:50

I assume you are referring to the Stack Overflow post that suggests adding noise to the data, since the error seems to occur when a class has only one (or very few) instances in the dataset. Is that the case with your training data? If what you are trying to predict is a rare event, one suggestion is to balance the training data by oversampling the rare class (which is one way of adding noise).
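To illustrate the oversampling suggestion, here is a minimal sketch in base R. The toy data frame is made up; only the column names Description and Group come from the question. It keeps every original row and resamples the smaller class with replacement until both classes have as many rows as the larger one. Treat it as one possible approach, not a guaranteed fix for the NA values.

```r
# Hypothetical toy data standing in for products.csv; the values are made up.
set.seed(42)  # make the resampling reproducible
traindata <- data.frame(
  Description = c("red shirt", "blue shirt", "green shirt", "white shirt", "usb cable"),
  Group = factor(c("A", "A", "A", "A", "B")),
  stringsAsFactors = FALSE
)

# Count rows per class and find the size of the largest class.
class_counts <- table(traindata$Group)
target_n <- max(class_counts)

# Keep all original rows of each class and, for smaller classes, append
# row indices drawn with replacement until the class has target_n rows.
balanced_idx <- unlist(lapply(names(class_counts), function(cls) {
  idx <- which(traindata$Group == cls)
  extra <- target_n - length(idx)
  if (extra > 0) {
    # sample.int() on the positions avoids the sample() pitfall with length-1 input
    c(idx, idx[sample.int(length(idx), extra, replace = TRUE)])
  } else {
    idx
  }
}))

balanced <- traindata[balanced_idx, ]
table(balanced$Group)
```

The balanced data frame can then go through the same corpus and matrix steps as the original training data.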

If that does not work, another option is to remove infrequent terms from your term-document matrix with the function removeSparseTerms.
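As a rough sketch of what that could look like with the tm package: the documents below are made up, and the 0.6 threshold is an arbitrary choice that you would need to tune for your own data.

```r
library(tm)

# Hypothetical descriptions standing in for traindata$Description.
docs <- c("red cotton shirt", "blue cotton shirt",
          "green cotton shirt", "usb charging cable")
corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(corpus)

# Keep only terms whose sparsity is below 0.6, i.e. terms that are
# absent from fewer than 60% of the documents.
dtm_small <- removeSparseTerms(dtm, sparse = 0.6)

dim(dtm)          # documents x terms before pruning
dim(dtm_small)    # documents x terms after pruning
Terms(dtm_small)  # the terms that survived
```

If you prune the training matrix this way, the test matrix should be restricted to the same terms so that both matrices describe the same features.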

Beyond that, given the amount of training data you have, it would be worth evaluating whether the term-document matrix (the words it contains, or the frequencies of specific words) is sufficient to differentiate the classes. If not, you should consider adding new features to describe the dataset.

A few suggestions:

- a count of positive/negative words, or a sentiment index that ranges from -1 to 1, if relevant for your data
- the types of words in the dataset (an index or count of adjectives, nouns, or verbs), again depending on your problem and data
- noun phrases instead of a term-document matrix
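For the first suggestion, a minimal base R sketch might look like the following. The word lists and descriptions are hypothetical; a real project would use a proper sentiment lexicon and your own Description column.

```r
# Hypothetical word lists; replace them with a real sentiment lexicon.
positive_words <- c("good", "great", "durable")
negative_words <- c("bad", "poor", "broken")

# Hypothetical descriptions standing in for traindata$Description.
descriptions <- c("Great durable cable",
                  "Poor quality, arrived broken",
                  "Plain blue shirt")

# Lowercase, strip punctuation, and split each description into words.
tokens <- strsplit(gsub("[[:punct:]]", "", tolower(descriptions)), "\\s+")

# Count lexicon hits per description.
pos_count <- sapply(tokens, function(w) sum(w %in% positive_words))
neg_count <- sapply(tokens, function(w) sum(w %in% negative_words))

# Sentiment index in [-1, 1]; 0 when a description has no lexicon words.
total <- pos_count + neg_count
sentiment <- ifelse(total == 0, 0, (pos_count - neg_count) / total)

features <- data.frame(pos_count, neg_count, sentiment)
features
```

These columns could be bound to the term matrix with cbind() before training, so the classifier sees them as additional features.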

Finally, I'm assuming that your test data contains records for both classes. If not, it is difficult to evaluate the model.
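A quick way to check this, and one possible alternative to splitting by row position, is sketched below in base R. The labels are made up to mimic an 80-row file sorted by class, and the 25% hold-out fraction is an arbitrary assumption.

```r
set.seed(1)  # make the split reproducible

# Hypothetical labels standing in for Sample_data$Group (80 rows, sorted by class).
group <- factor(rep(c("A", "B"), times = c(60, 20)))

# A positional split such as rows 1:60 / 61:80 can leave a class out of one side.
table(group[61:80])

# Stratified alternative: hold out 25% of the rows of each class.
test_idx <- unlist(lapply(split(seq_along(group), group), function(idx) {
  idx[sample.int(length(idx), size = round(0.25 * length(idx)))]
}))
train_idx <- setdiff(seq_along(group), test_idx)

table(group[train_idx])
table(group[test_idx])
```

If table() on your own test labels shows a zero count for one class, accuracy figures on that split say little about how well the model separates the classes.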

I hope that helps. If you could describe the data problem more clearly and provide some examples of the data, that would make it easier to help.
