How to calculate true positives, true negatives, false positives and false negatives with a Naive Bayes classifier from scratch

I am working on implementing a Naive Bayes classification algorithm. I have a method, def prob_continous_value, which is supposed to return the probability density function for an attribute given a class attribute. The problem requires classifying the following datasets:

Venue,color,Model,Category,Location,weight,Veriety,Material,Volume
1,6,4,4,4,1,1,1,6
2,5,4,4,4,2,6,1,1
1,6,2,1,4,1,4,2,4
1,6,2,1,4,1,2,1,2
2,6,5,5,5,2,2,1,2
1,5,4,4,4,1,6,2,2
1,3,3,3,3,1,6,2,2
1,5,2,1,1,1,2,1,2
1,4,4,4,1,1,5,3,6
1,4,4,4,4,1,6,4,6
2,5,4,4,4,2,4,4,1
2,4,3,3,3,2,1,1,1

Venue,color,Model,Category,Location,weight,Veriety,Material,Volume
2,6,4,4,4,2,2,1,1
1,2,4,4,4,1,6,2,6
1,5,4,4,4,1,2,1,6
2,4,4,4,4,2,6,1,4
1,4,4,4,4,1,2,2,2
2,4,3,3,3,2,1,1,1
1,5,2,1,4,1,6,2,6
1,2,3,3,3,1,2,1,6
2,6,4,4,4,2,3,1,1
1,4,4,4,4,1,2,1,6
1,5,4,4,4,1,2,1,4
1,4,5,5,5,1,6,2,4
2,5,4,4,4,2,3,1,1

The code for this is written like so:

from numpy.core.defchararray import count, index
import …
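A minimal sketch of the two pieces the question asks about, assuming the continuous attributes are modeled with a Gaussian density and that the positive class is labeled 1; the function names and toy labels below are illustrative, not the asker's actual code:

```python
import math

def prob_continuous_value(x, mean, std):
    # Gaussian probability density for a continuous attribute value given a class
    # (a common choice; the asker's prob_continous_value may be defined differently).
    if std == 0:
        std = 1e-9  # guard against constant columns
    exponent = math.exp(-((x - mean) ** 2) / (2 * std ** 2))
    return exponent / (math.sqrt(2 * math.pi) * std)

def confusion_counts(y_true, y_pred, positive_label=1):
    # Count TP, TN, FP, FN for a chosen positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive_label and p == positive_label)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive_label and p != positive_label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive_label and p != positive_label)
    return tp, tn, fp, fn

# Toy labels standing in for the predictions of the from-scratch classifier.
y_true = [1, 2, 1, 1, 2]
y_pred = [1, 2, 2, 1, 2]
print(confusion_counts(y_true, y_pred, positive_label=1))  # (2, 2, 0, 1)
```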
Category: Data Science

Really confused about the characteristics of Naive Bayes classifiers

Naive Bayes classifiers have the following characteristics: they are robust to isolated noise points, because such points are averaged out when estimating conditional probabilities from data. Naive Bayes classifiers can also handle missing values by ignoring the example during model building and classification. They are robust to irrelevant attributes: if X_i is an irrelevant attribute, then P(X_i | Y) becomes almost uniformly distributed, and the class-conditional probability for X_i has no impact on the overall computation of the posterior probability. I barely understand anything …
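To make the "irrelevant attribute" point concrete, here is a small sketch with made-up numbers (not taken from the question): when P(X_i | Y) is roughly the same for every class, it multiplies each class's score by almost the same factor and therefore cannot change which class wins.

```python
# Toy numbers only: one informative attribute and one irrelevant attribute.
prior = {"yes": 0.6, "no": 0.4}
p_relevant = {"yes": 0.8, "no": 0.2}      # P(X_rel | Y): differs a lot across classes
p_irrelevant = {"yes": 0.51, "no": 0.49}  # P(X_irr | Y): nearly uniform across classes

def posterior_scores(use_irrelevant):
    # Unnormalized posterior: prior times the class-conditional probabilities.
    scores = {}
    for y in prior:
        s = prior[y] * p_relevant[y]
        if use_irrelevant:
            s *= p_irrelevant[y]
        scores[y] = s
    return scores

print(posterior_scores(False))  # {'yes': 0.48, 'no': 0.08}
print(posterior_scores(True))   # {'yes': 0.2448, 'no': 0.0392} -> same winning class
```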
Category: Data Science

Naive Bayes unable to detect preprocessing techniques from data

I'm testing out different preprocessing techniques on multi-class classification problems. I've used multiple algorithms, but the only algorithm giving me trouble is Naive Bayes.

def NB(self):
    # Import Classifier
    from sklearn.naive_bayes import MultinomialNB, GaussianNB
    accuracy = 0
    speed = 0
    percision = 0
    f1 = 0
    recall = 0
    for count in range(5):
        nb = Pipeline([('vect', CountVectorizer()),
                       ('tfidf', TfidfTransformer()),
                       ('clf', MultinomialNB()),
                       ])
        # There are two options for average: macro or weighted
        # We are going with Macro since we …
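A self-contained sketch of the kind of pipeline-plus-metrics loop the snippet above implies; the toy corpus, the train/test split and the macro averaging below are assumptions for illustration, not the asker's data or code:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Tiny placeholder corpus standing in for the asker's multi-class text data.
texts = ["cheap pills now", "meeting at noon", "win a free prize",
         "project status update", "free offer expires", "lunch with the team"]
labels = ["spam", "ham", "spam", "ham", "spam", "ham"]

X_train, X_test, y_train, y_test = train_test_split(texts, labels,
                                                    test_size=0.33, random_state=0)

nb = Pipeline([("vect", CountVectorizer()),
               ("tfidf", TfidfTransformer()),
               ("clf", MultinomialNB())])
nb.fit(X_train, y_train)
pred = nb.predict(X_test)

# Macro averaging weights every class equally, as the comment in the question suggests.
print(accuracy_score(y_test, pred))
print(precision_score(y_test, pred, average="macro", zero_division=0))
print(recall_score(y_test, pred, average="macro", zero_division=0))
print(f1_score(y_test, pred, average="macro", zero_division=0))
```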
Category: Data Science

Comparison of different Naive Bayes algorithms for SMS classification

There are various types of Naive Bayes algorithms in the Sklearn library. Can all of them be used for text classification, and which ones perform better? I tested a simple text classification using Multinomial Naive Bayes, Bernoulli Naive Bayes and Gaussian Naive Bayes. It seemed the Multinomial variant was somewhat better, but I am not sure about the others, and my observations could be limited to my dataset.
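One way to run such a comparison is sketched below; the tiny SMS-like corpus is made up, and results on a real dataset will differ. Note that GaussianNB expects a dense array, which is one reason it is rarely used on sparse bag-of-words features:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB
from sklearn.model_selection import cross_val_score

# Placeholder messages standing in for a real SMS spam corpus.
sms = ["free entry win cash", "see you at dinner", "claim your prize now",
       "are we still meeting", "urgent offer click now", "call me when you can"]
y = ["spam", "ham", "spam", "ham", "spam", "ham"]

X = CountVectorizer().fit_transform(sms)

for name, clf, features in [("MultinomialNB", MultinomialNB(), X),
                            ("BernoulliNB",   BernoulliNB(),   X),
                            ("GaussianNB",    GaussianNB(),    X.toarray())]:
    scores = cross_val_score(clf, features, y, cv=2)
    print(name, scores.mean())
```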
Category: Data Science

What type of 'Naive Bayes' algorithm is provided by Orange?

I've been using Orange for a while to rapidly prototype a few classification models. One of the ones I've been using is 'Naive Bayes'. If I understand correctly, there are a few types available based on the Bayesian principle: Gaussian, Multinomial, Bernoulli, etc. I see these as explicit classes in scikit-learn as well. However, I'm unable to tell which type 'Orange' adopts for its 'Naive Bayes' widget. Can someone help me with this?
Category: Data Science

How to implement HashingVectorizer with the multinomial Naive Bayes algorithm

I had used TfidfVectorizer and passed its output to MultinomialNB for document classification, and it was working fine. But now I need to pass a huge set of documents, for example above 1 lakh (100,000), and when I try to pass these document contents to TfidfVectorizer my local computer hangs. It seems there is a performance issue. So I got a suggestion to use HashingVectorizer, and I used the code below for classification (just replacing TfidfVectorizer with HashingVectorizer):

stop_words = open("english_stopwords").read().split("\n")
vect = HashingVectorizer(stop_words=stop_words, …
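A minimal sketch of how HashingVectorizer is usually combined with MultinomialNB. One likely pitfall: by default HashingVectorizer produces signed (possibly negative) feature values, which MultinomialNB rejects, so alternate_sign=False is typically needed. The documents, labels and stop-word choice below are placeholders, not the asker's data:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

docs = ["first training document", "second training document", "something unrelated"]
labels = ["a", "a", "b"]

clf = Pipeline([
    # alternate_sign=False keeps all feature values non-negative (required by
    # MultinomialNB); n_features bounds the memory used by the hashing trick.
    ("vect", HashingVectorizer(stop_words="english",
                               alternate_sign=False,
                               n_features=2 ** 18)),
    ("clf", MultinomialNB()),
])
clf.fit(docs, labels)
print(clf.predict(["another unrelated text"]))
```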
Category: Data Science

Subtraction of Positive and Negative Frequencies in Sentiment Analysis

In positive/negative sentiment analysis, would it make sense mathematically to calculate the difference between a word's positive and negative frequencies instead of keeping separate scores for each? That way each word would have a positivity 'heat', in which a very high value would indicate a very positive word and vice versa. How would this approach change the model's performance?
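A small sketch of the idea being proposed: a single signed score per word, computed as positive count minus negative count. The tiny corpus is invented purely to show the computation:

```python
from collections import Counter

pos_texts = ["great movie loved it", "great acting"]
neg_texts = ["terrible movie hated it", "boring acting"]

pos_counts = Counter(w for t in pos_texts for w in t.split())
neg_counts = Counter(w for t in neg_texts for w in t.split())

vocab = set(pos_counts) | set(neg_counts)
# Signed "heat": > 0 leans positive, < 0 leans negative, 0 is neutral.
heat = {w: pos_counts[w] - neg_counts[w] for w in vocab}
print(heat["great"], heat["terrible"], heat["movie"])  # 2 -1 0
```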
Category: Data Science

How to improve results from a Naive Bayes algorithm?

I am having some difficulties improving the results of a Naive Bayes algorithm. My dataset consists of 39 columns (some categorical, some numerical). However, I only considered the main variable, i.e. Text, which contains all the spam and ham messages. Since this is spam filtering, I think that this field should be good enough on its own. So I used CountVectorizer and fit_transform on it after removing stopwords. I am getting 60% accuracy, which is very, very low! What …
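A hedged sketch of common first steps when accuracy is stuck: switch to TF-IDF weighting, include bigrams, and tune the smoothing parameter alpha. The corpus below is illustrative, not the asker's 39-column dataset:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

texts = ["win cash now", "meeting tomorrow", "free prize claim",
         "see you soon", "urgent offer", "lunch at noon"]
labels = ["spam", "ham", "spam", "ham", "spam", "ham"]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2))),
    ("clf", MultinomialNB()),
])

# Tune the smoothing strength; small alphas often help sparse text data.
grid = GridSearchCV(pipe, {"clf__alpha": [0.01, 0.1, 0.5, 1.0]}, cv=2)
grid.fit(texts, labels)
print(grid.best_params_, grid.best_score_)
```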
Category: Data Science

Why is naive Bayes "naive"?

Some articles say that naive Bayes is naive because of "independence of attributes", whereas others say "independence of attributes within a class". Can anybody please clear up this confusion? Thanks.
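For reference, the assumption is independence of the attributes conditional on the class (i.e. within a class), not unconditional independence; this is the standard formulation rather than a quotation from either set of articles:

```latex
% Naive (class-conditional) independence assumption
P(X_1, X_2, \dots, X_n \mid Y) \;=\; \prod_{i=1}^{n} P(X_i \mid Y)
\qquad\Longrightarrow\qquad
P(Y \mid X_1, \dots, X_n) \;\propto\; P(Y)\,\prod_{i=1}^{n} P(X_i \mid Y)
```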
Category: Data Science

How does the Naive Bayes algorithm function effectively as a classifier, despite the assumptions of conditional independence and bag of words?

The Naive Bayes algorithm used for text classification relies on two assumptions to make it computationally fast: the bag-of-words assumption (the position of words is not considered) and conditional independence (words are independent of one another given the class). In reality, neither condition often holds, yet Naive Bayes is quite effective. Why is that?
Category: Data Science
