Naive Bayes Text Classifier Confidence Score
I am experimenting with building a text classifier using Naive Bayes, and it has been pretty successful on my test data. One thing I am looking to incorporate is handling text that does not fit into any of the predefined categories the model was trained on.
Does anyone have thoughts on how to do this? I was thinking of calculating a confidence score for each document and, if the confidence falls below some threshold (80%, for example), labeling the document as N/A.
This is my code so far:
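import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
df_train = pd.read_csv(__________)
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])
text_clf = text_clf.fit(df_train.text, df_train.label)
# df holds the documents to classify (loading not shown; assumed to have a `text` column like df_train)
predicted = text_clf.predict(df.text)
df['predicted'] = predicted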
Like I said, it works well for documents that do fit into one of the categories. But if a document clearly does not fit any of them, the classifier still assigns it a label. My guess is that the label is based on some kind of confidence calculation, but I am not sure how that works.
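To make the idea concrete, here is a minimal sketch of the thresholding I have in mind. It assumes that `predict_proba` on the pipeline gives the per-class posterior probabilities, and that `predict` simply picks the class with the highest one. The toy data, the helper name `predict_with_threshold`, the 0.8 cutoff, and the "N/A" fallback are all made up for illustration. Naive Bayes probabilities are often pushed toward 0 or 1, so any cutoff would probably need tuning on held-out data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training data, invented for illustration only.
train_texts = [
    "the team won the football match",
    "a late goal decided the football game",
    "the striker scored in the match",
    "the new phone has a faster processor",
    "this laptop ships with a better processor",
    "the phone software update is out",
]
train_labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
text_clf.fit(train_texts, train_labels)


def predict_with_threshold(model, texts, threshold=0.8, fallback="N/A"):
    """Hypothetical helper: return (labels, confidences), where a label is
    replaced by `fallback` if its top class probability is below `threshold`."""
    proba = model.predict_proba(texts)      # shape: (n_docs, n_classes)
    best = proba.argmax(axis=1)             # index of the most probable class
    confidence = proba.max(axis=1)          # probability of that class
    # Cast to object so the fallback string is not truncated to the
    # fixed width of the class-name array.
    labels = model.classes_[best].astype(object)
    labels[confidence < threshold] = fallback
    return labels, confidence


new_texts = ["the football match was great", "quantum banana recipe"]
labels, confidence = predict_with_threshold(text_clf, new_texts)
for text, label, conf in zip(new_texts, labels, confidence):
    print(f"{label:>6}  {conf:.2f}  {text}")
```

In this toy setup the second document shares no words with the training vocabulary, so its probabilities fall back to the class priors (0.5 each with balanced classes) and it ends up labeled N/A. Real out-of-category documents usually share some words with the training data, so their scores may not drop this cleanly.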