Why would I use TF-IDF after Bag-of-Words (CountVectorizer)?
In my recent studies of Machine Learning NLP tasks, I found a very nice tutorial teaching how to build your first text classifier.
The point is that I always believed you had to choose between Bag-of-Words, word embeddings, or TF-IDF, but in this tutorial the author uses Bag-of-Words (CountVectorizer) and then applies TF-IDF on top of the features generated by Bag-of-Words:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

text_clf = Pipeline([
    ('vect', CountVectorizer()),    # tokenize and build raw term-count features
    ('tfidf', TfidfTransformer()),  # reweight the counts with TF-IDF
    ('clf', MultinomialNB()),       # classify on the weighted features
])
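For what it's worth, I tried a small sketch (on a toy corpus I made up) to check that chaining CountVectorizer and TfidfTransformer gives the same features as scikit-learn's combined TfidfVectorizer, assuming default settings for both:

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Toy corpus purely for illustration
docs = ["the cat sat", "the dog sat", "the cat ran"]

# Two-step: raw term counts, then TF-IDF reweighting
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts)

# One-step: TfidfVectorizer does both in a single estimator
one_step = TfidfVectorizer().fit_transform(docs)

# With matching defaults the two matrices should coincide
print(np.allclose(two_step.toarray(), one_step.toarray()))
```

On my run the two matrices matched, so the two-step pipeline seems to be just an explicit version of TfidfVectorizer.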
Is that a valid technique? Why would I do this?
Topic bag-of-words tfidf nlp
Category Data Science